davejrt/about 0

Sourcegraph blog, feature announcements, and website (about.sourcegraph.com)

davejrt/atlantis 0

Terraform Pull Request Automation

davejrt/cnab-workshop 0

CNAB / Duffle workshop for KubeCon Seattle 2018

davejrt/derek 0

derek - a serverless :robot: to manage PRs and issues

push event sourcegraph/deploy-sourcegraph-dhall

Dave Try

commit sha c6cf632f9ebe0e7d49eac51e8abf775cd1dc4022

setup dhall diff ci job

view details

push time in 3 days

push event sourcegraph/deploy-sourcegraph-dhall

Dave Try

commit sha 431e87444e3adeecc0725117d9058914eead2f37

setup dhall diff ci job

view details

push time in 3 days

push event sourcegraph/deploy-sourcegraph-dhall

Dave Try

commit sha e2a7ebdfbd7a5232a0a322a306adb5a2d416e459

setup dhall diff ci job

view details

push time in 3 days

create branch sourcegraph/deploy-sourcegraph-dhall

branch : dave/dhall-diff

created branch time in 3 days

issue opened sourcegraph/sourcegraph

deploy-sourcegraph-dhall: Add CI job to generate diff between master and feature branch

Create a CI job that runs on commits to diff between the master and feature branch.

This job should:

  • [ ] use dhall to generate the git action itself
  • [ ] be a shell script that invokes dhall to generate the diff
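
The shell-script half of this checklist could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the `RENDER_CMD` override, and the idea of rendering each ref in a temporary git worktree are hypothetical, and it assumes `dhall-to-yaml` (from dhall-json) is installed and accepts `--file`. Generating the action definition itself with Dhall is not covered.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the RENDER_CMD override are hypothetical.

# render_ref REF ENTRYPOINT OUTFILE
# Check out REF into a throwaway git worktree and render ENTRYPOINT to YAML.
render_ref() {
  ref="$1"; entry="$2"; out="$3"
  tree="$(mktemp -d)" || return 1
  git worktree add --detach "$tree" "$ref" >/dev/null 2>&1 || return 1
  (cd "$tree" && "${RENDER_CMD:-dhall-to-yaml}" --file "$entry") >"$out"
  status=$?
  git worktree remove --force "$tree" >/dev/null 2>&1
  return "$status"
}

# compare_rendered BASE_FILE HEAD_FILE
# Print a unified diff; exit status 0 means identical, 1 means different.
compare_rendered() {
  diff -u "$1" "$2"
}

# dhall_diff BASE_REF HEAD_REF ENTRYPOINT
# Render ENTRYPOINT at both refs and diff the results (2 = rendering failed).
dhall_diff() {
  work="$(mktemp -d)" || return 2
  render_ref "$1" "$3" "$work/base.yaml" || return 2
  render_ref "$2" "$3" "$work/head.yaml" || return 2
  compare_rendered "$work/base.yaml" "$work/head.yaml"
}
```

A CI step might then call something like `dhall_diff origin/master HEAD pipeline.dhall`, where `pipeline.dhall` stands in for whatever entrypoint the repository actually uses.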

created time in 3 days

issue comment sourcegraph/sourcegraph

Bare-metal Buildkite agents capable of running Docker and VMs

Update: the image is now usable in GCP under sourcegraph-dev/buildkite-agent. Still outstanding: automate this, or at least make the image producible in an automated fashion.

slimsag

comment created time in 3 days

push event sourcegraph/sourcegraph

davejrt

commit sha e7e4fa2ba50c7baddfab88458147bc15d7543773

update docs for more verbose watchman instructions (#12670)

view details

push time in 4 days

delete branch sourcegraph/sourcegraph

delete branch : dave/watchman

delete time in 4 days

PR merged sourcegraph/sourcegraph

update docs for more verbose watchman instructions

Just a small update to clarify the watchman instructions. The curl command was broken and the following commands make it a bit quicker and easier for those not as confident with those environment variables

+7 -1

0 comment

1 changed file

davejrt

pr closed time in 4 days

PR opened sourcegraph/sourcegraph

update docs for more verbose watchman instructions

Just a small update to clarify the watchman instructions. The curl command was broken and the following commands make it a bit quicker and easier for those not as confident with those environment variables

+7 -1

0 comment

1 changed file

pr created time in 4 days

create branch sourcegraph/sourcegraph

branch : dave/watchman

created branch time in 4 days

push event sourcegraph/sourcegraph

davejrt

commit sha 9d471d099db2b99d500957bb67ac5fa70c1ce623

don't enforce bash 5 on linux (#12592)
  * Change to bash 4. Update error for Darwin and Linux

view details

push time in 4 days

delete branch sourcegraph/sourcegraph

delete branch : dave/bash

delete time in 4 days

PR merged sourcegraph/sourcegraph

don't enforce bash 5 on linux

Makes things a bit trickier for those not running on mac

+18 -5

5 comments

1 changed file

davejrt

pr closed time in 4 days

pull request comment sourcegraph/sourcegraph

don't enforce bash 5 on linux

@keegancsmith bash 4 is fine. The "quick" workaround for getting bash 5 on Ubuntu is to upgrade to 19.x, and for anyone using WSL2, in-place upgrades are a pain. For more context, I was helping @kghopson set up her env on Friday, and for someone who "just wants it to work" I think we should be more lenient.
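
A relaxed version check along these lines might look like the sketch below. It is not the script from the PR: the function name and the per-OS hints are hypothetical, and the only detail taken from the discussion is that bash 4 is treated as sufficient, with separate error messages for Darwin and Linux.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed when the given bash major version is at least 4.
# The OS-specific hints are illustrative, not the project's actual wording.
check_bash_version() {
  major="$1"   # for example "${BASH_VERSION%%.*}"
  os="$2"      # for example "$(uname -s)"
  if [ "${major:-0}" -ge 4 ] 2>/dev/null; then
    return 0
  fi
  case "$os" in
    Darwin) echo "bash ${major:-unknown} is too old; install a newer bash, for example with 'brew install bash'" >&2 ;;
    Linux)  echo "bash ${major:-unknown} is too old; upgrade bash with your distribution's package manager" >&2 ;;
    *)      echo "bash ${major:-unknown} is too old; bash 4 or newer is required" >&2 ;;
  esac
  return 1
}
```

A dev script could call it as `check_bash_version "${BASH_VERSION%%.*}" "$(uname -s)" || exit 1`.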

davejrt

comment created time in 4 days

push event sourcegraph/sourcegraph

Dave Try

commit sha b0c1b1806966d5f68ed67bf7e29572bae00a0d55

bash linting

view details

push time in 4 days

push event sourcegraph/sourcegraph

Dave Try

commit sha b29b35a0f6284cac81daa49c80dc19c0a3430c2f

Change to bash 4. Update error for Darwin and Linux

view details

push time in 4 days

issue comment sourcegraph/sourcegraph

Distribution: 3.19 Tracking issue

THIS WEEK

I'd say my time was split fairly evenly between working on tasks for $customer and replicating their setup internally. Unfortunately I haven't gotten back around to doing anything with Dhall, but I hope to make more progress on that next week. I made a decent start on #12101 and have a working agent that, aside from needing some fine tuning, will be good to start using and iterating on.

I did take a cursory glance at removing alertmanager, but Robert kindly informed me that this is wrapped up with a bunch of other PRs he'll land next week, so it's in his more than capable hands for now.

NEXT WEEK

I'll have a PR ready to land for the bare-metal CI agent, which will then be ready to start running jobs. I'm going to resync with Geoffrey and/or Uwe regarding Dhall and make an effort to pick that back up again.

pecigonzalo

comment created time in 6 days

push event sourcegraph/sourcegraph

Dave Try

commit sha bd532198cffbf2c9a90963dbd5cbd2f2fbe90a7d

don't enforce bash 5 on linux

view details

push time in 7 days

PR opened sourcegraph/sourcegraph

don't enforce bash 5 on linux
+1 -1

0 comment

1 changed file

pr created time in 7 days

create branch sourcegraph/sourcegraph

branch : dave/bash

created branch time in 7 days

issue comment sourcegraph/sourcegraph

Bare-metal Buildkite agents capable of running Docker and VMs

Based on the requirements above, I'm looking at taking some inspiration from the Buildkite Elastic CI Stack.

slimsag

comment created time in 9 days

issue opened sourcegraph/sourcegraph

migrate terraform state to GCP

Currently, the following Terraform deployments rely on local state: developers run terraform apply, then check their code and a state file into the repo. In doing so, we accept the risk that a developer could corrupt a state file or forget to check it in.

The following terraform deployments should be migrated to use remote state in GCP:

  • https://github.com/sourcegraph/infrastructure/tree/master/cloud
  • https://github.com/sourcegraph/infrastructure/tree/master/dns
  • https://github.com/sourcegraph/infrastructure/tree/master/site24x7

TODO

Determine naming scheme for each deployment
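
As a rough illustration of what the migration could involve, the sketch below prints a Terraform `gcs` backend block for one deployment. The bucket name and the use of the deployment name as the state prefix are assumptions made for the example, since the naming scheme is still to be determined; the helper name is hypothetical too.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a GCS backend block for a single deployment.
# BUCKET and DEPLOYMENT are placeholders; the real naming scheme is undecided.
gcs_backend_block() {
  bucket="$1"
  deployment="$2"
  cat <<EOF
terraform {
  backend "gcs" {
    bucket = "${bucket}"
    prefix = "${deployment}"
  }
}
EOF
}
```

For example, `gcs_backend_block example-tfstate-bucket dns > dns/backend.tf` followed by `terraform init` in that directory should prompt Terraform to offer copying the existing local state to the new backend.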

created time in 11 days

issue comment sourcegraph/sourcegraph

Distribution: 3.19 Tracking issue

Week July 20

  • Met with Geoffrey regarding Dhall and took small steps to familiarize myself more with this
  • A lot of time spent on calls troubleshooting indexed-search issues with $CUSTOMER and assisting in the rollout of overlays related to https://github.com/sourcegraph/customer/issues/65
  • Migrated data over from https://bigdata.sgdev.org to https://megakube.sgdev.org to assist with troubleshooting and debugging issues
  • Assisting Indeed with rollout of new alertmanager https://sourcegraph.slack.com/archives/CSKMGUJ58/p1595433496171400
pecigonzalo

comment created time in 14 days

Pull request review comment sourcegraph/about

distribution: add monitoring architecture page

+# Sourcegraph monitoring architecture
+
+**Note:** Looking for _how to monitor Sourcegraph?_ See the [observability documentation](https://docs.sourcegraph.com/admin/observability).
+
+**Note:** Looking for _how to develop Sourcegraph monitoring?_ See the [monitoring developer guide](monitoring.md).
+
+This document describes the architecture of Sourcegraph's monitoring stack, and the technical decisions we have made to date and why.
+
+<!-- generated from monitoring_architecture.excalidraw -->
+![architecture diagram](https://storage.googleapis.com/sourcegraph-assets/monitoring-architecture.png)
+
+## Long-term vision
+
+To better understand our goals with Sourcegraph's monitoring stack, please read [monitoring pillars: long-term vision](monitoring_pillars.md#long-term-vision).
+
+## Monitoring generator
+
+We use a custom [declarative Go generator syntax](https://sourcegraph.com/github.com/sourcegraph/sourcegraph/-/tree/monitoring) for:
+
+- Defining the services we monitor.
+- Describing _what those services do_ to site admins.
+- Laying out dashboards in a uniform, consistent, and simple way.
+- Generating the [Prometheus alerting rules](#alerting) and Grafana dashboards.
+- Generating documentation in the form of ["possible solutions"](https://docs.sourcegraph.com/admin/observability/alert_solutions) for site admins to follow when alerts are firing.
+
+This allows us to assert constraints and principles that we want to hold ourselves to, as described in [our monitoring pillars](monitoring_pillars.md).
+
+To learn more about adding monitoring using the generator, see ["How easy is it to add monitoring?"](monitoring.md#how-easy-is-it-to-add-monitoring)
+
+## Sourcegraph deployment
+
+### Sourcegraph Grafana
+
+We use [Grafana](https://grafana.com) for:
+
+- Displaying generated dashboards for our Prometheus metrics and alerts.
+- Providing an interface to query Prometheus metrics and Jaeger traces.
+
+The [`sourcegraph/grafana`](https://github.com/sourcegraph/sourcegraph/tree/master/docker-images/grafana) image handles shipping Grafana and Sourcegraph monitoring dashboards. It bundles:
+
+* Preconfigured [Grafana](https://grafana.com), which displays data from Prometheus and Jaeger
+* Dashboards generated by the [monitoring generator](#monitoring-generator).
+
+#### Admin reverse-proxy
+
+For convenience, Grafana is served on `/-/debug/grafana` on all Sourcegraph deployments via a reverse-proxy restricted to admins.
+
+Services served via reverse-proxy in this manner could be vulnerable to [cross-site request forgery (CSRF)](https://owasp.org/www-community/attacks/csrf), which is complicated to resolve ([#6075](https://github.com/sourcegraph/sourcegraph/issues/6075)). In addition, the [monitoring generator](#monitoring-generator) creates provisioned dashboards that, once loaded, cannot be edited in Grafana at all.
+
+This means that at the moment, making changes to Grafana using the Grafana UI is not possible without setting up a port-forward or creating custom dashboards, something [we want to avoid asking customers to do](monitoring_pillars.md#long-term-vision).
+
+### Sourcegraph Prometheus
+
+We use [Prometheus](https://prometheus.io) for:
+
+- Collecting high-level, and low-cardinality, metrics from our services.
+- Defining Sourcegraph alerts as both:
+  - Prometheus recording rules, [`alert_count`](#alert-count-metrics).
+  - Prometheus alert rules (which trigger [notifications](#alert-notifications)) based on `alert_count` metrics.
+
+The [`sourcegraph/prometheus`](https://github.com/sourcegraph/sourcegraph/tree/master/docker-images/prometheus) image handles shipping Sourcegraph metrics and alerting. It bundles:
+
+* Preconfigured [Prometheus](https://prometheus.io), which consumes metrics from Sourcegraph services.
+* Alert and recording rules generated by the [monitoring generator](#monitoring-generator).
+* [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/), which handles alerts from Prometheus.
+* [prom-wrapper](https://github.com/sourcegraph/sourcegraph/tree/master/docker-images/prometheus/cmd/prom-wrapper), which subscribes to updates in [site configuration](https://docs.sourcegraph.com/admin/config/site_config) and propagates relevant settings to Alertmanager configuration.
+
+#### Alert count metrics
+
+`alert_count` metrics are special Prometheus recording rules that evaluate a single upper or lower bound, as defined in an [Observable](https://sourcegraph.com/search?q=repo:%5Egithub%5C.com/sourcegraph/sourcegraph%24+file:generator.go+type+Observable+struct+%7B:%5Bdef%5D%7D&patternType=structural) and generated by the [monitoring generator](#monitoring-generator). This metric is always either 0 if the threshold is not exceeded (or data does not exist, if configured), or 1 if the threshold is exceeded. This allows historical alert data to easily be [consumed programmatically](https://docs.sourcegraph.com/admin/observability/alerting_custom_consumption).
+
+Learn more about the `alert_count` metrics in the [metrics guide](https://docs.sourcegraph.com/admin/observability/metrics_guide#alert-count).
+
+*Rationale for `alert_count`*: TODO(@slimsag)
+
+#### Alert notifications
+
+We use [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/) for:
+
+- Providing data about currently active Sourcegraph alerts.
+- Routing alerts to appropriate receivers and silencing them when desired, [configured using site configuration](#alert-notifications).
+
+Alertmanager is bundled in `sourcegraph/prometheus`, and notifications are configured for Sourcegraph alerts [using site configuration](https://docs.sourcegraph.com/admin/observability/alerting). This functionality is provided by the [prom-wrapper](#prom-wrapper).
+
+*Rationale for notifiers in site configuration*: Due to the limitations of [admin reverse-proxies](#admin-reverse-proxy), alerts cannot be configured without port-forwarding or custom ConfigMaps, something we [want to avoid](monitoring_pillars.md#long-term-vision).
+
+*Rationale for Alertmanager*: An approach for notifiers using Grafana was considered, but had some issues outlined in [#11832](https://github.com/sourcegraph/sourcegraph/pull/11832), so Alertmanager was selected as our notification provider.
+
+*Rationale for silencing in site configuration*: Similar to the [Grafana admin reverse-proxy](#admin-reverse-proxy), silencing using the Alertmanager UI would require port-forwarding, something we [want to avoid](monitoring_pillars.md#long-term-vision).
+
+#### prom-wrapper
+
+The [prom-wrapper](https://github.com/sourcegraph/sourcegraph/tree/master/docker-images/prometheus/cmd/prom-wrapper) is the entrypoint program for `sourcegraph/prometheus` and it:
+
+* Handles starting up Prometheus and Alertmanager
+* Applies updates to site configuration by [generating a diff and applying changes](https://sourcegraph.com/search?q=repo:%5Egithub.com/sourcegraph/sourcegraph%24+file:docker-images/prometheus+type:symbol+Change+OR+Diff&patternType=literal)
+  * Most notably, this includes [configuring notifiers and silences](#alert-notifications) for Sourcegraph alerts
+* Exposes [endpoints for configuration issues, alerts summary statuses, and reverse-proxies Prometheus and Alertmanager](https://sourcegraph.com/search?q=repo:%5Egithub.com/sourcegraph/sourcegraph%24+file:docker-images/prometheus+PathPrefix%28:%5Bpath%5D%29.Handler%28:%5Bhandler%5D%29&patternType=structural)
+
+*Rationale for an all-in-one Prometheus image with prom-wrapper*: This allows us to avoid adding a new separate service (which must be handled in all our deployment types), thus simplifying the deployment story for our monitoring stack, while also improving the alert debugging workflow (for example, simply port-forward `svc/prometheus` to get access to the entire alerting stack), with minimal disadvantages (for example, high-availability Prometheus and Alertmanager can still be configured via `PROMETHEUS_ADDITIONAL_FLAGS` and `ALERTMANAGER_ADDITIONAL_FLAGS`, and Alertmanager can be disabled via `DISABLE_ALERTMANAGER`).
+
+## Custom additions
+
+TODO: how we handle out-of-band metrics, alerts (things we don't ship to customers)
+
+## Sourcegraph Cloud
+
+This section describes how our monitoring stack is used in Sourcegraph Cloud and what customizations we have made.
+
+### Alerts
+
+Notifiers for alerts are configured via the [`deploy-sourcegraph-dot-com` frontend ConfigMap for `site.json`](https://github.com/sourcegraph/deploy-sourcegraph-dot-com/blob/release/base/frontend/sourcegraph-frontend.ConfigMap.yaml#L5188-L5274).
+
+Alerts go to the [`#alerts`](https://sourcegraph.slack.com/archives/CSCFMFXS5)channel, with critical alerts going to [OpsGenie](https://about.sourcegraph.com/handbook/engineering/on_call).
+
+### Blackbox exporter
+
+TODO

@bobheadxi Let me know if that description is sufficient for this doc

slimsag

comment created time in 17 days

push event sourcegraph/about

Dave Try

commit sha 1951df887ed408c90d75c1c18f7f89c4ab06826d

add description for blackbox exporter

view details

push time in 17 days

issue comment sourcegraph/sourcegraph

Distribution: 3.19 Tracking issue

Week July 13

Landing blackbox exporter into the sourcegraph.com environment (sourcegraph/deploy-sourcegraph-dot-com#2984). Lots of time spent on calls with Capital One working through issues with their deployment related to indexed search starting. Also implemented the fix suggested by @pecigonzalo regarding PVs/PVCs and their respecting claimRef from namespaces.

Week July 20

Continuing to work through Capital One issues and beginning on #12101, in addition to fine-tuning some of the blackbox alerts to ensure they reflect exactly what we need. These seemed to function well, though, in light of last week's Cloudflare outage. Continuing to work with Capital One to resolve any further issues.

pecigonzalo

comment created time in 18 days

issue closed sourcegraph/sourcegraph

Improve reliability of Sourcegraph.com site24x7 ping alerts

We use site24x7 on Sourcegraph.com to ensure it doesn't go down unnoticed (e.g. if both prometheus and grafana get taken down, we want something external to confirm the site is reachable).

This reports directly to OpsGenie (doesn't go through Prometheus or Grafana) and the configuration lives here.

It's OK/ideal that it bypasses Prometheus and Grafana, but site24x7 is a regular source of flaky alerts for us and we'd like to find something less flaky (it seems we're "testing the worldwide internet connection" currently).

This is the first major project I'd like for you to take on @davejrt !

closed time in 21 days

slimsag

issue comment sourcegraph/sourcegraph

Improve reliability of Sourcegraph.com site24x7 ping alerts

Closed by sourcegraph/deploy-sourcegraph-dot-com#2984

slimsag

comment created time in 21 days

issue comment sourcegraph/sourcegraph

Improve reliability of Sourcegraph.com site24x7 ping alerts

Blackbox exporter is now running on sourcegraph.com and can be queried via Grafana using the probe_http_status_code metric to see when our services and sites return status codes other than 200.
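
The same check can also be run outside Grafana against the Prometheus HTTP API. The sketch below is illustrative only: the helper names and the base URL are hypothetical, and it assumes the Prometheus instance scraping the blackbox exporter is reachable (for example through a port-forward).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for finding probes that did not return HTTP 200.

# Print a PromQL expression selecting probes whose last status code was not 200.
non_200_query() {
  printf '%s\n' 'probe_http_status_code != 200'
}

# query_prometheus BASE_URL EXPR
# Run an instant query against the Prometheus HTTP API and print the JSON reply.
query_prometheus() {
  curl -fsS -G "$1/api/v1/query" --data-urlencode "query=$2"
}
```

For example: `query_prometheus http://localhost:9090 "$(non_200_query)"`. An empty result list means every probed target last answered with a 200.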

slimsag

comment created time in 21 days

push event sourcegraph/sourcegraph

davejrt

commit sha ea795bbd795a4d17cc1bcecbe3ee4106cbad3815

remove relaod flag from dev script (#12277)

view details

push time in 22 days

delete branch sourcegraph/sourcegraph

delete branch : dave/prometheus_flag

delete time in 22 days

PR merged sourcegraph/sourcegraph

remove reload flag from dev script

This flag is now built into the container image via #12211

+0 -1

1 comment

1 changed file

davejrt

pr closed time in 22 days

PR opened sourcegraph/sourcegraph

remove reload flag from dev script

This flag is now built into the container image via #12211

+0 -1

0 comment

1 changed file

pr created time in 22 days

create branch sourcegraph/sourcegraph

branch : dave/prometheus_flag

created branch time in 22 days

push event sourcegraph/sourcegraph

davejrt

commit sha 8c7d69955f669fbc0dead8dd1aa407a48d2a6460

enable reload endpoint for prometheus (#12211)

view details

push time in 22 days

delete branch sourcegraph/sourcegraph

delete branch : dave/prom_reload

delete time in 22 days

push event sourcegraph/sourcegraph

Christina Forney

commit sha 750c90ffcf65fe1bf80555ccefe8987f1c94f8b5

Fixes #12057 search example page copy edits (#12084)

view details

Felix Becker

commit sha 58124b94ad83ab3121eca9450f5fedda537bec3a

Serve mock index.html and window.context (#12091)

view details

Felix Becker

commit sha 7889886f2cd906deccc14f88f5afa18fb3c28b39

Add integration tests to CI (#12096)

view details

Simon

commit sha 4623f8c9a52da3e4d393666f6f725a9bf224f1fa

typesafe stubbing of graphQL requests for integration tests (#11983)
  * WIP first iteration of type generation of all queries
  * WIP: fixed nested list in the parser + gave operations unique names
  * WIP: partially migrated search tests to have typesafe responses, operations in shared is the last standing issue
  * refactored tests to use typechecked graphql stubbing
  * fixed eslint issues
  * runt prettier on generated sources
  * changed override to be a function + PR comments Co-authored-by: Marek <marekz@gmail.com>
  * rebased
  * make process fail if we have an error during extraction
  * added site admin activation status

view details

ᴜɴᴋɴᴡᴏɴ

commit sha 985c423d2de6eac234f2aabd6ad1b5ecfb2dbea3

gqltest: add search integration tests (#12085)

view details

Erik Seliger

commit sha f02940d197e58d0aaefdfcea7098017bb113adbd

Don't use goacc on master branch to fix coverage reporting (#12112)

view details

renovate[bot]

commit sha bccd65c0c9764f89ab5b1a871c4c5e01cba008ac

Update dependency graphql to ^14.7.0 (#11975) Co-authored-by: Renovate Bot <bot@renovateapp.com>

view details

renovate[bot]

commit sha 8f01e217cdd08a5a2fc327bce53ff0c86579ba65

Update dependency json-schema-to-typescript to v9 (#12115) Co-authored-by: Renovate Bot <bot@renovateapp.com>

view details

Erik Seliger

commit sha f7abecdfb7d9d0533f1c5e2903fb291f0ce57d94

Revert original go-acc PR (#12122)
  * Revert "Don't use goacc on master branch to fix coverage reporting (#12112)" This reverts commit f02940d197e58d0aaefdfcea7098017bb113adbd.
  * Revert "ci: enable go-acc on default branch (#11887)" This reverts commit 49d50f4236e7589e7c1de9eb4a3086284d00229e.

view details

Eric Fritz

commit sha 5ba8bfa1739f364cf59a17264edaf1d18bb9f520

Bump recommended src-cli version to 3.16.0. (#12059)

view details

Asdine El Hrychy

commit sha 314584824f9bcd0def4d4d903e5f8a2eaf26a402

Revert "Delete cloning tab from the site-admin repositories page (#12043)" (#12127) This reverts commit c9224061da6a614e0e15d04c5aa057dccda45dd1.

view details

Eric Fritz

commit sha 55b594439ec50b704e934d158b111a2f47375e97

db: Add generic interoperable base store (#12056)

view details

Asdine El Hrychy

commit sha 52130ea0e92d3e653f09b5b212dae0d2ae114220

Revert "cloud: use the cloned column to filter by clone status (#11932)" (#12128) This reverts commit cab1a64cfdbe0c930cccdc9fd550e60913972b60.

view details

Felix Becker

commit sha 82f8b6d1d2e6425d2a0ae36f342dae092f139ddd

Integration test updates (#12121)

view details

Felix Becker

commit sha a1d87dc56f3a95fb8c5a1d5e895544c3b69b2d77

Use mocha directly for integration tests and move to shared/ (#12130)

view details

Keegan Carruthers-Smith

commit sha c6269100f1ca422ebc552d57800dce598aeafb2a

search: return branches to index to Zoekt (#12089)
If we have version contexts enabled we will tell Zoekt to index the branches in the version context. There is a related Zoekt change to pass in the repository as well as understand the Branches field being set.
Previously Zoekt was responsible for resolving HEAD. Now that we will do more than HEAD, it was easier to implement the resolving logic in Sourcegraph. This will both simplify the responsiblilities of zoekt-sourcegraph-indexserver as well as allow us to optimize patterns around resolving version context revisions.
We additionally update Zoekt to tell us the repo as well as support indexing multiple branches.

view details

Robert Lin

commit sha c7b3a445962ec699571c1d4c747d16b24316e886

monitoring: improved notification timings, resolved templates (#12046)

view details

Simon

commit sha 5bff6676c09f6f37f434b55e0455686d4d6fab69

added gql response helpers for integration tests and started blob-viewer integration tests (#12135)
  * added gql response helpers for integration tests and started blob-viewer tests
  * removed "only" from test

view details

Farhan Attamimi

commit sha 1a546c86b9c10afb2627c0e1d7deea863d798126

Fix: add missing repogroup query to repogroup page search bar (#12120)

view details

Robert Lin

commit sha 63ebff7d5688868d76eff9e229b048f87b2229de

monitoring: aggregate long-term usage over 7d (#12015)

view details

push time in 22 days

delete branch sourcegraph/deploy-sourcegraph-docker

delete branch : distribution/key_path

delete time in 22 days

push event sourcegraph/deploy-sourcegraph-docker

davejrt

commit sha 7dafb7f3c35f2bc279760f6fe2f7f07e3eca4221

fix key file name in example (#125)

view details

push time in 22 days

issue closed sourcegraph/sourcegraph

Improve Sourcegraph quickstart guide

In deploy-sourcegraph-docker/docker-compose/docker-compose.yaml, line 52 has the following:

# - '/LOCAL/KEY/PATH.pem:/sourcegraph.key'

Should this be:

# - '/LOCAL/KEY/PATH.key:/sourcegraph.key'

When I made this change it worked.

closed time in 22 days

dertz

issue comment sourcegraph/sourcegraph

Improve Sourcegraph quickstart guide

@dertz thanks for picking this up and raising the issue.

dertz

comment created time in 22 days

PR opened sourcegraph/deploy-sourcegraph-docker

fix key file name in example

closes sourcegraph/sourcegraph#12220

+1 -1

0 comment

1 changed file

pr created time in 22 days

create branch sourcegraph/deploy-sourcegraph-docker

branch : distribution/key_path

created branch time in 22 days

push event sourcegraph/sourcegraph

Dave Try

commit sha ee4d85900090516573f6e029c0f1118f9cb452b0

enable reload endpoint for prometheus

view details

push time in 23 days

create branch sourcegraph/sourcegraph

branch : dave/prom_reload

created branch time in 23 days

PR closed sourcegraph/sourcegraph

enable reload endpoint for prometheus

Enables the reload endpoint on Prometheus, allowing updates to config files (specifically those in ConfigMaps) to be reloaded without killing the Prometheus pod.
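
For context, upstream Prometheus exposes this endpoint when it is started with `--web.enable-lifecycle`; a reload is then triggered with an HTTP POST to `/-/reload`. The sketch below shows how that might be scripted. The helper names and the default base URL are hypothetical, and whether this PR uses that exact flag is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes Prometheus runs with --web.enable-lifecycle.

# prometheus_reload_url [BASE_URL]: print the URL of the reload endpoint.
prometheus_reload_url() {
  printf '%s/-/reload\n' "${1:-http://localhost:9090}"
}

# reload_prometheus [BASE_URL]: ask Prometheus to re-read its configuration.
reload_prometheus() {
  curl -fsS -X POST "$(prometheus_reload_url "$1")"
}
```

After editing a mounted config file, something like `reload_prometheus http://localhost:9090` (for example through a port-forward) would apply the change without restarting the pod.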

+1 -0

0 comment

1 changed file

davejrt

pr closed time in 23 days

PR opened sourcegraph/sourcegraph

enable reload endpoint for prometheus

Enables the reload endpoint on Prometheus, allowing updates to config files (specifically those in ConfigMaps) to be reloaded without killing the Prometheus pod.

+1 -0

0 comment

1 changed file

pr created time in 23 days

push event sourcegraph/about

davejrt

commit sha 869776906ea080d0c138d4b960176a4faccae334

adjust heading for pulumi section (#1189)

view details

push time in 25 days

delete branch sourcegraph/about

delete branch : dave/heading_fix

delete time in 25 days

PR merged sourcegraph/about

adjust heading for pulumi section
+1 -1

0 comment

1 changed file

davejrt

pr closed time in 25 days

PR opened sourcegraph/about

adjust heading for pulumi section
+1 -1

0 comment

1 changed file

pr created time in 25 days

create branch sourcegraph/about

branch : dave/heading_fix

created branch time in 25 days

push event davejrt/k9s

Flare576

commit sha cc19f6da7e1d2492f1d54adf8ddfb81f198798e3

feat(helm): add helm purge plugin Add plugin yaml and kubectl plugin 618

view details

derailed

commit sha 037d6d3f5480a6d698d3f92240f457ee8bfc8153

update deps - fix ns issues

view details

derailed

commit sha 4bd37a492f108055e2670c9e3d1f2d0b17872e50

update shell pod config

view details

derailed

commit sha 175a16cfce8d6188aed2bb7b245075dd4bcf6753

update shell config

view details

derailed

commit sha cb456883ecef2d4d7958ce521b0fb29e8312d61c

checkpoint

view details

derailed

commit sha e181fe859c07623b06625a698d2f3cd0b787fbd7

checkpoint

view details

derailed

commit sha e04d7da461d348c11d8132762d651e2973e72386

add release notes

view details

derailed

commit sha 9f1b099e290f6e73d7dead475b34a180a18eb9a5

Merge branch '5_14_20'

view details

derailed

commit sha b19cc76874bf2a5e8707750ff3ea5b27f9a512a2

cleanse popeye output

view details

derailed

commit sha f0ef39b46c97e7ae059a7a422baabf67d72eaff3

update rel notes

view details

derailed

commit sha 46c2f31249b3b67a16659614bde179c481a547de

fix issues #726 #724 #722 #721 #720

view details

derailed

commit sha 01cdc5b86e12d1be35b3e582c71f0bda97b7f94b

update docs

view details

Fernand Galiana

commit sha 99ad32ba9847d925da5bbe3d2396bdaeb7b67431

Merge pull request #619 from Flare576/master feat(helm): add helm purge plugin

view details

Pavel Tumik

commit sha eebe9e78bb9b40dc887126254a66f107043ba00c

color cpu and memory limit fields in different color if close to threshold

view details

Richard Whitehead

commit sha 9f9812b8976b19e0148302b7bd0e674bc9d5e62c

Add pendColor option for Pending pods

view details

Richard Whitehead

commit sha 237d97e6cba064529f1c9b6841fe9da1631c34f3

Set default Pending color to calming darkorange

view details

sgandon

commit sha 08d4498f86fa2920ce01b6eb6132105ff9afc30d

updated the executable path when build from source

view details

Fernand Galiana

commit sha 8fb73fdcb363991b6a1407a0adca9946224a17af

Merge pull request #736 from sgandon/patch-1 updated the executable path when build from source

view details

Fernand Galiana

commit sha c75b6abc723c06f409328e3f472d176d61639145

Merge pull request #725 from soupyt/master Add new "pendingColor" option for Pending pods

view details

derailed

commit sha 678143e2a36c824738b53f498d102058dae87a12

add resource ref support for sec, sa and cm

view details

push time in a month

PR merged davejrt/k9s

Updates
+4735 -1314

0 comment

180 changed files

davejrt

pr closed time in a month

PR opened davejrt/k9s

Updates
+4735 -1314

0 comment

180 changed files

pr created time in a month

PR closed derailed/k9s

Updates
+10 -6

0 comment

7 changed files

davejrt

pr closed time in a month

PR opened derailed/k9s

Updates
+10 -6

0 comment

7 changed files

pr created time in a month

push event davejrt/k9s

Dave Try

commit sha 2b5657a98dfaa75de5a34a1d9ddf0b8bf1ee4e95

remove replace for now

view details

push time in a month

push event davejrt/k9s

Dave Try

commit sha e2a96eccd9d462c4be1e38f799b00b470bd281cd

change delete command

view details

Dave Try

commit sha b2f26ede15355b2b687e6c02374861281c2a954a

update go mod

view details

push time in a month

create branch davejrt/k9s

branch : updates

created branch time in a month

push event sourcegraph/about

davejrt

commit sha e167fd6678492b86f4644a25fd34cd6c350714b0

add instructions on how to scale k8s cluster (#1168)

view details

push time in a month

delete branch sourcegraph/about

delete branch : scale_k8s

delete time in a month

PR merged sourcegraph/about

add instructions on how to scale k8s cluster
+19 -0

0 comment

1 changed file

davejrt

pr closed time in a month

PR opened sourcegraph/about

add instructions on how to scale k8s cluster
+19 -0

0 comment

1 changed file

pr created time in a month

create branch sourcegraph/about

branch : scale_k8s

created branch time in a month

Pull request review comment sourcegraph/sourcegraph

prometheus: bundle Alertmanager and siteConfig sync

 USER root
 # Note: This mirrors what we do in e.g. our base alpine image: https://github.com/sourcegraph/sourcegraph/blob/master/docker-images/alpine/Dockerfile#L10-L15
 RUN addgroup -g 101 -S sourcegraph && adduser -u 100 -S -G sourcegraph -h /home/sourcegraph sourcegraph
 RUN mkdir -p /prometheus && chown -R sourcegraph:sourcegraph /prometheus
+RUN mkdir -p /alertmanager && chown -R sourcegraph:sourcegraph /alertmanager
 USER sourcegraph

 COPY --from=builder /generated/prometheus/* /sg_config_prometheus/
+COPY ./.bin/prom-wrapper /bin/prom-wrapper
+COPY ./prometheus.sh /prometheus.sh
+COPY ./alertmanager.sh /alertmanager.sh
 COPY config/*_rules.yml /sg_config_prometheus/
 COPY config/prometheus.yml /sg_config_prometheus/
+COPY --chown=sourcegraph config/alertmanager.yml /sg_config_prometheus/

See my comment above regarding changing permissions on ${PROMETHEUS_DISK}. Removing the user altogether also works on Ubuntu. Happy to push my changes to your branch, but wanted to call it out first.

bobheadxi

comment created time in a month

Pull request review comment sourcegraph/sourcegraph

prometheus: bundle Alertmanager and siteConfig sync

 PROMETHEUS_DISK="${HOME}/.sourcegraph-dev/data/prometheus"
 if [ ! -e "${PROMETHEUS_DISK}" ]; then
   mkdir -p "${PROMETHEUS_DISK}"

Suggested change:

  mkdir -p "${PROMETHEUS_DISK}" && chmod 777 "${PROMETHEUS_DISK}"
bobheadxi

comment created time in a month

Pull request review comment sourcegraph/sourcegraph

prometheus: bundle Alertmanager and siteConfig sync

 USER root
 # Note: This mirrors what we do in e.g. our base alpine image: https://github.com/sourcegraph/sourcegraph/blob/master/docker-images/alpine/Dockerfile#L10-L15
 RUN addgroup -g 101 -S sourcegraph && adduser -u 100 -S -G sourcegraph -h /home/sourcegraph sourcegraph
 RUN mkdir -p /prometheus && chown -R sourcegraph:sourcegraph /prometheus
+RUN mkdir -p /alertmanager && chown -R sourcegraph:sourcegraph /alertmanager
 USER sourcegraph

 COPY --from=builder /generated/prometheus/* /sg_config_prometheus/
+COPY ./.bin/prom-wrapper /bin/prom-wrapper
+COPY ./prometheus.sh /prometheus.sh
+COPY ./alertmanager.sh /alertmanager.sh
 COPY config/*_rules.yml /sg_config_prometheus/
 COPY config/prometheus.yml /sg_config_prometheus/
+COPY --chown=sourcegraph config/alertmanager.yml /sg_config_prometheus/

What user does the process run as on Mac OS X? The issue is that when running on Linux under the UID of the host user, 664 permissions don't allow a user who is not in the sourcegraph group to write the alertmanager.yml file.
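
To make the 664 point concrete: the last octal digit of a mode applies to users who are neither the owner nor in the group, and the write bit has value 2. The sketch below decodes that digit; the function name `other_can_write` is hypothetical and only illustrates the reasoning in the comment above.

```shell
# Sketch: can a user who is neither owner nor group member write the file?
# Takes an octal mode such as 664 and inspects its last digit.
other_can_write() {
  mode="$1"
  # Keep only the last character of the mode (the "other" digit).
  other="${mode#"${mode%?}"}"
  # Value 2 is the write bit; a non-zero result means write is allowed.
  [ $((other & 2)) -ne 0 ]
}

if other_can_write 664; then
  echo "664: other users can write"
else
  echo "664: other users are read-only"
fi
```

With mode 664 the last digit is 4 (read only), which is why a host user outside the sourcegraph group cannot write the file, while a mode such as 777 grants write access to everyone.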

bobheadxi

comment created time in a month

started microsoft/terminal

started time in a month

Pull request review comment sourcegraph/sourcegraph

prometheus: bundle Alertmanager and siteConfig sync

 USER root
 # Note: This mirrors what we do in e.g. our base alpine image: https://github.com/sourcegraph/sourcegraph/blob/master/docker-images/alpine/Dockerfile#L10-L15
 RUN addgroup -g 101 -S sourcegraph && adduser -u 100 -S -G sourcegraph -h /home/sourcegraph sourcegraph
 RUN mkdir -p /prometheus && chown -R sourcegraph:sourcegraph /prometheus
+RUN mkdir -p /alertmanager && chown -R sourcegraph:sourcegraph /alertmanager
 USER sourcegraph

 COPY --from=builder /generated/prometheus/* /sg_config_prometheus/
+COPY ./.bin/prom-wrapper /bin/prom-wrapper
+COPY ./prometheus.sh /prometheus.sh
+COPY ./alertmanager.sh /alertmanager.sh
 COPY config/*_rules.yml /sg_config_prometheus/
 COPY config/prometheus.yml /sg_config_prometheus/
+COPY --chown=sourcegraph config/alertmanager.yml /sg_config_prometheus/

On Ubuntu this is broken unless you run the container as root, because any user not in the sourcegraph group doesn't have write permissions. All other config files in these directories are owned by root.

bobheadxi

comment created time in a month

issue closed sourcegraph/sourcegraph

Change PV reclaim policy from "delete" to something else; clarify that --prune can be dangerous

Our default PV reclaim policy is "delete" which means disks are deleted when no longer used.

Additionally, we had a customer modify kubectl-apply-all.sh to specify a single file and so --prune deleted all deployments not specified. This ultimately didn't result in data loss (luckily) but we should carefully make two changes here:

  1. Add a note to https://github.com/sourcegraph/deploy-sourcegraph/blob/master/kubectl-apply-all.sh that --prune is destructive unless you specify the entire base/ directory.
  2. Change our default PV reclaim policy from "delete" to something less destructive.
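
As an illustration of the first point, here is a minimal sketch of applying with `--prune` against the whole `base/` directory. The label selector `deploy=sourcegraph` and the function name `apply_all` are assumptions made for this example and may not match the actual contents of kubectl-apply-all.sh.

```shell
# Sketch: apply the entire base/ directory with pruning enabled.
# With --prune, any resource matching the label selector that is NOT part
# of the applied set is deleted, so -f must cover the whole directory.
# The selector "deploy=sourcegraph" is an assumption for illustration.
apply_all() {
  kubectl apply --prune -l deploy=sourcegraph -f base --recursive "$@"
}
```

Extra flags are passed through, so something like `apply_all --dry-run=client` could be used to check a change before applying it for real. Narrowing `-f` to a single file is what makes `--prune` destructive, because everything else carrying the label then falls outside the applied set.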

Customer: https://sourcegraph.slack.com/archives/CJX299FGE/p1593211681018900 https://app.hubspot.com/contacts/2762526/company/578600789/

closed time in a month

slimsag

started jimeh/git-aware-prompt

started time in a month

issue comment sourcegraph/sourcegraph

Improve reliability of Sourcegraph.com site24x7 ping alerts

I'm a bit confused by https://github.com/sourcegraph/sourcegraph/issues/10742 now :smile:. If the tests are failing from all locations, and some of the linked issues like #3909 are actual issues, what is telling us the test is flaky? https://sourcegraph.slack.com/archives/CJX299FGE/p1594032949233500

@pecigonzalo with regard to your question, my understanding is that we're not ruling out an external monitoring tool, nor saying there is no value in external testing. The concern thus far has been that whenever site24x7 has reported a failure, it has been one of:

  • ephemeral due to a deployment
  • an issue with cloudflare
  • an issue with site24x7 itself

The comment about "testing the whole internet", as I understand it, means that when these alerts are triggered the root cause is not always our service but rather some upstream issue which we can't pinpoint. Hence all we are doing is creating noise and saying "there is an issue somewhere on the internet... it isn't us, but we don't know where".

My rationale was to improve or add to our existing checks by testing closer to our stack, to narrow down where the issue is. For example, if we are testing http://sourcegraph-frontend-internal as well as https://sourcegraph.com and only one is failing, we can at least remove one area of doubt and start to look elsewhere. By adding other checks with `blackbox exporter`, such as the `ssl-cert` check, we can eliminate that as an issue as well.
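
The reasoning about comparing an internal and an external check can be sketched as a small decision function. This is only an illustration: the function name `locate_failure`, the "up"/"down" inputs, and the output strings are hypothetical, and the real probes would come from the monitoring tools rather than from this script.

```shell
# Sketch: narrow down where a failure sits by comparing two probe results.
# $1 = result of the in-cluster check (http://sourcegraph-frontend-internal)
# $2 = result of the external check (https://sourcegraph.com)
# Each argument is expected to be "up" or "down".
locate_failure() {
  internal="$1"
  external="$2"
  case "$internal:$external" in
    up:up)     echo "no failure observed" ;;
    up:down)   echo "look outside the cluster (DNS, TLS, Cloudflare)" ;;
    down:up)   echo "look at the internal service" ;;
    down:down) echo "look inside the cluster first" ;;
    *)         echo "unknown input"; return 1 ;;
  esac
}

locate_failure up down
```

The point of the comparison is that a single failing check removes one area of doubt, which an external ping alone cannot do.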

slimsag

comment created time in a month

issue comment sourcegraph/sourcegraph

Improve reliability of Sourcegraph.com site24x7 ping alerts

@pecigonzalo we do have multiple locations configured in site24x7, so it will only alert when at least 3 of the locations are failing. That being said, all the alerts we've seen in the past have had all locations failing.

My approach thus far has been to monitor internally on the stack as well as testing externally, to determine where along the request chain there might be an issue, whether that be outside of our cluster (i.e. Cloudflare) or inside the cluster.

slimsag

comment created time in a month

issue closed sourcegraph/sourcegraph

Set up $BIGCUSTOMER replica with TLS and DNS

Use Cloudflare in front of the $BIGCUSTOMER replica for TLS and DNS to ensure it's ready for any testing

closed time in a month

davejrt

push event sourcegraph/deploy-sourcegraph

davejrt

commit sha 37b460ac3e31a20b138e5849c0f4be35f8b88bac

update advice for --prune flag (#783) * update advice for --prune flag Co-authored-by: uwedeportivo <534011+uwedeportivo@users.noreply.github.com>

view details

push time in a month

delete branch sourcegraph/deploy-sourcegraph

delete branch : davejrt/doc_prune

delete time in a month

PR merged sourcegraph/deploy-sourcegraph

update advice for --prune flag

Add note to advise users that --prune is destructive and should always be applied to base directory. Address concern in sourcegraph/sourcegraph#11913

+5 -1

0 comment

1 changed file

davejrt

pr closed time in a month

push event sourcegraph/sourcegraph

davejrt

commit sha e2a724b0bfb635340fb78f0a8ce0abed85235987

update storageclass to retain volumes by default (#11938)

view details

push time in a month

delete branch sourcegraph/sourcegraph

delete branch : davejrt/doc_storageclass

delete time in a month

PR merged sourcegraph/sourcegraph

update storageclass to retain volumes by default

Address concern #11913 related to the reclaimPolicy being set to delete by default

+14 -0

0 comment

1 changed file

davejrt

pr closed time in a month

push event sourcegraph/deploy-sourcegraph

davejrt

commit sha 79b40756d4c366b5434b0cedff2cc4e85581e4e8

Update kubectl-apply-all.sh Co-authored-by: uwedeportivo <534011+uwedeportivo@users.noreply.github.com>

view details

push time in a month

push event sourcegraph/deploy-sourcegraph

davejrt

commit sha db0e6792208654d484f57d731f3bb947f566eb10

Update kubectl-apply-all.sh Co-authored-by: uwedeportivo <534011+uwedeportivo@users.noreply.github.com>

view details

push time in a month

Pull request review comment sourcegraph/sourcegraph

update storageclass to retain volumes by default

 Sourcegraph expects there to be storage class named `sourcegraph` that it uses f
 Create `base/sourcegraph.StorageClass.yaml` with the appropriate configuration for your cloud provider and commit the file to your fork.

+The sourceraph storageclass will retain any persistent volumes created in the event of an accidental deletion of a persistent volume claim.
+
+This cannot be changed once the storage class has been created. Persistent volumes not created with the reclaimPolicy set to `true` can be patched with the following command:

Nope, fixed
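
The patch command that the diff above leads into is not included in this excerpt. A minimal sketch of the standard kubectl approach follows, assuming the goal is to set an existing persistent volume's reclaim policy to `Retain` (the value Kubernetes uses for this setting); the function name `retain_pv` is hypothetical.

```shell
# Sketch: switch an existing PersistentVolume to the Retain reclaim policy.
# The PV name is supplied by the caller; nothing is patched without one.
retain_pv() {
  pv_name="$1"
  if [ -z "$pv_name" ]; then
    echo "usage: retain_pv <persistent-volume-name>" >&2
    return 1
  fi
  kubectl patch pv "$pv_name" \
    -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
}
```

Calling `retain_pv <your-pv-name>` for each volume listed by `kubectl get pv` would cover volumes that were provisioned before the storage class was changed.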

davejrt

comment created time in a month

push event sourcegraph/sourcegraph

Dave Try

commit sha f0cb1343803ba2aced0b1592fbb472b147b57ce9

update storageclass to retain volumes by default

view details

push time in a month

push event sourcegraph/sourcegraph

davejrt

commit sha f4219e728aa328f36b4ee2f87d3528738e82c905

Update doc/admin/install/kubernetes/configure.md Co-authored-by: uwedeportivo <534011+uwedeportivo@users.noreply.github.com>

view details

push time in a month

PR opened sourcegraph/deploy-sourcegraph

update advice for --prune flag

Add note to advise users that --prune is destructive and should always be applied to base directory. Address concern in @sourcegraph/sourcegraph#11913

+5 -1

0 comment

1 changed file

pr created time in a month

create branch sourcegraph/deploy-sourcegraph

branch : davejrt/doc_prune

created branch time in a month

PR opened sourcegraph/sourcegraph

update storageclass to retain volumes by default

Address concern #11913 related to the reclaimPolicy being set to delete by default

+14 -0

0 comment

1 changed file

pr created time in a month

create branch sourcegraph/sourcegraph

branch : davejrt/doc_storageclass

created branch time in a month

issue comment sourcegraph/sourcegraph

Change PV reclaim policy from "delete" to something else; clarify that --prune can be dangerous

Our default PV reclaim policy is "delete" which means disks are deleted when no longer used.

This should be set as part of the storage class, via its `reclaimPolicy` field. You can't specify the reclaimPolicy as part of a volumeClaimTemplate in statefulsets and deployments. As we rely a lot on dynamic provisioning, I think this makes the most sense in this case.
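
A storage class along these lines would carry the policy for every dynamically provisioned volume. This is a sketch only: the provisioner `kubernetes.io/gce-pd`, the `pd-ssd` disk type, the `deploy: sourcegraph` label, and the output path are assumptions for illustration and should be replaced with values for the actual cloud provider.

```shell
# Sketch: write a StorageClass manifest whose volumes are retained when
# their claim is deleted. Provisioner, parameters and label are assumptions.
out="${TMPDIR:-/tmp}/sourcegraph.StorageClass.yaml"
cat > "$out" <<'EOF'
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: sourcegraph
  labels:
    deploy: sourcegraph
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
# Retain keeps the underlying disk instead of deleting it with the claim.
reclaimPolicy: Retain
EOF
echo "wrote $out"
```

Because the policy lives on the storage class, every volume it provisions inherits it without any change to the statefulsets or deployments that request storage.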

slimsag

comment created time in a month

issue comment sourcegraph/sourcegraph

Set up $BIGCUSTOMER replica with TLS and DNS

replica now available at https://bigdata.sgdev.org

davejrt

comment created time in a month

issue opened sourcegraph/sourcegraph

Set up $BIGCUSTOMER replica with TLS and DNS

Use Cloudflare in front of the $BIGCUSTOMER replica for TLS and DNS to ensure it's ready for any testing

created time in a month

pull request comment sourcegraph/sourcegraph

prometheus: bundle Alertmanager and siteConfig sync

Will take a closer look today and review

bobheadxi

comment created time in a month

issue comment sourcegraph/sourcegraph

Distribution: 3.18 Tracking issue

Things I worked on this week:

  • Supported $CUSTOMER with help from Stephen
  • Worked on #10742 by installing Prometheus blackbox exporter and Alertmanager on an internal replica to help correlate any errors seen by site24x7

Things I will focus on next week:

  • Continue to focus on $CUSTOMER with Uwe
  • Extend the alerting to cover all cases in site24x7 where possible.
  • Likely pick up a "good first issue" to start levelling up on Go

I will be out 30/6 for a national holiday

slimsag

comment created time in a month
