Stephen Gutekanst slimsag Sourcegraph Arizona https://twitter.com/slimsag Developer @sourcegraph; "Crazy cat lady"; Aspiring gamedev; Faulty musician.

gopherjs/vecty 1742

Vecty: your frontend, in Go

slimsag/cgo-batching 9

CGO call batching benchmark

slimsag/cp 3

Chipmunk 2D physics wrapper for Go.

slimsag/appdash 2

Application tracing system for Go, based on Google's Dapper

slimsag/binpack 2

A Go implementation of Jake Gordon's 2D binpacking algorithm.

slimsag/file2go 2

file2go - A simple tool to convert a binary file to an []byte for Go.

slimsag/darfree 1

Uses black magic to release memory on Darwin.

hexops/syndex 0

syndex performs language analysis via syntax-highlighting trees

rohanpai/shields 0

Shields badge specification, website and default API server

slimsag/archiver 0

Easily create and extract .zip, .tar.gz, .rar (extract-only), and .tar.bz2 files with Go

issue opened sourcegraph/sourcegraph

Change PV reclaim policy from "delete" to something else; clarify that --prune is dangerous

Our default PV reclaim policy is "delete" which means disks are deleted when no longer used.

Additionally, we had a customer modify kubectl-apply-all.sh to specify a single file and so --prune deleted all deployments not specified. This ultimately didn't result in data loss (luckily) but we should carefully make two changes here:

  1. Add a note to https://github.com/sourcegraph/deploy-sourcegraph/blob/master/kubectl-apply-all.sh that --prune is destructive unless you specify the entire base/ directory.
  2. Change our default PV reclaim policy from "delete" to something less destructive.

Customer: https://sourcegraph.slack.com/archives/CJX299FGE/p1593211681018900
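
A minimal sketch of what the second change could look like for volumes that already exist, assuming `kubectl` access to the cluster. Patching `spec.persistentVolumeReclaimPolicy` with `kubectl patch pv` is the approach described in the Kubernetes documentation; `Retain` is just one candidate for a less destructive policy, and the helper names, the `DRY_RUN` switch, and the volume name `example-pv` are hypothetical. By default the sketch only prints the command instead of running it.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: build the JSON patch body for a given reclaim policy.
reclaim_patch() {
  printf '{"spec":{"persistentVolumeReclaimPolicy":"%s"}}' "$1"
}

# Hypothetical helper: print the patch command for one PersistentVolume
# (DRY_RUN=1, the default in this sketch) or actually run it (DRY_RUN=0).
set_reclaim_policy() {
  local pv="$1" policy="${2:-Retain}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "kubectl patch pv ${pv} -p '$(reclaim_patch "${policy}")'"
  else
    kubectl patch pv "${pv}" -p "$(reclaim_patch "${policy}")"
  fi
}

# Example: show the command for a volume named "example-pv" (assumed name).
set_reclaim_policy example-pv Retain
```

With `Retain`, the underlying disk is kept after its claim is deleted and has to be cleaned up manually, which trades some operational overhead for protection against accidental deletion.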

created time in 9 hours

push event sourcegraph/sourcegraph

Chayim

commit sha a3219aeb912e46516b6879ee3001294d5db29059

Permissions issue fix with the build scripts. (#11908)

* Permissions issue fix with the build scripts.

The mounted docker trees were running as root (at least on arch) leading to start.sh failing to start grafana and prometheus.

push time in 10 hours

delete branch sourcegraph/sourcegraph

delete branch : ck-permissions-fix

delete time in 10 hours

PR merged sourcegraph/sourcegraph

Permissions issue fix with the build scripts.

Docker references directories within the home directory, but because those directories don't exist prior to the docker run calls, Docker creates them and they end up owned by root. As the containers themselves run as the current UID, permission denials occur all over the place and the containers cannot start.

Now we create the directories in advance, prior to the docker call, thereby ensuring that the logged-in user has permission to write to them.
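
The fix described above can be sketched as follows. This is an illustration and not the exact contents of the PR: the helper name `ensure_user_dir`, the `DATA_ROOT` override, and the container path in the commented `docker run` line are assumptions, and the demo writes to a temporary directory rather than `${HOME}/.sourcegraph-dev/data`.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: create a host directory as the current user so that
# Docker does not create it (owned by root) when the bind mount is set up.
ensure_user_dir() {
  local dir="$1"
  if [ ! -e "${dir}" ]; then
    mkdir -p "${dir}"
  fi
}

# DATA_ROOT is an assumption for this sketch; the dev scripts use
# "${HOME}/.sourcegraph-dev/data". A temporary directory keeps the demo harmless.
DATA_ROOT="${DATA_ROOT:-$(mktemp -d)}"
PROMETHEUS_DISK="${DATA_ROOT}/prometheus"

ensure_user_dir "${PROMETHEUS_DISK}"

# The directory now exists and belongs to the invoking user, so a container
# started with a matching --user can write into the mount, for example:
#   docker run --user "$(id -u)" -v "${PROMETHEUS_DISK}:/prometheus" <image>
echo "prepared ${PROMETHEUS_DISK}"
```

Calling the helper a second time is a no-op, so it is safe to run at the top of a script that is invoked repeatedly.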

+8 -0

3 comments

2 changed files

chayim

pr closed time in 10 hours

push event sourcegraph/sourcegraph

Stephen Gutekanst

commit sha 352a08be575e157d34935a3885dfd194df5816f8

make shellcheck happy

push time in 10 hours

push event sourcegraph/sourcegraph

Stephen Gutekanst

commit sha b9c554a0fa1e76366b8db262a2bf9028d20ac265

make shellcheck happy

push time in 10 hours

Pull request review comment sourcegraph/sourcegraph

Permissions issue fix with the build scripts.

 set -euf -o pipefail
 pushd "$(dirname "${BASH_SOURCE[0]}")/.." >/dev/null

 PROMETHEUS_DISK="${HOME}/.sourcegraph-dev/data/prometheus"
+if [ ! -e "${PROMETHEUS_DISK}" ]; then
+  mkdir -p ${PROMETHEUS_DISK}
  mkdir -p "${PROMETHEUS_DISK}"
chayim

comment created time in 10 hours
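
The suggestion above quotes the variable expansion, which is what ShellCheck asks for (rule SC2086). A small sketch of why it matters, using a made-up path that contains a space; the real `${HOME}`-based paths rarely do, so this shows the failure mode and not a bug observed in the PR.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway working area with one subdirectory per variant.
work="$(mktemp -d)"
mkdir "${work}/unquoted" "${work}/quoted"

# Made-up example value containing a space.
DISK="data dir/prometheus"

# Unquoted: the shell splits the value into two words, so mkdir receives
# "data" and "dir/prometheus" as separate arguments.
(cd "${work}/unquoted" && mkdir -p ${DISK})

# Quoted: the value is passed to mkdir as a single argument.
(cd "${work}/quoted" && mkdir -p "${DISK}")

echo "unquoted:"; ls "${work}/unquoted"
echo "quoted:";   ls "${work}/quoted"
```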

Pull request review comment sourcegraph/sourcegraph

Permissions issue fix with the build scripts.

 set -euf -o pipefail
 pushd "$(dirname "${BASH_SOURCE[0]}")/.." >/dev/null

 GRAFANA_DISK="${HOME}/.sourcegraph-dev/data/grafana"
+if [ ! -e "${GRAFANA_DISK}" ]; then
+  mkdir -p ${GRAFANA_DISK}
  mkdir -p "${GRAFANA_DISK}"
chayim

comment created time in 10 hours

push event sourcegraph/sourcegraph

Stephen Gutekanst

commit sha d2b9b3660091765cc0c92ec3c69c7c599f7b8ee5

Update dev/prometheus.sh

push time in 11 hours

push event sourcegraph/sourcegraph

Stephen Gutekanst

commit sha dcdb75dd409837da70ebf4fff2aeb41091590ccf

Update dev/grafana.sh

push time in 11 hours

Pull request review comment sourcegraph/sourcegraph

Permissions issue fix with the build scripts.

 set -euf -o pipefail
 pushd "$(dirname "${BASH_SOURCE[0]}")/.." >/dev/null

 PROMETHEUS_DISK="${HOME}/.sourcegraph-dev/data/prometheus"
+if [ ! -e "${PROMETHEUS_DISK}" ]; then
+    mkdir -p ${PROMETHEUS_DISK}
  mkdir -p ${PROMETHEUS_DISK}
chayim

comment created time in 11 hours

Pull request review comment sourcegraph/sourcegraph

Permissions issue fix with the build scripts.

 set -euf -o pipefail
 pushd "$(dirname "${BASH_SOURCE[0]}")/.." >/dev/null

 GRAFANA_DISK="${HOME}/.sourcegraph-dev/data/grafana"
+if [ ! -e "${GRAFANA_DISK}" ]; then
+    mkdir -p ${GRAFANA_DISK}
  mkdir -p ${GRAFANA_DISK}
chayim

comment created time in 11 hours

pull request comment sourcegraph/sourcegraph

Permissions issue fix with the build scripts.

It's not arch-specific: Docker -v does not create missing directories but, a little ironically, only on Linux (technically it is a "bug" that it does it on Mac and Windows). If that seems a bit nonsensical, I would agree :) but it's a feature, not a bug, apparently.

chayim

comment created time in 11 hours

push event sourcegraph/sourcegraph

Stephen Gutekanst

commit sha 682f4ff8ea1db5c0d97c39f7159915e70005ef0a

troubleshooting: add steps for creating metrics dump from Docker Compose deployments (#11912)

push time in 11 hours

delete branch sourcegraph/sourcegraph

delete branch : sg/dc-metrics-dump

delete time in 11 hours

PR opened sourcegraph/sourcegraph

troubleshooting: add steps for creating metrics dump from Docker Compose deployments

<!-- Reminder: Have you updated the changelog and relevant docs (user docs, architecture diagram, etc) ? -->

+36 -0

0 comment

1 changed file

pr created time in 11 hours

create branch sourcegraph/sourcegraph

branch : sg/dc-metrics-dump

created branch time in 11 hours

issue closed sourcegraph/sourcegraph

Make it possible to enable telemetry on non-Sourcegraph.com instances

When we test feature-flagged features in dogfood, we should have the ability to track their usage

closed time in 13 hours

felixfbecker

Pull request review comment sourcegraph/about

distribution roadmap

 We want to move to a world where all of Sourcegraphs' internal infrastructure is
 - Discussions: none
 - Dependencies: none

+### Push site admins to use Docker Compose or Kubernetes for production deployments
+
+Many customers of Sourcegraph today are still running a single-container `sourcegraph/server` deployment in production. We recently began advising all new deployments that this deployment option is _not_ for production use because it has no proper resource isolation and as such when it falls over it is impossible to debug, leading to painstakingly urgent migrations to better deployment types and frustrated/angry customers. We would like to get to a world where all production instances of Sourcegraph are Docker Compose or Kubernetes only.

Yes, exactly de-prioritizing sourcegraph/server. This could either be in the form of "it is only for demo deployments" or in the form of "we replace it entirely with Docker Compose and a one-liner install script"

I will also clarify the impact on engineering here: because it is such an odd deployment model, supporting it takes effort both from you (in https://github.com/sourcegraph/sourcegraph/issues/11473) and from others, such as the code-intel team.

slimsag

comment created time in 16 hours

Pull request review comment sourcegraph/about

distribution roadmap

+# Distribution product roadmap
+
+This living document is the product roadmap for the Distribution team.
+
+It is longer-term than our quarterly OKRs, and higher-level than our GitHub issues. Additionally, it documents dependencies of roadmap items and current owners.
+
+## Ordered & prioritized roadmap
+
+1. (Q1 2020) [Support customers in deploying Sourcegraph with 500k+ repositories](support-customers-in-deploying-sourcegraph-with-500k-repositories)
+1. (Q1 2020) [Kubernetes upgrades should have less merge conflicts](#kubernetes-upgrades-should-have-less-merge-conflicts)
+1. (Q1 2020) [Docker-compose is not released on time (on the 20th)](#docker-compose-is-not-released-on-time-on-the-20th)
+1. (Q1 2020) [Releasing Sourcegraph should be automated](#releasing-sourcegraph-should-be-automated)
+1. (Q1 2020) [Automatic e2e testing](#automatic-e2e-testing)
+1. (TBD) [Automatic Docker image testing](#automatic-docker-image-testing)
+1. (TBD) [Upgrades across multiple Sourcegraph versions should be easier](#upgrades-across-multiple-sourcegraph-versions-should-be-easier)
+1. (TBD) [Sourcegraph should be released daily](#sourcegraph-should-be-released-daily)
+1. (TBD) [All site admins should have alerting set up to be notified when Sourcegraph is unhealthy](#all-site-admins-should-have-alerting-set-up-to-be-notified-when-sourcegraph-is-unhealthy)
+1. (TBD) [Push site admins to use Docker Compose or Kubernetes for production deployments](#push-site-admins-to-use-docker-compose-or-kubernetes-for-production-deployments)
+1. (TBD) [Add monitoring for common critical issues](#add-monitoring-for-common-critical-issues)
+1. (TBD) [Monitoring federation](#monitoring-federation)
+1. (TBD) [GitOps for all internal infrastructure](#gitops-for-all-internal-infrastructure)
+
+## Details (unordered)
+
+### Support customers in deploying Sourcegraph with 500k+ repositories
+
+We have had customers interested in deploying Sourcegraph at large-scale with ~500k+ repositories and will need to dedicate time to supporting them and making their trials go smoothly.
+
+- Owner: Uwe and Dave
+- Status: unplanned -> added unexpectedly to Q1 -> in-progress
+- [Tracking issue](https://github.com/sourcegraph/customer/issues/57)
+- Discussions: [Initial planning issue](https://github.com/sourcegraph/customer/issues/57), [discussion about costs at this scale](https://github.com/sourcegraph/customer/issues/20)
+- Dependencies: none
+
+### Kubernetes upgrades should have less merge conflicts
+
+Kubernetes upgrades involve a large number of merge conflicts today which are extremely time consuming and tedious for customers to resolve, preventing them from upgrading as frequently as they should be and creating a large and painful support burden for us.
+
+- Owner: Geoffrey
+- Status: planned for Q1 -> in-progress
+- [Tracking issue](https://github.com/orgs/sourcegraph/projects/68?card_filter_query=label%3Arfc-141)
+- Discussions: none
+- Dependencies: none
+
+### CI infrastructure that can run Docker containers in a reliable way
+
+Today we cannot release of Sourcegraph, run e2e tests, or perform Docker image tests in an automated fashion because our CI infrastructure does not support running Docker containers (or VMs/Vagrant) in a reliable way. Today we have a [side-car DIND container in our CI pipeline](https://sourcegraph.sgdev.org/search?q=repo:%5Esourcegraph/infrastructure%24+dind&patternType=literal) but it is flaky, unreliable, and a regular source of issues which has led to us removing automated testing (see [Automatic e2e testing](#automatic-e2e-testing)).
+
+- Owner: Stephen
+- Status: planning soon
+- [Tracking issue](https://github.com/sourcegraph/sourcegraph/issues/6887)
+- Discussions: none
+- Dependencies: none
+
+### Releasing Sourcegraph should be automated
+
+A monthly release takes ~2 days of a developers' time, a patch release requires ~3 hours. We want to reduce that substantially both in order to reduce the time we must invest each month, and to increase the release cadence of Sourcegraph substantially.
+
+- Owner: Stephen
+- Status: planned for Q1 -> delayed
+- [Tracking issue](https://github.com/sourcegraph/sourcegraph/issues/9252)
+- Discussions: none
+- Dependencies:
+  - [CI infrastructure should support running e2e tests in a reliable way](ci-infrastructure-should-support-running-e2e-tests-in-a-reliable-way)
+
+### Automatic e2e testing
+
+[RFC 137](https://docs.google.com/document/d/14f7lwfToeT6t_vxnGsCuXqf3QcB5GRZ2Zoy6kYqBAIQ/edit#heading=h.trqab8y0kufp) saw us remove our e2e tests from CI entirely because it was unreliable. Now e2e tests are ran as part of our monthly release process completely manually, and are heavily broken/outdated each time we attempt to do it. Fixing them often takes ~1.5d of work from a developer on the team. Per the RFC, we want to run these e2e tests on CI in an automated and reliable fashion.
+
+- [Tracking issue](https://github.com/sourcegraph/sourcegraph/issues/10646)
+- Discussions: [RFC 137](https://docs.google.com/document/d/14f7lwfToeT6t_vxnGsCuXqf3QcB5GRZ2Zoy6kYqBAIQ/edit#heading=h.trqab8y0kufp)
+- Owner: Uwe
+- Status: planned for Q1 -> delayed
+- Dependencies:
+  - [CI infrastructure that can run Docker containers in a reliable way](#ci-infrastructure-that-can-run-docker-containers-in-a-reliable-way)
+
+### Automatic Docker image testing
+
+Customers rely on several important nuanced details about our Docker images:
+
+- Do upgrades and downgrades work as expected / per our documented policies?
+- Is the docker-compose.yml valid and does each service come up healthy?
+- Are UID/GIDs properly assigned and static?
+- Are all images versioned alongside Sourcegraph properly (and are the image tags correct)?
+
+Testing these cases manually as we do today means it is easy to get things wrong and customer upgrades will go poorly, increasing our support burden substantially. Additionally, testing and accounting for these factors manually today slows down the release process and our ability to iterate quickly. We want to automate testing of all these factors.
+
+- Owner: Stephen
+- Status: not planned
+- [Tracking issue](https://github.com/sourcegraph/sourcegraph/issues/10646)
+- Discussions: none
+- Dependencies:
+  - [CI infrastructure that can run Docker containers in a reliable way](#ci-infrastructure-that-can-run-docker-containers-in-a-reliable-way)
+
+### Docker-compose is not released on time (on the 20th)
+
+Docker-compose deployments were never properly integrated into our release process, and as such it has been released late by about one week after the official release date of 20th when the blog post announcement goes live. This has happened for the past ~4 releases. This has been a recurring problem for customers ("I can't upgrade it doesn't seem to be released"), which has increased support load, and has been a recurring worry from the CE team and others ("can I tell this customer to upgrade to fix their issue yet?").
+
+- Owner: Stephen
+- Status: planned for Q1 -> delayed -> overdue
+- [Tracking issue](https://github.com/sourcegraph/sourcegraph/issues/10486)
+- Discussions: none
+- Dependencies: none
+
+### Upgrades across multiple Sourcegraph versions should be easier
+
+Upgrading from 3.13 -> 3.17 requires you perform 4 individual upgrades today (3.14 -> 3.15 -> 3.16 -> 3.17) which is extremely painful and time consuming for site admins, especially when one must also address the merge conflicts that occur on each upgrade. We would like to make upgrades across multiple Sourcegraph versions easier.
+
+- Owner: none
+- Status: not planned
+- [Tracking issue](https://github.com/sourcegraph/sourcegraph/issues/10144)
+- Discussions: none
+- Dependencies: none
+
+### Sourcegraph should be released daily

Clarify impacts this has on the rest of the engineering org

slimsag

comment created time in 16 hours

Pull request review comment sourcegraph/about

distribution roadmap

+### Upgrades across multiple Sourcegraph versions should be easier

Clarify impacts this has on the rest of the engineering org

slimsag

comment created time in 16 hours

Pull request review comment sourcegraph/about

distribution roadmap

+### Docker-compose is not released on time (on the 20th)

Clarify that is a small undertaking

slimsag

comment created time in 17 hours

Pull request review comment sourcegraph/about

distribution roadmap

+### Automatic Docker image testing
+
+Customers rely on several important nuanced details about our Docker images:
+
+- Do upgrades and downgrades work as expected / per our documented policies?
+- Is the docker-compose.yml valid and does each service come up healthy?
+- Are UID/GIDs properly assigned and static?
+- Are all images versioned alongside Sourcegraph properly (and are the image tags correct)?

Enumerate supported customer configuration

slimsag

comment created time in 17 hours

Pull request review commentsourcegraph/about

distribution roadmap

+### Releasing Sourcegraph should be automated

This should be the #2 priority; we should clarify that e2e and Docker image testing are dependencies.

slimsag

comment created time in 17 hours

Pull request review commentsourcegraph/about

distribution roadmap

+### Releasing Sourcegraph should be automated
+
+A monthly release takes ~2 days of a developers' time, a patch release requires ~3 hours. We want to reduce that substantially both in order to reduce the time we must invest each month, and to increase the release cadence of Sourcegraph substantially.

link to 7148

slimsag

comment created time in 17 hours

Pull request review commentsourcegraph/about

distribution roadmap

+### Kubernetes upgrades should have less merge conflicts
+
+Kubernetes upgrades involve a large number of merge conflicts today which are extremely time consuming and tedious for customers to resolve, preventing them from upgrading as frequently as they should be and creating a large and painful support burden for us.

Clarify what exact merge conflicts customers experience

slimsag

comment created time in 18 hours

issue commentsourcegraph/sourcegraph

explore solutions for "impossible to silence monitoring alerts"

Does Alertmanager have a web UI with silencing already that we could expose more easily, I wonder?

slimsag

comment created time in a day

issue commentsourcegraph/sourcegraph

sourcegraph-frontend local disk usage grows unboundedly

Got it, thanks a bunch for the explanations and follow up here, I really appreciate it :)

On Wed, Jul 1, 2020 at 11:35 PM Keegan Carruthers-Smith < notifications@github.com> wrote:

Closed #8308 https://github.com/sourcegraph/sourcegraph/issues/8308.


beyang

comment created time in a day

issue commentsourcegraph/src-cli

Update CI to use Go 1.14

Yes, making sure Windows builds work is important, so please don’t remove it :)

On Wed, Jul 1, 2020 at 9:39 PM Thorsten Ball notifications@github.com wrote:

I see. And what's the usecase behind Appveyor? Windows builds?

Why couldn't you get Appveyor to build? What's missing?


ryanslade

comment created time in a day


issue commentsourcegraph/sourcegraph

sourcegraph-frontend local disk usage grows unboundedly

@keegancsmith I found this is fixed for some caches:

  • Searcher: https://sourcegraph.com/github.com/sourcegraph/sourcegraph@148d12a2905786670265cfe7e64ba100f84063a2/-/blob/cmd/searcher/main.go#L30
  • Symbols: https://sourcegraph.com/github.com/sourcegraph/sourcegraph@148d12a2905786670265cfe7e64ba100f84063a2/-/blob/cmd/symbols/main.go#L35

But I was not able to find what actually prevents the archive cache in the frontend from growing unboundedly now. We still have it here:

  • https://sourcegraph.com/github.com/sourcegraph/sourcegraph@148d12a2905786670265cfe7e64ba100f84063a2/-/blob/cmd/frontend/internal/cli/serve_cmd.go#L59-61

https://github.com/sourcegraph/sourcegraph/pull/11252 appears to be only for the fetching of raw files - but won't most Git operations the frontend performs still involve fetching the archive cache? Or am I mis-remembering the code here?

beyang

comment created time in a day
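
For context on what "bounded" could mean for an on-disk cache like the ones discussed in the comment above, here is a minimal Python sketch of size-based eviction. It is not the code behind the linked `main.go` files; the function name `evict_to_limit`, the flat directory layout, and the oldest-modified-first policy are all assumptions made for illustration.

```python
import os


def evict_to_limit(cache_dir, max_bytes):
    """Delete the oldest files in cache_dir until its total size fits in max_bytes.

    Hypothetical helper: assumes a flat directory of cache files and uses
    modification time as a stand-in for "least recently used".
    Returns the list of paths that were removed, oldest first.
    """
    entries = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path):
            st = os.stat(path)
            entries.append((st.st_mtime, st.st_size, path))

    total = sum(size for _, size, _ in entries)
    removed = []
    # Sorting the tuples orders files by modification time, oldest first.
    for _, size, path in sorted(entries):
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
        removed.append(path)
    return removed
```

A background loop calling something like this periodically would keep a cache directory under a fixed size; a cache with no such step can only grow, which is the behavior the issue title describes.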

issue openedsourcegraph/sourcegraph

keep e2e tests in functional state during iteration

Uwe:

e2e tests [...] are being run daily now by buildkite. i want the (non-regression) e2e to be green at least once in a week. i scheduled it to run daily and i will try to fix it up so it doesn’t fall too much behind [...] so dax is not in complete trouble come release time

created time in a day

issue commentsourcegraph/sourcegraph

Diff query for unknown revision is slow

Update: just saw the earlier comments about how high-priority / urgent this is, we will ship this in a patch ASAP (like tonight or tomorrow)

beyang

comment created time in a day

issue openedsourcegraph/sourcegraph

Release patch v3.17.3

A customer P0 requires that we perform a patch release for the fixes in https://github.com/sourcegraph/sourcegraph/issues/11654 ASAP

created time in a day

Pull request review commentsourcegraph/src-cli

Allow users to supply arbitrary HTTP headers for requests

 All notable changes to `src-cli` are documented in this file.

 ### Added

+- Add support for `SRC_HEADER_` environment variables.
- `SRC_HEADER_NAME=value` is now supported for authenticating `src` with custom auth proxies. See [auth proxy configuration docs](AUTH_PROXY.md) for more information.
efritz

comment created time in 2 days

Pull request review commentsourcegraph/src-cli

Allow users to supply arbitrary HTTP headers for requests

 Point `src` to your instance and access token using environment variables: SRC_ENDPOINT=https://sourcegraph.example.com SRC_ACCESS_TOKEN="secret" src search 'foobar' ``` +If your instance is behind an auth proxy that requires additional headers, these can be supplied via environment variables:++```sh+SRC_HEADER_NAME=value src search 'foobar'+```++In this example, the header name-value pair `Name: value` will be threaded to all HTTP requests to your instance. Multiple such headers can be supplied.

This should not go in the README, it adds too much bloat. Instead please create a new AUTH_PROXY.md document and add just this to the README:

Sourcegraph behind a custom auth proxy? See auth proxy configuration docs.

efritz

comment created time in 2 days
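
The `SRC_HEADER_` convention from the diff above (where `SRC_HEADER_NAME=value` becomes the header `Name: value`) can be sketched roughly as follows. This is an illustrative Python sketch, not src-cli's actual Go implementation; the function name `headers_from_env` is hypothetical, and mapping underscores to dashes is an assumption that the diff does not confirm.

```python
import os

PREFIX = "SRC_HEADER_"


def headers_from_env(environ):
    """Collect HTTP headers from SRC_HEADER_<NAME>=<value> style variables.

    Hypothetical helper for illustration only.
    """
    headers = {}
    for key, value in environ.items():
        # Skip unrelated variables and a bare "SRC_HEADER_" with no name part.
        if not key.startswith(PREFIX) or len(key) == len(PREFIX):
            continue
        # NAME -> Name matches the example in the diff; turning "_" into "-"
        # (X_FORWARDED_USER -> X-Forwarded-User) is an assumption.
        name = key[len(PREFIX):].replace("_", "-").title()
        headers[name] = value
    return headers


if __name__ == "__main__":
    # Print whatever SRC_HEADER_* variables are set in the current environment.
    print(headers_from_env(os.environ))
```

Taking the environment as a parameter instead of reading `os.environ` directly keeps the parsing easy to test with a plain dictionary.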

pull request commentslimsag/update-docker-tags

Add dockerfile and Github action

Super sorry for the late review, will review ASAP.

daxmc99

comment created time in 2 days

pull request commentslimsag/update-docker-tags

Add new regex for matching with all docker tags

Super sorry for the late review, will review ASAP.

daxmc99

comment created time in 2 days

issue commentsourcegraph/sourcegraph

Deploy and release: Replace Renovate with GitHub Actions

He sent me two PRs I need to review here: https://github.com/slimsag/update-docker-tags/pulls

Then we just need to actually use the GitHub action here and fix any remaining issues it may have

Apologies for the delay in reviewing these, will look ASAP

beyang

comment created time in 2 days

pull request commentsourcegraph/sourcegraph

prometheus: bundle Alertmanager and siteConfig sync

@unknwon I would say not necessary as Distribution owns this - but feel free to review if you like as usual of course. Will make sure CODEOWNERS file reflects this.

bobheadxi

comment created time in 2 days

push eventsourcegraph/about

Stephen Gutekanst

commit sha 98b899796ca975d79c4477b291c87bdac690b439

address early feedback from Gonza, Nick; explain why things are important; link to project boards; clarify timeline; add more items

view details

Stephen Gutekanst

commit sha 135f173c7b2b6b4b8111f8f971e1b713ad449240

categorize monitoring plans

view details

push time in 2 days

issue closedsourcegraph/sourcegraph

Sourcegraph.com gitserver echo test duration regularly exceeds several seconds

On Sourcegraph.com the gitserver echo command duration is regularly abnormally high.

This may be related to https://github.com/sourcegraph/sourcegraph/issues/9355 and may be a partial cause of search performance on Sourcegraph.com being regularly poor https://github.com/sourcegraph/sourcegraph/issues/9359

Over the last 15 days it has peaked over 1s regularly:

image

In stark contrast, on k8s.sgdev.org over the last 15 days this has not once exceeded 125ms:

image

I also checked a customer instance, and similarly it was not once above 125ms in the last 15 days.

cc @sourcegraph/core-services

closed time in 2 days

slimsag

issue commentsourcegraph/sourcegraph

Sourcegraph.com gitserver echo test duration regularly exceeds several seconds

Indeed, scaling gitserver with 2x as many replicas appears to have solved this:

image

slimsag

comment created time in 2 days

issue closedsourcegraph/sourcegraph

Add prometheus metric that tracks deployment's resource usage

See https://github.com/sourcegraph/sourcegraph/issues/1958#issuecomment-456027014

closed time in 2 days

ggilmore

issue commentsourcegraph/sourcegraph

Add prometheus metric that tracks deployment's resource usage

Robert fixed this via https://github.com/sourcegraph/sourcegraph/issues/7529

ggilmore

comment created time in 2 days

issue commentsourcegraph/sourcegraph

Document extending and overriding NGINX configuration without edits to nginx.conf

Closing until there is more information.

ryan-blunden

comment created time in 2 days

pull request commentsourcegraph/sourcegraph

prometheus: bundle Alertmanager and siteConfig sync

find a way to trigger a Sourcegraph alert to make sure the whole thing works (because the above is manual)

It would be nice to have this happen automatically after saving changes to the alerting configuration - or even just a GraphQL query one can run in e.g. sourcegraph.com/api/console (but neither of these is needed for this PR and I don't want to increase its scope)

bobheadxi

comment created time in 2 days

pull request commentsourcegraph/sourcegraph

prometheus: bundle Alertmanager and siteConfig sync

[...] but I've yet to successfully receive a notification. I'm going this by

It looks like your message got cut off here

bobheadxi

comment created time in 2 days

pull request commentsourcegraph/sourcegraph

prometheus: bundle Alertmanager and siteConfig sync

This is excellent, thanks for the write-up! I will take a look at the implementation and try this out tomorrow.

bobheadxi

comment created time in 2 days

issue commentsourcegraph/sourcegraph

It is not clear when a user needs to move from server -> docker-compose or cluster

See also https://github.com/sourcegraph/sourcegraph/issues/11828

dadlerj

comment created time in 2 days

issue closedsourcegraph/sourcegraph

spike: investigate caddy issues in dev environment

See https://sourcegraph.slack.com/archives/C07KZF47K/p1586860609310400

@ggilmore hasn't been able to reproduce this issue, but this is causing pain for some on @sourcegraph/web

Next steps:

  • Ask other team members if they're running into this
  • If no, hop on a call with @felixfbecker to try to reproduce this, options:
    • reproduce this on a clean laptop
    • reproduce this on an extremely minimal example

closed time in 2 days

ggilmore

issue commentsourcegraph/sourcegraph

spike: investigate caddy issues in dev environment

I believe this hasn't been an issue since, so I am going to close this. If I am wrong please re-open.

ggilmore

comment created time in 2 days

issue closedsourcegraph/sourcegraph

placeholder issue

closed time in 2 days

ggilmore

issue commentsourcegraph/sourcegraph

placeholder issue

Closing as these cross-repo issues should show up in the generated issue now

ggilmore

comment created time in 2 days

issue closedsourcegraph/sourcegraph

document testing process for upgrading grafana

Robert notes that upgrading Grafana may involve changes to the underlying config format, which we will need to ensure we respect: https://github.com/sourcegraph/sourcegraph/issues/9983#issuecomment-640764575

Additionally, I note that upgrading Grafana could break generated dashboards if we don't e.g. upgrade the Go generator grafana dependency at the same time: https://github.com/sourcegraph/sourcegraph/issues/9983#issuecomment-640906607

We should put in place a process for testing Grafana upgrades (to start, just some manual documented smoke tests and testing on k8s.sgdev.org/sourcegraph.com would suffice).

closed time in 2 days

slimsag

issue commentsourcegraph/sourcegraph

document testing process for upgrading grafana

https://github.com/sourcegraph/about/pull/1053 sufficiently covers this, we can file follow-up issues as needed or just improve the docs directly.

slimsag

comment created time in 2 days

issue commentsourcegraph/sourcegraph

Tracking Issue: Long tail of monitoring metric improvements

Closing in favor of individual issues which better communicate priority of each.

slimsag

comment created time in 2 days

issue closedsourcegraph/sourcegraph

Tracking Issue: Long tail of monitoring metric improvements

Tracking issue for the long-tail of monitoring metrics improvements.

<!-- BEGIN WORK --> <!-- BEGIN ASSIGNEE: bobheadxi --> @bobheadxi: 2.00d

  • [x] monitoring for CPU and memory usage in Kubernetes #9791 2d <!-- END ASSIGNEE -->

<!-- BEGIN ASSIGNEE: daxmc99 --> @daxmc99: 0.50d

  • [x] Add monitoring for syntax highlighting vs. gitserver perf #10049 0.5d <!-- END ASSIGNEE -->

<!-- BEGIN ASSIGNEE: slimsag --> @slimsag: 9.50d

  • [ ] Grafana: add metric for open file descriptors and alert when approaching fd limit #10009
  • [ ] Disk space of all containers is not monitored #9919 0.5d
  • [ ] Missing monitoring to signal if requests are not hitting search index #9838 0.5d
  • [ ] Alert if the number of repositories not cloned is high #9837 0.5d
  • [ ] observability: src_graphql_field_seconds histogram sometimes produces NaN values #9834 1d 🐛
  • [ ] Unindexed search archive cache is not monitored #9796 0.5d
  • [ ] Alert admins if containers are crashing/restarting #9793
  • [ ] Alert if containers are entirely down/missing #9792
  • [ ] If configuration becomes invalid for any reason, no alert fires for all repositories being removed #9790 0.5d
  • [ ] Missing monitoring to notify admins of mass recloning events #9789 0.5d
  • [ ] observability: gitserver: monitor command execution latency #9788 0.5d
  • [ ] Frontend dashboard: "non-200 indexed search responses every 5m" should be by (category,code) #9744 0.5d 🐛
  • [x] Expose # of code host API requests in Grafana Dashboards #8581 0.5d
  • [x] Ensure defined Grafana alerts do not disappear without data #7623 0.5d
  • [x] Determine missing alerts between alertmanager and grafana #7528 0.5d
  • [ ] Identify further inter-service communication metrics missing from new Grafana dashboards + alerts #7525 1d
  • [x] Identify further service-specific metrics missing from new Grafana dashboards + alerts #7524 1d
  • [ ] Missing monitoring for when gitserver fetches/execs/clones are slow #6675 0.5d
  • [ ] monitoring for how many repos are indexed/nonindexed missing #4197 0.5d
  • [ ] alerting for when external service connections are failing #11595 <!-- END ASSIGNEE --> <!-- END WORK -->

Legend

  • 👩 Customer issue
  • 🐛 Bug
  • 🧶 Technical debt
  • 🛠️ Roadmap
  • 🕵️ Spike
  • 🔒 Security issue
  • :shipit: Pull Request

closed time in 2 days

slimsag

issue closedsourcegraph/sourcegraph

support $CUSTOMER

Time-tracking issue for https://github.com/sourcegraph/customer/issues/57

closed time in 2 days

slimsag

issue commentsourcegraph/sourcegraph

support $CUSTOMER

Closing in favor of https://github.com/sourcegraph/customer/issues/57

slimsag

comment created time in 2 days

issue commentsourcegraph/sourcegraph

redis-exporter image is 32-bit and not open-source

Excellent, thank you for the information!

terinjokes

comment created time in 2 days

issue commentsourcegraph/sourcegraph

redis-exporter image is 32-bit

Super sorry this is happening, we will make this open-source very soon in the docker-images folder of this repo. It's not meant to be private, it just ended up there by accident.

Here are the entire contents of the Dockerfile:

FROM oliver006/redis_exporter:v0.34.1@sha256:4f4420c643e83840753d57b8f7a5afd08b83f1645ca43c60278b75a3907a57a5
RUN addgroup -S redis && adduser -S -G redis -h /home/redis redis
USER redis
terinjokes

comment created time in 2 days

push eventsourcegraph/about

Stephen Gutekanst

commit sha b19b230e110f006d5b91bd0616b5f0e4b55fe1aa

distribution: add "Push site admins to use Docker Compose or Kubernetes for production deployments" to roadmap

view details

push time in 2 days

issue closedsourcegraph/sourcegraph

test bed for $CUSTOMER monorepo

flesh out RFC 129 and implement

closed time in 2 days

uwedeportivo

issue commentsourcegraph/sourcegraph

test bed for $CUSTOMER monorepo

No longer a priority based on https://github.com/sourcegraph/customer/issues/58#issuecomment-630439462 so I am closing.

uwedeportivo

comment created time in 2 days

issue openedsourcegraph/sourcegraph

Push site admins to use Docker Compose or Kubernetes for production deployments

Many users of Sourcegraph today are still using a sourcegraph/server deployment in production.

We have seen that when this single-container deployment type begins to fall over, it falls over hard and in ways that neither we nor the customer can effectively debug. It means that our only recourse is a painstakingly urgent migration to a better deployment type with resource isolation, like Docker Compose. In practice this has led to angry and frustrated customers and the appearance that we don't understand the cause of the issue.

This is all because services are running in the exact same container and competing for resources without any isolation between them. In contrast, Docker Compose and Kubernetes allow us to have resource isolation between services.

This issue is for tracking how we can get to a world where sourcegraph/server is treated purely as a non-production demo deployment type, with something like a banner at the top indicating exactly that.

It is worth noting that:

  1. In our documentation we already explicitly state that single-container deployments are NOT for production deployments.
  2. In server deployments we have a hard warning about performance being due to the deployment type if searches time out and in other conditions.
  3. Several customers are still running server deployments because we have not pushed them to migrate, and some have explicitly used hacks to disable the performance warning banners AND have faced performance issues.

In order to achieve this I believe we will need to:

  1. Reduce the friction of upgrading from server -> Docker Compose and reduce the maintenance and resource burden of Docker Compose deployments compared to server ones (i.e. we need a "small scale docker compose deployment" setup).
  2. Over time, increasingly ramp up pressure on site admins to upgrade to Docker Compose or Kubernetes and explain why in clear terms.
  3. Ultimately get to a point where there is a non-dismissible banner at the top of server deployments indicating "this is a demo deployment not for production use" - ramping up to this will need to be done carefully, after emailing and pushing admins, and after several versions of Sourcegraph with increasingly strong wording about the fact that we are going to do this.

created time in 2 days

issue commentsourcegraph/sourcegraph

Set ephemeral storage resource requests and limits in deploy-sourcegraph

https://github.com/sourcegraph/sourcegraph/issues/8308 for next steps here

beyang

comment created time in 2 days

issue commentsourcegraph/sourcegraph

sourcegraph-frontend local disk usage grows unboundedly

This has caused a customer's frontend pods to get evicted and prevents us from adding Kubernetes ephemeral storage limits, see: https://github.com/sourcegraph/sourcegraph/issues/9604

I don't think this is extremely urgent, but could you make sure this is on your roadmap somewhere, @sourcegraph/cloud? (If we can fix it in the next 1-3 months, I think that would be ideal.)

beyang

comment created time in 2 days

push eventsourcegraph/about

Stephen Gutekanst

commit sha ee90a612f508036ff2992f4472c5faee05d0624f

document how to set up Zoom recordings to go to Slack automatically (#1133) * entry * document how to setup Zoom recordings to go to Slack automatically

push time in 2 days

delete branch sourcegraph/about

delete branch : sg/zoom-recordings

delete time in 2 days

PR merged sourcegraph/about

document how to set up Zoom recordings to go to Slack automatically

It's important for remote team members to have access to recordings when they cannot attend a meeting (e.g. a team sync or all-hands company sync). Allowing others to watch these meetings and contribute post-mortem thanks to our strong written culture is incredibly useful.

However, this relies on someone recording the meeting and posting the link properly, and unfortunately Zoom makes this 1000x harder than it should be. I have found the following setup, a Slack email relay plus a Gmail forwarding filter, extremely useful, as it automatically records our Distribution team sync and posts the recording in our Slack channel:

image

I suspect that other teams may find this useful, and it may be useful for the company meeting in particular.

Additionally, because Zoom meetings can be reused at any time, you can cheat by using the same Zoom link as an easy way to start a recorded call and have the result posted to Slack. For example in the Distribution team we have this in Slack:

image

If there is a topic where having a recording makes sense for posterity or other reasons, you can just use that link and it'll post to Slack automatically with the recording (of course this is not a substitute for proper note taking).

+45 -0

0 comment

2 changed files

slimsag

pr closed time in 2 days

Pull request review commentsourcegraph/about

document how to set up Zoom recordings to go to Slack automatically

 - [Continuous releasability](continuous_releasability.md)
 - [Commit message guidelines](commit_messages.md)
 - [Ignoring editor config files in Git](ignoring_editor_config_files.md)
+- [Configuring Zoom to send recordings to Slack automatically](configuring_zoom_recordings_to_slack_automatically.md)

That would be great to have indeed!

slimsag

comment created time in 2 days

push eventsourcegraph/sourcegraph

Stephen Gutekanst

commit sha b084d7110dd495aca35d12e3e1c600308326db31

Kubernetes: document how to load config from files on disk (#11827) A customer requested details on this and many more have in the past - we should have this fully explained here so I have done exactly that.

push time in 2 days

delete branch sourcegraph/sourcegraph

delete branch : sg/kubernetes

delete time in 2 days

PR merged sourcegraph/sourcegraph

Kubernetes: document how to load config from files on disk

A customer requested details on this and many more have in the past - we should have this fully explained here so I have done exactly that.

+108 -1

0 comment

1 changed file

slimsag

pr closed time in 2 days

PR opened sourcegraph/sourcegraph

Kubernetes: document how to load config from files on disk

A customer requested details on this and many more have in the past - we should have this fully explained here so I have done exactly that.

+108 -1

0 comment

1 changed file

pr created time in 3 days

create branchsourcegraph/sourcegraph

branch : sg/kubernetes

created branch time in 3 days

issue openedsourcegraph/sourcegraph

Plan 3.19 work ahead of time

Depends on @slimsag completing Distribution product roadmap and giving Gonza enough context/info to do this

created time in 3 days

PR opened sourcegraph/about

document how to set up Zoom recordings to go to Slack automatically

It's important for remote team members to have access to recordings when they cannot attend a meeting (e.g. a team sync or all-hands company sync). Allowing others to watch these meetings and contribute post-mortem thanks to our strong written culture is incredibly useful.

However, this relies on someone recording the meeting and posting the link properly, and unfortunately Zoom makes this 1000x harder than it should be. I have found the following setup, a Slack email relay plus a Gmail forwarding filter, extremely useful, as it automatically records our Distribution team sync and posts the recording in our Slack channel:

image

I suspect that other teams may find this useful, and it may be useful for the company meeting in particular.

Additionally, because Zoom meetings can be reused at any time, you can cheat by using the same Zoom link as an easy way to start a recorded call and have the result posted to Slack. For example in the Distribution team we have this in Slack:

image

If there is a topic where having a recording makes sense for posterity or other reasons, you can just use that link and it'll post to Slack automatically with the recording (of course this is not a substitute for proper note taking).

+45 -0

0 comment

2 changed files

pr created time in 3 days

push eventsourcegraph/about

Stephen Gutekanst

commit sha 4725c61e6f66eaaa2f44ff01814db94f640879eb

document how to setup Zoom recordings to go to Slack automatically

push time in 3 days

create branchsourcegraph/about

branch : sg/zoom-recordings

created branch time in 3 days

issue commentsourcegraph/sourcegraph

Backup snapshotting of Sourcegraph instance might corrupt .git dirs

As for how this is failing, we can probably file another issue about making it fail nicely. For example if we set GIT_DIR then git won't do the discovery all the way up to /data and instead will fail with a better error message. Additionally we could look into making our janitor job more aggressively detect bad .git dirs. WDYT?

This SGTM
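
To illustrate the GIT_DIR idea quoted above, here is a minimal Go sketch, not the actual gitserver code. It assumes every git invocation goes through a helper (the name `gitCommand` and the example path are hypothetical) that pins GIT_DIR, so git uses exactly that directory instead of searching parent directories such as /data for a repository.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// gitCommand is a hypothetical helper that builds a git invocation pinned
// to a single repository directory. With GIT_DIR set, git does not walk up
// the directory tree looking for a repository; a broken or missing
// directory produces an error about that directory instead.
func gitCommand(gitDir string, args ...string) *exec.Cmd {
	cmd := exec.Command("git", args...)
	// Appended last so it takes precedence over any inherited GIT_DIR.
	cmd.Env = append(os.Environ(), "GIT_DIR="+gitDir)
	return cmd
}

func main() {
	// The path below is an illustrative assumption, not a real layout.
	cmd := gitCommand("/data/repos/example/.git", "rev-parse", "--git-dir")
	fmt.Println(cmd.Args, cmd.Env[len(cmd.Env)-1])
}
```

The command is only constructed here, not run; a caller would invoke `cmd.Run()` or `cmd.CombinedOutput()` and surface the error to the janitor job or the logs.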

uwedeportivo

comment created time in 3 days

issue commentsourcegraph/sourcegraph

Investigate search performance for some large customers

Closing in favor of the two linked issues instead.

slimsag

comment created time in 3 days

issue closedsourcegraph/sourcegraph

Investigate search performance for some large customers

  • https://github.com/sourcegraph/customer/issues/56
  • https://github.com/sourcegraph/customer/issues/50#issuecomment-628216524

closed time in 3 days

slimsag

issue commentsourcegraph/sourcegraph

Alert if host volumes are inaccessible due to permission errors

This has bitten several customers in the past.

slimsag

comment created time in 3 days

issue closedsourcegraph/sourcegraph

Expose # of code host API requests in Grafana Dashboards

You can access this info today via src_gitlab_requests_total, src_github_requests_total, etc. and it should be exposed as info on the Repo Updater dashboard since this is generally useful for admins to know.

This likely isn't something we can alert on, however (or at least I cannot think of a reasonable way to alert on it?)

Related: https://sourcegraph.slack.com/archives/CSKMGUJ58/p1582577731005800

closed time in 3 days

slimsag

issue commentsourcegraph/sourcegraph

Expose # of code host API requests in Grafana Dashboards

Closing in favor of https://github.com/sourcegraph/sourcegraph/issues/8100

slimsag

comment created time in 3 days

issue commentsourcegraph/sourcegraph

Alerts for code host rate limiting, time spent waiting on rate limiting, etc.

Some additional details in https://github.com/sourcegraph/sourcegraph/issues/8581

slimsag

comment created time in 3 days

issue commentsourcegraph/sourcegraph

Monitoring for code host rate limiting & over-requests & latency is missing

See for example https://github.com/sourcegraph/sourcegraph/issues/7262 which indicates this was happening on Sourcegraph.com for a long period of time completely unnoticed.

slimsag

comment created time in 3 days

issue closedsourcegraph/sourcegraph

Alerts for exceeding github and other code host rate limits

I wanted to add alerts for github rate limiting, but the values reported by github-proxy are pretty erratic:

https://k8s.sgdev.org/-/debug/grafana/explore?orgId=1&left=%5B%22now-7d%22,%22now%22,%22Prometheus%22,%7B%22expr%22:%22sum%20by%20(resource)(src_github_rate_limit_remaining)%22,%22context%22:%22explore%22%7D,%7B%22mode%22:%22Metrics%22%7D,%7B%22ui%22:%5Btrue,true,true,%22none%22%5D%7D%5D

https://sourcegraph.com/-/debug/grafana/explore?orgId=1&left=%5B%22now-7d%22,%22now%22,%22Prometheus%22,%7B%22expr%22:%22sum%20by%20(resource)(src_github_rate_limit_remaining)%22,%22context%22:%22explore%22%7D,%7B%22mode%22:%22Metrics%22%7D,%7B%22ui%22:%5Btrue,true,true,%22none%22%5D%7D%5D

closed time in 3 days

slimsag

issue commentsourcegraph/sourcegraph

Dry-run external service configuration before saving

This seems like some relatively deep repo-updater changes and not in the scope of distribution, please correct me if I am wrong, though. Reassigning to cloud.

keegancsmith

comment created time in 3 days

issue commentsourcegraph/sourcegraph

github rate limits on k8s.sgdev.org and sourcegraph.com are concerning

Looks like this has resolved since I looked last time

slimsag

comment created time in 3 days

issue closedsourcegraph/sourcegraph

Placeholder: Continuous release process

This is a placeholder issue for implementing a continuous release process and adding more release automation.

Notes

  • Merge on test pass
  • Changelog entries from commit message
  • Cut releases (if commit message includes special tag ("release: v3.11.4"))
  • Try GitHub actions to implement the above
  • PR template
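
As a rough sketch of the "special tag" note above, the following Go snippet extracts a release version from a commit message. The exact tag format is an assumption modeled on the `release: v3.11.4` example in the notes, and `releaseVersion` is a hypothetical helper name, not part of any existing release tooling.

```go
package main

import (
	"fmt"
	"regexp"
)

// releaseTag matches a line of the form "release: vMAJOR.MINOR.PATCH".
// The format is an assumption based on the example in the notes.
var releaseTag = regexp.MustCompile(`(?m)^release:\s*(v\d+\.\d+\.\d+)\s*$`)

// releaseVersion returns the version named by a release tag in msg, and
// false when the commit message carries no such tag.
func releaseVersion(msg string) (string, bool) {
	m := releaseTag.FindStringSubmatch(msg)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	v, ok := releaseVersion("fix search timeout\n\nrelease: v3.11.4")
	fmt.Println(v, ok)
}
```

A CI step could call such a function on the head commit message and cut a release only when the second return value is true.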

Prior work

  • https://github.com/sourcegraph/sourcegraph/issues/2311
  • https://github.com/sourcegraph/sourcegraph/issues/2392
  • https://github.com/sourcegraph/sourcegraph/issues/7148

closed time in 3 days

beyang

issue commentsourcegraph/sourcegraph

Placeholder: Continuous release process

Closing in favor of https://github.com/sourcegraph/sourcegraph/issues/9252

beyang

comment created time in 3 days

issue closedsourcegraph/sourcegraph

Ensure defined Grafana alerts do not disappear without data

The "Total defined alerts" are sometimes not showing up when there is no data, need to add empty vectors so these are always present.
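
One common way to keep a panel populated when a query returns no series is the PromQL `or vector(0)` fallback. The Go sketch below wraps a query string that way. It assumes this idiom is what "empty vectors" refers to, and both the helper name `withZeroFallback` and the metric name are hypothetical. The fallback only lines up cleanly when the wrapped expression yields a result without labels, such as a plain `sum(...)`.

```go
package main

import "fmt"

// withZeroFallback wraps a PromQL expression so that the query yields 0
// instead of an empty result when the expression matches no series.
func withZeroFallback(expr string) string {
	return fmt.Sprintf("(%s) or vector(0)", expr)
}

func main() {
	// example_alert_count is a placeholder metric name for illustration.
	fmt.Println(withZeroFallback(`sum(example_alert_count{level="critical"})`))
}
```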

closed time in 3 days

slimsag

issue commentsourcegraph/sourcegraph

Ensure defined Grafana alerts do not disappear without data

Closing in favor of https://github.com/sourcegraph/sourcegraph/issues/11571 which I suspect is the same issue and has more reproducible details.

slimsag

comment created time in 3 days

issue closedsourcegraph/sourcegraph

Renovate PRs are not getting automatically merged

https://github.com/sourcegraph/deploy-sourcegraph-dot-com/pull/1849 was opened and approved 3 hours ago, but is still not merged. You can look at some recent closed PRs to see there is now a high latency here. This might be related to product changes on Renovate's end.

closed time in 3 days

nicksnyder