
Tracking Issue: Long tail of monitoring metric improvements

Tracking issue for the long tail of monitoring metric improvements.

<!-- BEGIN WORK --> <!-- BEGIN ASSIGNEE: bobheadxi --> @bobheadxi: 2.00d

  • [x] monitoring for CPU and memory usage in Kubernetes #9791 2d <!-- END ASSIGNEE -->

<!-- BEGIN ASSIGNEE: daxmc99 --> @daxmc99: 0.50d

  • [x] Add monitoring for syntax highlighting vs. gitserver perf #10049 0.5d <!-- END ASSIGNEE -->

<!-- BEGIN ASSIGNEE: slimsag --> @slimsag: 9.50d

  • [ ] Grafana: add metric for open file descriptors and alert when approaching fd limit #10009
  • [ ] Disk space of all containers is not monitored #9919 0.5d
  • [ ] Missing monitoring to signal if requests are not hitting search index #9838 0.5d
  • [ ] Alert if the number of repositories not cloned is high #9837 0.5d
  • [ ] observability: src_graphql_field_seconds histogram sometimes produces NaN values #9834 1d 🐛
  • [ ] Unindexed search archive cache is not monitored #9796 0.5d
  • [ ] Alert admins if containers are crashing/restarting #9793
  • [ ] Alert if containers are entirely down/missing #9792
  • [ ] If configuration becomes invalid for any reason, no alert fires for all repositories being removed #9790 0.5d
  • [ ] Missing monitoring to notify admins of mass recloning events #9789 0.5d
  • [ ] observability: gitserver: monitor command execution latency #9788 0.5d
  • [ ] Frontend dashboard: "non-200 indexed search responses every 5m" should be by (category,code) #9744 0.5d 🐛
  • [x] Expose # of code host API requests in Grafana Dashboards #8581 0.5d
  • [x] Ensure defined Grafana alerts do not disappear without data #7623 0.5d
  • [x] Determine missing alerts between alertmanager and grafana #7528 0.5d
  • [ ] Identify further inter-service communication metrics missing from new Grafana dashboards + alerts #7525 1d
  • [x] Identify further service-specific metrics missing from new Grafana dashboards + alerts #7524 1d
  • [ ] Missing monitoring for when gitserver fetches/execs/clones are slow #6675 0.5d
  • [ ] monitoring for how many repos are indexed/nonindexed missing #4197 0.5d
  • [ ] alerting for when external service connections are failing #11595 <!-- END ASSIGNEE --> <!-- END WORK -->

Legend

  • 👩 Customer issue
  • 🐛 Bug
  • 🧶 Technical debt
  • 🛠️ Roadmap
  • 🕵️ Spike
  • 🔒 Security issue
  • :shipit: Pull Request
sourcegraph/sourcegraph

@slimsag:

Closing in favor of individual issues which better communicate priority of each.
