missing infrastructure to notify us of panics in prod and other deployments
I notice there are panics in our frontend pod logs, and probably in other services, too.
We don't have anything to notify us of these in prod, nor in other deployments. That is a major oversight on our part. We need something to tell us about these.
I don't know who should tackle this or when the best time would be, so I am backlogging it for now. Any volunteers?
Answer from bobheadxi:
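took a quick look at this - there is defer ... recover, but a recover only catches panics raised in the same goroutine as the deferred call. It cannot capture panics in spawned goroutines, which kind of renders it useless as a single top-level safety net.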
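To illustrate the limitation, here is a minimal sketch of what per-goroutine recovery would have to look like. The names safeGo and runAndCollect are hypothetical helpers made up for this example, not anything that exists in the codebase; the point is only that the deferred recover has to live inside each spawned goroutine, which is why a single top-level defer does not help.

```go
package main

import (
	"fmt"
	"sync"
)

// safeGo is a hypothetical helper: it runs fn in a new goroutine and
// recovers any panic inside that same goroutine, handing the panic
// value to report instead of letting the process crash.
func safeGo(wg *sync.WaitGroup, report func(v interface{}), fn func()) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		// This recover works only because it is deferred in the
		// goroutine that panics.
		defer func() {
			if v := recover(); v != nil {
				report(v)
			}
		}()
		fn()
	}()
}

// runAndCollect (also hypothetical) starts fn via safeGo, waits for it,
// and returns the recovered panic value, or nil if fn did not panic.
func runAndCollect(fn func()) interface{} {
	var wg sync.WaitGroup
	var got interface{}
	safeGo(&wg, func(v interface{}) { got = v }, fn)
	wg.Wait()
	return got
}

func main() {
	// A recover deferred here in main would NOT see a panic raised in a
	// goroutine started with a plain `go` statement; that panic would
	// terminate the whole process.
	v := runAndCollect(func() { panic("boom") })
	fmt.Println("recovered:", v)
}
```

The catch is that every `go` statement in every service would need to go through something like this, and panics in goroutines started by third-party libraries would still be missed.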
an alternative might be to create some kind of wrapper program that can capture the output and update the entrypoints of the various services to use it: https://sourcegraph.com/search?q=repo:%5Egithub.com/sourcegraph/sourcegraph%24+file:%5Ecmd/.*%3F/Dockerfile+ENTRYPOINT&patternType=literal - I'm not too familiar with doing things to command output, so this might be nontrivial.
Aside: from the above query I notice we use tini, where I found this discussion about cleanup steps: https://github.com/krallin/tini/issues/28 - the idea was shut down, but I imagine a wrapper program would look like some of the scripts people posted in that discussion.