500 error on badly cleaned up containers/workers

Bug Report

I restarted all of my Concourse nodes to apply v5.0.1, but did it badly and ended up just SIGKILL'ing everything (I'm sorry!). Now that everything is back up, almost every resource check is failing with this error in Concourse:

Mar 25 17:53:54 web1.concourse.stm.inf.demilletech.net concourse[12561]: {"timestamp":"2019-03-25T21:53:54.670443499Z","level":"error","source":"atc","message":"atc.pipelines.radar.failed-to-run-scan-resource","data":{"error":"Backend error: Exit status: 500, message: {\"Type\":\"\",\"Message\":\"exit status 2\",\"Handle\":\"\",\"ProcessID\":\"\",\"Binary\":\"\"}\n","pipeline":"api-server","session":"18.5","team":"isoscribe"}}

Steps to Reproduce

  1. Create a pipeline
  2. Make it do stuff
  3. SIGKILL everything that has to do with the workers (see the sketch below)
  4. Start the nodes back up
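
Roughly what steps 3-4 might look like, as a sketch only — this assumes the nodes run the `concourse` binary under systemd units named concourse-worker / concourse-web, which is an assumption about the deployment, not something from the report:

# On each worker node: SIGKILL the worker so it never gets a chance to
# retire or clean up its containers and volumes.
sudo pkill -9 -f "concourse worker"

# On the web node(s): same treatment for the ATC/TSA process.
sudo pkill -9 -f "concourse web"

# Bring everything back up.
sudo systemctl start concourse-worker   # on worker nodes
sudo systemctl start concourse-web      # on web nodes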

Expected Results

Resource checks succeed once the nodes are back up.

Actual Results

Almost every resource check fails with the 500 error above.

Version Info

  • Concourse version: 5.0.1
  • Deployment type (BOSH/Docker/binary): Binary
  • Infrastructure/IaaS: VMware
  • Browser (if applicable): Chrome
  • Did this used to work? Yes

enugentdt commented:

I just realized I attached the wrong logs... 🤦‍♂️ I'm sorry about that. Here are the real logs:

Mar 25 18:03:55 worker3 concourse[23925]: {"timestamp":"2019-03-25T22:03:55.940735079Z","level":"info","source":"guardian","message":"guardian.list-containers.starting","data":{"session":"2448"}}
Mar 25 18:03:55 worker3 concourse[23925]: {"timestamp":"2019-03-25T22:03:55.941216346Z","level":"info","source":"guardian","message":"guardian.list-containers.finished","data":{"session":"2448"}}
Mar 25 18:03:55 worker3 concourse[23925]: {"timestamp":"2019-03-25T22:03:55.951915374Z","level":"info","source":"guardian","message":"guardian.api.garden-server.get-properties.got-properties","data":{"handle":"ca6c871f-0c4d-47fd-452d-d76a058d4e3d","session":"3.1.3535"}}
Mar 25 18:03:55 worker3 concourse[23925]: {"timestamp":"2019-03-25T22:03:55.954830475Z","level":"info","source":"guardian","message":"guardian.run.started","data":{"handle":"ca6c871f-0c4d-47fd-452d-d76a058d4e3d","path":"/opt/resource/check","session":"2449"}}
Mar 25 18:03:55 worker3 concourse[23925]: {"timestamp":"2019-03-25T22:03:55.954890434Z","level":"info","source":"guardian","message":"guardian.run.exec.start","data":{"handle":"ca6c871f-0c4d-47fd-452d-d76a058d4e3d","path":"/opt/resource/check","session":"2449.2"}}
Mar 25 18:03:55 worker3 concourse[23925]: {"timestamp":"2019-03-25T22:03:55.968447637Z","level":"error","source":"guardian","message":"guardian.run.exec.create-workdir-failed","data":{"error":"exit status 2","handle":"ca6c871f-0c4d-47fd-452d-d76a058d4e3d","path":"/opt/resource/check","session":"2449.2"}}
Mar 25 18:03:55 worker3 concourse[23925]: {"timestamp":"2019-03-25T22:03:55.968511963Z","level":"info","source":"guardian","message":"guardian.run.exec.finished","data":{"handle":"ca6c871f-0c4d-47fd-452d-d76a058d4e3d","path":"/opt/resource/check","session":"2449.2"}}
Mar 25 18:03:55 worker3 concourse[23925]: {"timestamp":"2019-03-25T22:03:55.968531217Z","level":"info","source":"guardian","message":"guardian.run.finished","data":{"handle":"ca6c871f-0c4d-47fd-452d-d76a058d4e3d","path":"/opt/resource/check","session":"2449"}}
Mar 25 18:03:55 worker3 concourse[23925]: {"timestamp":"2019-03-25T22:03:55.968550597Z","level":"error","source":"guardian","message":"guardian.api.garden-server.run.failed","data":{"error":"exit status 2","handle":"ca6c871f-0c4d-47fd-452d-d76a058d4e3d","session":"3.1.3536"}}

What's interesting is that it exited with "exit status 2" at guardian.run.exec.create-workdir-failed. My guess is that, yes, #3079 would fix this: it expects the container to still exist, runs a command on the "existing" container, and fails for obvious reasons.

Sorry again about attaching the wrong logs!
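
For anyone else hitting this before a fix lands, here's a possible manual cleanup sketch — untested and not a confirmed fix; the systemd unit name, fly target, worker name, and work-dir path are all placeholders, not values from this report. `fly prune-worker` is a real fly command; everything else should be adjusted to your own deployment:

sudo systemctl stop concourse-worker        # assumed systemd unit name
fly -t ci prune-worker -w worker3           # forget the stalled worker ("ci" and "worker3" are examples)
sudo rm -rf /opt/concourse/worker           # assumed --work-dir; removes leftover containers/volumes
sudo systemctl start concourse-worker       # worker re-registers with a clean slate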
