Connection returns TLS / Certificate verification error in proxy after enabling SDS

Hi, I am having a problem with Istio in my current production setup and need your help to troubleshoot it.

Bug Description

Background

I am running Istio 1.1.7 in all our environments on Kubernetes (Amazon EKS) 1.12.7, with mTLS enabled on the application namespace and SDS enabled on both the ingress gateway and the sidecars. There is no circuit breaker and no custom root CA for Citadel.

Problem

At first, all services in the cluster work fine: connections from the ingress controller reach the services and return correctly.

But after a while (days or weeks; I haven't been able to find the pattern), all connections from the ingress to the services return 503 with response flags UF,URX. There are logs in the istio-proxy container of the ingress pod, but none in the upstream service's istio-proxy container.

An example log entry (sorry for the format, I pulled it out of Elasticsearch):

"stream_name": "istio-ingressgateway-76749b4bb4-z6n78",
"istio_policy_status": "-",
"bytes_sent": "91",
"upstream_cluster": "outbound|8080||frontend.services.svc.cluster.local",
"downstream_remote_address": "172.23.24.174:30690",
"path": "/user",
"authority": "prod.example.com",
"protocol": "HTTP/1.1",
"upstream_service_time": "-",
"upstream_local_address": "-",
"duration": "69",
"downstream_local_address": "172.23.24.189:443",
"response_code": "503",
"user_agent": "Mozilla/5.0 (Linux; Android 8.0.0) ...",
"response_flags": "UF,URX",
"start_time": "2019-06-03T13:26:06.617Z",
"method": "GET",
"request_id": "320037db-601b-9c52-861f-bwoeifwoiegi",
"upstream_host": "172.23.24.143:80",
"x_forwarded_for": "218.186.146.112,172.23.24.174",
"requested_server_name": "prod.example.com",
"bytes_received": "0",
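For readers unfamiliar with the response_flags field, here is a minimal Python sketch that spells out the two flags in the entry above. The descriptions paraphrase Envoy's access-log documentation; the function name and the lookup table are illustrative only and cover just the flags seen in this report.

```python
# Descriptions paraphrased from Envoy's access-log documentation.
# Only the two flags that appear in this report are listed.
RESPONSE_FLAGS = {
    "UF": "upstream connection failure",
    "URX": "upstream retry limit (HTTP) or max connect attempts (TCP) reached",
}


def explain_response_flags(flags):
    """Split a response_flags value such as "UF,URX" and describe each flag."""
    if flags in ("", "-"):
        return []
    return [
        "{}: {}".format(flag, RESPONSE_FLAGS.get(flag, "not in this table"))
        for flag in flags.split(",")
    ]


for description in explain_response_flags("UF,URX"):
    print(description)
```

Read together with the empty upstream_local_address and upstream_service_time fields, the two flags suggest that the gateway never established a usable connection to the upstream and gave up after retrying.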

I tried to enable debug logging in the proxy sidecar with:

curl -XPOST 'localhost:15000/logging?connection=debug'

Then I found this in the istio-proxy container of the ingress controller:

[2019-05-21 08:18:36.878][33][debug][connection] [external/envoy/source/common/network/connection_impl.cc:644] [C79846] connecting to 172.23.14.229:80
[2019-05-21 08:18:36.878][33][debug][connection] [external/envoy/source/common/network/connection_impl.cc:517] [C79846] connected
[2019-05-21 08:18:36.878][33][debug][connection] [external/envoy/source/common/network/connection_impl.cc:653] [C79846] connection in progress
[2019-05-21 08:18:36.878][33][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:142] [C79846] handshake error: 2
[2019-05-21 08:18:36.883][33][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:142] [C79846] handshake error: 2
[2019-05-21 08:18:36.883][33][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:142] [C79846] handshake error: 2
[2019-05-21 08:18:36.885][33][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:142] [C79846] handshake error: 1
[2019-05-21 08:18:36.885][33][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:175] [C79846] TLS error: 268436501:SSL routines:OPENSSL_internal:SSLV3_ALERT_CERTIFICATE_EXPIRED
[2019-05-21 08:18:36.885][33][debug][connection] [external/envoy/source/common/network/connection_impl.cc:183] [C79846] closing socket: 0
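The numeric code in the "TLS error" line can be unpacked to see where the failure was raised. The sketch below assumes the proxy is built against BoringSSL (the OPENSSL_internal marker suggests so) and that the code uses BoringSSL's packed layout: library number in the top 8 bits, reason in the low 12 bits. It also assumes that reasons above 1000 stand for a TLS alert received from the peer. The function name is illustrative.

```python
ERR_LIB_SSL = 16             # BoringSSL library number for "SSL routines" (assumed)
SSL_AD_REASON_OFFSET = 1000  # reasons above this are 1000 + TLS alert number (assumed)


def unpack_tls_error(code):
    """Split a packed error code into library, reason and (if any) peer alert."""
    lib = (code >> 24) & 0xFF
    reason = code & 0xFFF
    alert = reason - SSL_AD_REASON_OFFSET if reason > SSL_AD_REASON_OFFSET else None
    return {"lib": lib, "reason": reason, "peer_alert": alert}


# Code from the ingress gateway log (SSLV3_ALERT_CERTIFICATE_EXPIRED).
print(unpack_tls_error(268436501))  # {'lib': 16, 'reason': 1045, 'peer_alert': 45}
# Code from the upstream sidecar log (CERTIFICATE_VERIFY_FAILED).
print(unpack_tls_error(268435581))  # {'lib': 16, 'reason': 125, 'peer_alert': None}
```

Alert 45 is certificate_expired in the TLS specification. If this reading is right, the gateway did not judge a certificate expired itself; it received that alert from the other end of the connection, which would mean the upstream sidecar rejected the certificate the gateway presented.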

So it looks like there is some problem with the TLS certificates. The certs in istio-ca-secret and istio.istio-ingressgateway-service-account look correct and have not expired yet. The same goes for the internal certificates of my upstream services.
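A minimal Python sketch of the expiry check, for anyone who wants to repeat it. It assumes the certificate's notAfter value has already been extracted as a string in the usual OpenSSL text form (for example from `openssl x509 -noout -enddate`); the function name and the sample date are hypothetical and not taken from this cluster.

```python
import ssl
import time
from typing import Optional


def seconds_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Return the seconds left before `not_after`; negative means already expired.

    `not_after` must look like "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    return expires_at - (time.time() if now is None else now)


# Hypothetical notAfter value, used only to show the call.
remaining = seconds_until_expiry("Jan  5 09:34:43 2018 GMT")
print("expired" if remaining < 0 else "valid for {:.0f} more seconds".format(remaining))
```

With SDS, the key/cert pair is pushed to the proxy by the node agent, so the secrets listed above may not be the certificates the proxy is actually presenting. That is an assumption worth verifying before trusting this check.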

As far as I can tell, this only happens when the service pods run for a few days without being restarted or redeployed with a new version.

I also saw another instance of the problem, but these logs were found in the upstream service's istio-proxy container, and the TLS error differs from the one in the ingress controller:

[2019-06-04 01:18:58.029][32][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:142] [C400] handshake error: 2
[2019-06-04 01:18:58.029][32][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:142] [C400] handshake error: 2
[2019-06-04 01:18:58.031][32][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:142] [C400] handshake error: 1
[2019-06-04 01:18:58.031][32][debug][connection] [external/envoy/source/extensions/transport_sockets/tls/ssl_socket.cc:175] [C400] TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED
[2019-06-04 01:18:58.031][32][debug][connection] [external/envoy/source/common/network/connection_impl.cc:183] [C400] closing socket: 0

I am not sure what actually happened here. The Citadel logs, node agent logs, and the rest looked normal at that point in time.

Please let me know if there are any other logs/config you need to troubleshoot the problem.

Affected product area (please put an X in all that apply)

[ ] Configuration Infrastructure
[ ] Docs
[ ] Installation
[X] Networking
[ ] Performance and Scalability
[ ] Policies and Telemetry
[X] Security
[ ] Test and Release
[ ] User Experience
[ ] Developer Infrastructure

Steps to reproduce the bug

As mentioned previously, we still don't know exactly how to reproduce it, but these are the patterns we have observed so far:

  • Enable SDS
  • Deploy applications and let them run for 2-4 days without any redeployment or bouncing
  • Watch for 503 errors in the proxy logs

Version (include the output of istioctl version --remote and kubectl version)

Istio 1.1.7 in all our environments
Kubernetes (AWS EKS) 1.12.7

How was Istio installed?

We installed it using Helm, via Tiller.

istio/istio

Answer questions Janesee3

Sorry for the misleading logs. It seems that the "Failed to get root cert" logs only appeared once, in one of our environments. In all of our other environments, we observed the same SSL VERIFY / CERT EXPIRED errors, but the node agent logs are normal (example shown below):

June 17th 2019, 02:05:59.124	2019-06-16T18:05:58.268516Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.38~my-application-748c7dcb77-6tgls.mynamespace~services.svc.cluster.local-35"
June 17th 2019, 02:05:58.128	2019-06-16T18:05:57.732654Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.71~my-application-two-7f857ffbd4-pgqtd.mynamespace~services.svc.cluster.local-23"
June 17th 2019, 02:05:58.127	2019-06-16T18:05:57.626791Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.13~my-application-three-5b87bc9565-899wh.mynamespace~services.svc.cluster.local-26"
June 17th 2019, 02:05:58.124	2019-06-16T18:05:57.503437Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.5~my-application-three-5b87bc9565-ldnrw.mynamespace~services.svc.cluster.local-33"
June 17th 2019, 02:05:57.135	2019-06-16T18:05:56.698009Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.71~my-application-two-7f857ffbd4-pgqtd.mynamespace~services.svc.cluster.local-25"
June 17th 2019, 02:05:57.132	2019-06-16T18:05:56.421798Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.38~my-application-748c7dcb77-6tgls.mynamespace~services.svc.cluster.local-35"
June 17th 2019, 02:05:57.128	2019-06-16T18:05:56.282202Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.27~my-application-748c7dcb77-j85g2.mynamespace~services.svc.cluster.local-19"
June 17th 2019, 02:05:57.125	2019-06-16T18:05:56.261387Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.5~my-application-three-5b87bc9565-ldnrw.mynamespace~services.svc.cluster.local-33"
June 17th 2019, 02:05:56.134	2019-06-16T18:05:55.996587Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.5~my-application-three-5b87bc9565-ldnrw.mynamespace~services.svc.cluster.local-32"
June 17th 2019, 02:05:56.133	2019-06-16T18:05:55.946790Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.13~my-application-three-5b87bc9565-899wh.mynamespace~services.svc.cluster.local-26"
June 17th 2019, 02:05:56.132	2019-06-16T18:05:55.903269Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.71~my-application-two-7f857ffbd4-pgqtd.mynamespace~services.svc.cluster.local-23"
June 17th 2019, 02:05:56.127	2019-06-16T18:05:55.773862Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.38~my-application-748c7dcb77-6tgls.mynamespace~services.svc.cluster.local-36"
June 17th 2019, 02:05:55.132	2019-06-16T18:05:54.654639Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.27~my-application-748c7dcb77-j85g2.mynamespace~services.svc.cluster.local-19"
June 17th 2019, 02:05:55.129	2019-06-16T18:05:54.326042Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.5~my-application-three-5b87bc9565-ldnrw.mynamespace~services.svc.cluster.local-32"
June 17th 2019, 02:05:55.126	2019-06-16T18:05:54.256648Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.38~my-application-748c7dcb77-6tgls.mynamespace~services.svc.cluster.local-36"
June 17th 2019, 01:55:55.125	2019-06-16T17:55:54.341694Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.27~my-application-748c7dcb77-j85g2.mynamespace~services.svc.cluster.local-21"
June 17th 2019, 01:55:55.124	2019-06-16T17:55:54.162717Z	info	SDS: push key/cert pair from node agent to proxy: "router~172.23.15.80~custom-ingressgateway-d47657b85-5w29p.istio-system~istio-system.svc.cluster.local-12"
June 17th 2019, 01:55:54.131	2019-06-16T17:55:53.957186Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.13~my-application-three-5b87bc9565-899wh.mynamespace~services.svc.cluster.local-28"
June 17th 2019, 01:55:54.129	2019-06-16T17:55:53.836438Z	info	SDS: push key/cert pair from node agent to proxy: "router~172.23.15.80~custom-ingressgateway-d47657b85-5w29p.istio-system~istio-system.svc.cluster.local-12"
June 17th 2019, 01:55:54.125	2019-06-16T17:55:53.802276Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.27~my-application-748c7dcb77-j85g2.mynamespace~services.svc.cluster.local-21"
June 17th 2019, 01:45:55.126	2019-06-16T17:45:54.575620Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.120~sleep-fdc95b5cc-k92l5.istio-system~istio-system.svc.cluster.local-17"
June 17th 2019, 01:45:55.125	2019-06-16T17:45:54.385893Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.120~sleep-fdc95b5cc-k92l5.istio-system~istio-system.svc.cluster.local-15"
June 17th 2019, 01:45:55.124	2019-06-16T17:45:54.340313Z	info	SDS: push key/cert pair from node agent to proxy: "router~172.23.15.80~custom-ingressgateway-d47657b85-5w29p.istio-system~istio-system.svc.cluster.local-13"
June 17th 2019, 01:45:54.180	2019-06-16T17:45:54.096419Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.120~sleep-fdc95b5cc-k92l5.istio-system~istio-system.svc.cluster.local-17"
June 17th 2019, 01:45:54.177	2019-06-16T17:45:53.846360Z	info	SDS: push key/cert pair from node agent to proxy: "sidecar~172.23.15.120~sleep-fdc95b5cc-k92l5.istio-system~istio-system.svc.cluster.local-15"
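To see at a glance which proxies are still receiving pushes, the node agent lines above can be grouped per proxy. The Python sketch below keeps the most recent push timestamp for each one. It assumes the quoted resource name ends in a numeric suffix (such as "-35") that can be stripped to get a stable proxy ID; what that suffix means is not confirmed here. The function name and the regular expression are illustrative.

```python
import re

# Matches the node agent timestamp and the quoted proxy resource name.
PUSH_RE = re.compile(
    r'(\d{4}-\d{2}-\d{2}T[\d:.]+Z)\s+info\s+'
    r'SDS: push key/cert pair from node agent to proxy: "([^"]+)"'
)


def last_push_per_proxy(lines):
    """Map each proxy ID to the latest push timestamp found in `lines`."""
    latest = {}
    for line in lines:
        match = PUSH_RE.search(line)
        if not match:
            continue
        timestamp, resource = match.groups()
        # Assumption: the trailing "-<number>" is not part of the proxy ID.
        proxy_id = resource.rsplit("-", 1)[0]
        # Timestamps share one fixed UTC format, so string order is time order.
        if timestamp > latest.get(proxy_id, ""):
            latest[proxy_id] = timestamp
    return latest


sample = [
    '2019-06-16T18:05:58.268516Z\tinfo\tSDS: push key/cert pair from node agent to proxy: '
    '"sidecar~172.23.15.38~my-application-748c7dcb77-6tgls.mynamespace~services.svc.cluster.local-35"',
    '2019-06-16T17:55:54.162717Z\tinfo\tSDS: push key/cert pair from node agent to proxy: '
    '"router~172.23.15.80~custom-ingressgateway-d47657b85-5w29p.istio-system~istio-system.svc.cluster.local-12"',
]
for proxy, timestamp in sorted(last_push_per_proxy(sample).items()):
    print(timestamp, proxy)
```

A proxy that shows TLS errors but has no recent entry in this map would be a candidate for a missed certificate rotation, though the logs quoted here are not enough to confirm that.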