
Envoy-Proxies ignore updated consul_token #27436

@URZ-HD

Description

Nomad version

Nomad v1.10.5

(We upgraded to 1.11.1 last week, so I no longer have the exact build number, and so far I have no system to test this with 1.11.1 - but I saw nothing related to this topic in the changelogs.)

Operating system and Environment details

RHEL 8.10

Issue

When restarting the Nomad process (e.g. after updating CONSUL_HTTP_TOKEN), Nomad loses track of the Consul tokens currently in use by running Envoy proxy containers.

Normally the Consul token used inside the Envoy proxies is the same one configured in the Nomad configuration. But after updating the token and restarting Nomad, the token in the Envoy proxies is still the old one.
This means that after a short time the proxy containers stop working, because the old Consul token has expired.

As a workaround, we have to drain all applications from the Nomad hosts before updating the token and restarting the Nomad process (see the sketch below).
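A minimal sketch of that workaround, assuming a systemd-managed Nomad agent and the /etc/nomad.d/nomad.env path from this report (the editor step is illustrative):

# 1. Drain all allocations off this client.
nomad node drain -enable -self

# 2. Update CONSUL_HTTP_TOKEN in the agent's environment file.
vi /etc/nomad.d/nomad.env

# 3. Restart the Nomad agent so it picks up the new token.
systemctl restart nomad

# 4. Make the node schedulable again; allocations return with fresh tokens.
nomad node drain -disable -self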

Reproduction steps

  • Use the given job file to create a job with a Consul Connect sidecar.
  • Compare the tokens in use:
    • token in "/secrets/.envoy_bootstrap.env" (proxy container)
    • token in "/etc/nomad.d/nomad.env" (host)
    • token in "/secrets/consul_token" (application container)
  • Validate each token via "consul acl token read -token $TOKEN -self" (see the sketch below).

The proxy container uses the same agent token as the Nomad host. The application container gets a dynamic token via workload identity.

After updating CONSUL_HTTP_TOKEN on the host and restarting Nomad, the token inside the application container is updated, but the proxy container still uses the old Consul token.
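For reference, a hypothetical verification sketch (the alloc ID and the sidecar task name are placeholders, and the exact layout of the bootstrap env file is an assumption):

ALLOC="<alloc-id>"

# Token baked into the Envoy sidecar's bootstrap environment.
nomad alloc exec -task connect-proxy-<service-name> "$ALLOC" cat /secrets/.envoy_bootstrap.env

# Token currently configured on the host.
grep CONSUL_HTTP_TOKEN /etc/nomad.d/nomad.env

# Workload-identity token of the application task.
nomad alloc exec -task httpd "$ALLOC" cat /secrets/consul_token

# Check whether a given token is still valid.
consul acl token read -token "$TOKEN" -self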

Expected Result

After the renewal, the Envoy proxy container should use the updated Consul token for its connections to the service mesh.

Actual Result

After the old Consul token expires (after 1h), the Envoy proxy loses its connection to the service mesh, and the Consul logs show "ACL not found" errors.
Unfortunately, the Envoy proxy itself does not fail either, so the health checks do not trigger and the proxy is not restarted.

Job file (if appropriate)

job "httpd-test" {
  group "httpd" {    
    network {
      mode = "bridge"
      port "http" {}
    }
	constraint {
      attribute = "${attr.unique.hostname}"
    	value     = "q10i22"
	}
    service {
      provider = "consul"
      #name     = "playground-httpd"
      port     = "http"
      
      identity {
        aud = ["inf293.consul"]
        ttl = "1h"
      }
      connect {
        sidecar_service {}
      }
    }

    task "httpd" {
      driver = "docker"
      consul {}
      config {
        image   = "busybox:1.36"
        command = "httpd"
        args    = ["-f", "-p", "${NOMAD_PORT_http}"]
        ports   = ["http"]
      }
      
      identity {
        name = "consul_default"
        aud  = ["inf293.consul"]
        ttl  = "1h"
      }
      
 template {
        data        = <<EOF
Consul Services:
{{- range services}}
  * {{.Name}}{{end}}

Consul KV for "httpd/config":
{{- range ls "httpd/config"}}
  * {{.Key}}: {{.Value}}{{end}}
EOF
        destination = "local/consul-info.txt"
    }
    }
  }
}

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)
