K8S Cluster autoscaler crashlooping on EKS
Posted on February 23, 2023 (Last modified on July 2, 2024)
I just fixed a badly crashlooping cluster autoscaler on EKS. Two things I found noteworthy:
- If the logs show an error like this (failed to list *v1.CSIStorageCapacity), it might be a version mismatch:
The autoscaler version must match the k8s version used (e.g. k8s v1.21 requires cluster autoscaler 1.21.x).
I’m using the autoscaler helm chart for it, so this just needs one value set in the chart values:
image:
  # IMPORTANT: this value _MUST_ match the k8s version: x.y.(patch)
  tag: v1.21.1
Note: You do not have to use the repo mentioned in the original comment.
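To make the version rule above concrete, here is a minimal Python sketch of the comparison. It only assumes that both the image tag and the cluster version start with `major.minor`; the function names and the EKS-style version string in the usage note are illustrative, not taken from any tool.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.21.1' or '1.21.14-eks-abc'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def autoscaler_matches_cluster(autoscaler_tag: str, cluster_version: str) -> bool:
    """True if the autoscaler image tag and the cluster share major.minor.

    The patch level is deliberately ignored, following the x.y.(patch) rule.
    """
    return major_minor(autoscaler_tag) == major_minor(cluster_version)
```

For example, `autoscaler_matches_cluster("v1.21.1", "v1.21.14-eks-fb459a0")` passes, while a `v1.23.0` tag against the same cluster does not.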
- In my case, I referenced the wrong ServiceAccount in my AWS IRSA role trust policy.
So for “future me”, check this:
- Does the pod use the correct ServiceAccount?
- Does the ServiceAccount reference the correct AWS IRSA role? (typos, wrong role, …)
- Does the AWS IRSA role trust policy reference the ServiceAccount correctly? (wrong namespace, typos, …)
Also, I explicitly name the ServiceAccount created by the helm chart, so I don’t have to guess what I need to reference from AWS:
rbac:
  serviceAccount:
    annotations:
      # this adds the IRSA reference to the ServiceAccount created
      "eks.amazonaws.com/role-arn": "arn:aws:iam::123456789012:role/whatever-your-role-name-is"
    # this explicitly controls the name of the ServiceAccount, so we can reference it safely from AWS
    name: cluster-autoscaler
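The trust policy side of that check can be sketched in Python as well. An IRSA trust policy normally pins the ServiceAccount through a `StringEquals` condition on the OIDC provider's `:sub` key, with a value of the form `system:serviceaccount:<namespace>:<name>`. The example policy below is hypothetical: the region, the OIDC provider ID `EXAMPLE`, and the helper names are placeholders, and wildcard conditions (`StringLike`) are not handled.

```python
def expected_subject(namespace: str, name: str) -> str:
    """Subject string that IRSA uses for a Kubernetes ServiceAccount."""
    return f"system:serviceaccount:{namespace}:{name}"


def trust_policy_allows(policy: dict, namespace: str, name: str) -> bool:
    """True if any statement pins the ':sub' claim to this ServiceAccount."""
    wanted = expected_subject(namespace, name)
    for statement in policy.get("Statement", []):
        equals = statement.get("Condition", {}).get("StringEquals", {})
        for key, value in equals.items():
            if not key.endswith(":sub"):
                continue
            # IAM allows either a single string or a list of strings here.
            values = [value] if isinstance(value, str) else list(value)
            if wanted in values:
                return True
    return False


# Hypothetical trust policy; region and OIDC provider ID are placeholders.
EXAMPLE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::123456789012:oidc-provider/"
                "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE:sub":
                        "system:serviceaccount:kube-system:cluster-autoscaler"
                }
            },
        }
    ],
}
```

With the ServiceAccount named `cluster-autoscaler` in `kube-system`, `trust_policy_allows(EXAMPLE_TRUST_POLICY, "kube-system", "cluster-autoscaler")` succeeds, whereas a wrong namespace or a typo in the name fails, which is exactly the mistake described above.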
For reference, this is the tail of the log from the crashing pod:
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:71 +0xbe
goroutine 297 [sync.Cond.Wait]:
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:310
[...]
created by k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch.NewStreamWatcher
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:71 +0xbe
goroutine 299 [sync.Cond.Wait]:
runtime.goparkunlock(...)
/usr/local/go/src/runtime/proc.go:310
[...]
created by k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch.NewStreamWatcher
/gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/apimachinery/pkg/watch/streamwatcher.go:71 +0xbe
goroutine 366 [IO wait]:
internal/poll.runtime_pollWait(0x7f2e1059ebb0, 0x72, 0xffffffffffffffff)
/usr/local/go/src/runtime/netpoll.go:203 +0x55
[...]
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1706 +0xc56
goroutine 367 [select]:
net/http.(*persistConn).writeLoop(0xc0017e8480)
/usr/local/go/src/net/http/transport.go:2336 +0x11c
created by net/http.(*Transport).dialConn
/usr/local/go/src/net/http/transport.go:1707 +0xc7b
Stream closed EOF for kube-system/cluster-autoscaler-aws-cluster-autoscaler-69f85b8f6d-hbpvv (aws-cluster-autoscaler)
Failed to watch *v1.CSIStorageCapacity: failed to list *v1.CSIStorageCapacity: the server could not find the requested resource