Deploy Cluster Autoscaler for Kubernetes on Jetstream
The Kubernetes Cluster Autoscaler is a service that runs within a Kubernetes cluster. When there are not enough resources to accommodate the pods queued to run, it contacts the API of the cloud provider to create more Virtual Machines that join the Kubernetes cluster.
Initially the Cluster Autoscaler only supported commercial cloud providers, but in March 2019 a user contributed Openstack support based on Magnum.
As a first step you should have a Magnum-based deployment running on Jetstream, see my recent tutorial about that.
You should therefore already have a copy of the repository with all the configuration files checked out on the local machine you use to interact with the Openstack API; if not:
git clone https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream.git
and enter the folder dedicated to the autoscaler:
cd jupyterhub-deploy-kubernetes-jetstream/kubernetes_magnum/autoscaler
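At the time of writing, the folder contains (among other files) the manifests used in the rest of this tutorial:
ls
cluster-autoscaler-svcaccount.yaml
cluster-autoscaler-secret.yaml
cluster-autoscaler-deployment-master.yaml
cluster-autoscaler-deployment.yaml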
Set up credentials
We first create the service account needed by the autoscaler to interact with the Kubernetes API:
kubectl create -f cluster-autoscaler-svcaccount.yaml
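For context, a manifest like this defines a service account in the kube-system namespace and binds it to the RBAC permissions the autoscaler needs to watch pods and manage nodes. A minimal sketch of such a manifest follows; the names and the role used here are illustrative, the file in the repository is the authoritative version:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cluster-autoscaler
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-autoscaler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin   # illustrative; the repository manifest may grant a narrower role
subjects:
- kind: ServiceAccount
  name: cluster-autoscaler
  namespace: kube-system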
Then we need to provide the connection details the autoscaler uses to interact with the Openstack API; those are contained in the cloud-config file of our cluster, available on the master node and set up by Magnum.
Get the IP of your master node from:
openstack server list
IP=xxx.xxx.xxx.xxx
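If you prefer not to copy the address by hand, something along these lines should work (this assumes the master node's name contains "master"; if more than one IP is listed, make sure you pick the externally reachable one):
IP=$(openstack server list -f value -c Name -c Networks | grep master | grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}' | tail -n 1)
echo $IP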
Now ssh into the master node and access the cloud-config file:
ssh fedora@$IP
cat /etc/kubernetes/cloud-config
Now copy the [Global] section to the end of cluster-autoscaler-secret.yaml on the local machine.
Also remove the ca-file line.
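For reference, the [Global] section written by Magnum looks roughly like the following; the field names are indicative, the values are placeholders, and your file may contain additional entries. The ca-file entry is the line you need to remove:
[Global]
auth-url=https://<openstack-auth-url>:5000/v3
user-id=<trustee-user-id>
password=<trustee-password>
trust-id=<trust-id>
region=<region-name>
ca-file=<path-to-ca-bundle>
Once the section is in place, create the secret: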
kubectl create -f cluster-autoscaler-secret.yaml
Launch the Autoscaler deployment
Create the Autoscaler deployment:
kubectl create -f cluster-autoscaler-deployment-master.yaml
Alternatively, I also added a version for a cluster where we are not deploying pods on the master node, cluster-autoscaler-deployment.yaml.
Check that the deployment is active:
kubectl -n kube-system get deployments
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
cluster-autoscaler 1 1 1 0 10s
And check its logs (you can get the full pod name from kubectl -n kube-system get pods):
kubectl -n kube-system logs cluster-autoscaler-59f4cf4f4-4k4p2
I0905 05:29:21.589062 1 leaderelection.go:217] attempting to acquire leader lease kube-system/cluster-autoscaler...
I0905 05:29:39.412449 1 leaderelection.go:227] successfully acquired lease kube-system/cluster-autoscaler
I0905 05:29:43.896557 1 magnum_manager_heat.go:293] For stack ID 17ab3ae7-1a81-43e6-98ec-b6ffd04f91d3, stack name is k8s-lu3bksbwsln3
I0905 05:29:44.146319 1 magnum_manager_heat.go:310] Found nested kube_minions stack: name k8s-lu3bksbwsln3-kube_minions-r4lhlv5xuwu3, ID d0590824-cc70-4da5-b9ff-8581d99c666b
If you redeploy the cluster and keep the old credentials, you'll see "Authentication failed" in the logs of the autoscaler pod: you need to update the secret every time you redeploy the cluster.
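For example, after pasting the new [Global] section into cluster-autoscaler-secret.yaml, you can recreate the secret and delete the autoscaler pod so that the deployment restarts it with the new credentials (replace the pod name with the one shown by kubectl -n kube-system get pods):
kubectl delete -f cluster-autoscaler-secret.yaml
kubectl create -f cluster-autoscaler-secret.yaml
kubectl -n kube-system delete pod cluster-autoscaler-59f4cf4f4-4k4p2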
Test the autoscaler
Now we need to produce a significant load on the cluster so that the autoscaler is triggered to request Openstack Magnum to create more Virtual Machines.
We can create a deployment of the NGINX container (any other would work for this test):
kubectl create deployment autoscaler-demo --image=nginx
And then create a large number of replicas:
kubectl scale deployment autoscaler-demo --replicas=300
We are using 2 nodes with a large amount of memory and CPU, so they can accommodate more than 200 of those pods. The rest remains in the queue:
kubectl get deployment autoscaler-demo
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
autoscaler-demo 300 300 300 213 18m
And this triggers the autoscaler:
kubectl -n kube-system logs cluster-autoscaler-59f4cf4f4-4k4p2
I0905 05:34:47.401149 1 scale_up.go:689] Scale-up: setting group DefaultNodeGroup size to 2
I0905 05:34:49.267280 1 magnum_nodegroup.go:101] Increasing size by 1, 1->2
I0905 05:35:22.222387 1 magnum_nodegroup.go:67] Waited for cluster UPDATE_IN_PROGRESS status
You can also check via the Openstack API:
openstack coe cluster list
+------+------+---------+------------+--------------+--------------------+
| uuid | name | keypair | node_count | master_count | status |
+------+------+---------+------------+--------------+--------------------+
| 09fcf| k8s | comet | 2 | 1 | UPDATE_IN_PROGRESS |
+------+------+---------+------------+--------------+--------------------+
It takes about 4 minutes for a new VM to boot, be configured by Magnum and join the Kubernetes cluster.
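You can follow the new node joining the cluster in real time with, for example:
kubectl get nodes --watch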
Checking the logs again should show another line:
I0912 17:18:28.290987 1 magnum_nodegroup.go:67] Waited for cluster UPDATE_COMPLETE status
Then you should have all 3 nodes available:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-6bawhy45wr5t-master-0 Ready master 38m v1.11.1
k8s-6bawhy45wr5t-minion-0 Ready <none> 38m v1.11.1
k8s-6bawhy45wr5t-minion-1 Ready <none> 30m v1.11.1
and all 300 NGINX containers deployed:
kubectl get deployments
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
autoscaler-demo 300 300 300 300 35m
You can also test scaling down by reducing the number of NGINX replicas to just a few and checking in the autoscaler logs that this triggers the scale-down process.
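For example:
kubectl scale deployment autoscaler-demo --replicas=10
kubectl -n kube-system logs cluster-autoscaler-59f4cf4f4-4k4p2
After the configured delay, the logs should report that unneeded nodes are being removed, and the extra minion should disappear from kubectl get nodes and from openstack server list.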
In cluster-autoscaler-deployment-master.yaml I have configured the scale-down process to trigger after just 1 minute, to simplify testing. For production, it is better to increase this to 10 minutes or more. Check the documentation of Cluster Autoscaler 1.14 for all the other available options.
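For reference, these timings are just command-line flags passed to the cluster-autoscaler container in the deployment manifest; the relevant arguments look roughly like this (the values, node-group bounds, and cloud-config mount path are illustrative, the manifest in the repository is the authoritative version):
command:
  - ./cluster-autoscaler
  - --cloud-provider=magnum
  - --cloud-config=/config/cloud-config
  - --nodes=1:10:DefaultNodeGroup
  - --scale-down-unneeded-time=1m
  - --scale-down-delay-after-add=1m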
Note about the Cluster Autoscaler container
The Magnum provider was added in Cluster Autoscaler 1.15; however, that version is not compatible with Kubernetes 1.11, which is currently available on Jetstream. Therefore I have taken the development version of Cluster Autoscaler 1.14 and compiled it myself. I also noticed that the scale-down process was not working due to incompatible IDs when the Cloud Provider tried to look up the ID of a minion in the stack; I am now using the MachineID directly instead of going through those indices. This version is available in my fork of autoscaler, and it is built into Docker containers published in the zonca/k8s-cluster-autoscaler-jetstream repository on Docker Hub.
The image tags are the short form of the repository's git commit hash.
I build the container using the run_gobuilder.sh and run_build_autoscaler_container.sh scripts included in the repository.
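If you want to rebuild the image yourself, the process boils down to running those two scripts from a checkout of the fork (check the scripts themselves for prerequisites such as Docker and a Docker Hub login, and for any arguments they expect):
bash run_gobuilder.sh
bash run_build_autoscaler_container.sh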
Note about images used by Magnum
I have tested this deployment using the Fedora-Atomic-27-20180419 image on Jetstream at Indiana University.
The Fedora Atomic 28 image had a long hang during boot and took more than 10 minutes to start; that caused a timeout in the autoscaler, and in any case it would have been too long for a user waiting to start a notebook.
I also tried updating the Fedora Atomic 28 image with sudo atomic host upgrade, and while this fixed the slow startup issue, it produced a broken Kubernetes installation: the Kubernetes services didn't detect the master node as part of the cluster, and kubectl get nodes only showed the minion.