Deploy JupyterHub on a Kubernetes cluster on Jetstream created with Kubespray 3/3
All of the following assumes you are logged in to the master node of the Kubernetes cluster deployed with Kubespray and have checked out the repository:
https://github.com/zonca/jupyterhub-deploy-kubernetes-jetstream
Install JupyterHub
First run
bash create_secrets.sh
to create the secret strings needed by JupyterHub, then edit its output secrets.yaml and make sure it is consistent; in particular, edit the hosts lines if needed. For example, supply the Jetstream DNS name of the master node, js-XXX-YYY.jetstream-cloud.org (XXX and YYY are the last 2 groups of the floating IP of the instance, AAA.BBB.XXX.YYY). See part 2, "Publish service externally with ingress".
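As a quick sanity check (the hostname below is just a placeholder, replace it with your own), you can resolve the DNS name and verify it points back to the floating IP of the master node:
nslookup js-XXX-YYY.jetstream-cloud.org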
bash configure_helm_jupyterhub.sh
bash install_jhub.sh
Check that some preliminary pods are running with:
kubectl get pods -n jhub
Once the proxy pod is running, even if the hub pod is still in preparation, you can point your browser at the master node's DNS name: you should get "Service Unavailable", which is a good sign that the proxy is working.
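The same check can be done from the command line; while only the proxy is up, a plain HTTP request to the master node should return a 503 (hostname is a placeholder):
curl -sI http://js-XXX-YYY.jetstream-cloud.org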
Customize JupyterHub
After JupyterHub is deployed and integrated with Cinder for persistent volumes, for any other customization, starting with authentication, you are in good hands: the Zero-to-JupyterHub documentation is great.
The only setup that could be peculiar to a deployment on top of Kubespray is HTTPS, covered in the next section.
Setup HTTPS with letsencrypt
Kubespray, instead of installing kube-lego, installs cert-manager to handle HTTPS certificates.
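You can verify that cert-manager is already running; assuming Kubespray installed it in its default cert-manager namespace (the namespace may differ depending on the Kubespray version and configuration):
kubectl get pods -n cert-manager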
First we need to create an Issuer: set your email inside setup_https_kubespray/https_issuer.yml and create it with the usual:
kubectl create -f setup_https_kubespray/https_issuer.yml
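To confirm the Issuer was created and successfully registered with the ACME server, you can inspect it (the resource name and namespace depend on what is defined in https_issuer.yml):
kubectl get issuer --all-namespaces
kubectl describe issuer -n jhub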
Then we can manually create an HTTPS certificate. cert-manager can be configured to handle this automatically, but as we only need a single domain this is pretty quick: edit setup_https_kubespray/https_certificate.yml and set the domain name of your master node, then create the certificate resource with:
kubectl create -f setup_https_kubespray/https_certificate.yml
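Issuing the certificate can take a few minutes; you can follow its progress and check that the Ready condition eventually becomes True (again, resource names and namespace depend on https_certificate.yml):
kubectl get certificate -n jhub
kubectl describe certificate -n jhub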
Finally we can configure JupyterHub to use this certificate: first edit your secrets.yaml following as an example the file setup_https_kubespray/example_letsencrypt_secrets.yaml, then update your JupyterHub configuration by running again:
bash install_jhub.sh
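Once the new configuration is applied, the hub should answer over HTTPS with a valid Let's Encrypt certificate; a quick way to check from the command line (hostname is a placeholder):
curl -I https://js-XXX-YYY.jetstream-cloud.org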
Setup HTTPS with custom certificates
In case you have custom certificates for your domain, first create a secret in the jupyterhub namespace with:
kubectl create secret tls cert-secret --key ssl.key --cert ssl.crt -n jhub
Then configure the ingress to use it in secrets.yaml:
ingress:
  enabled: true
  hosts:
    - js-XX-YYY.jetstream-cloud.org
  tls:
    - hosts:
        - js-XX-YYY.jetstream-cloud.org
      secretName: cert-secret
If you later need to update the certificate, you can do so with:
kubectl create secret tls cert-secret --key ssl.key --cert ssl.crt -n jhub \
--dry-run -o yaml | kubectl apply -f -
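To verify which certificate is currently stored in the secret and when it expires, you can decode it with openssl:
kubectl get secret cert-secret -n jhub -o jsonpath='{.data.tls\.crt}' \
    | base64 -d | openssl x509 -noout -subject -dates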
Setup custom HTTP headers
After you have deployed JupyterHub, edit the ingress resource:
kubectl edit ingress -n jhub
Add a configuration-snippet line inside the annotations:
metadata:
  annotations:
    kubernetes.io/tls-acme: "true"
    nginx.ingress.kubernetes.io/configuration-snippet: |
      more_set_headers "X-Frame-Options: DENY";
      more_set_headers "X-Xss-Protection: 1";
This doesn't require restarting or modifying any other resource.
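You can check that the ingress is now returning the extra headers (hostname is a placeholder):
curl -sI https://js-XX-YYY.jetstream-cloud.org | grep -iE "x-frame-options|x-xss-protection"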
Modify the Kubernetes cluster size
See a followup short tutorial on scaling Kubernetes manually.
Persistence of user data
When a JupyterHub user logs in for the first time, a Kubernetes PersistentVolumeClaim
of the size defined in the configuration file is created. This is a Kubernetes resource that defines a request for storage.
kubectl get pvc -n jhub
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
claim-zonca Bound pvc-c469967a-3968-11e9-aaad-fa163e9c7d08 1Gi RWO standard 2m34s
hub-db-dir Bound pvc-353114a7-3968-11e9-aaad-fa163e9c7d08 1Gi RWO standard 6m34s
Inspecting the claims we find out that we have a claim for the user and a claim to store the JupyterHub database. They are already Bound because they have already been satisfied.
Those claims are then satisfied by our Openstack Cinder provisioner, which creates an Openstack volume and wraps it into a Kubernetes PersistentVolume resource:
kubectl get pv -n jhub
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-353114a7-3968-11e9-aaad-fa163e9c7d08 1Gi RWO Delete Bound jhub/hub-db-dir standard 8m52s
pvc-c469967a-3968-11e9-aaad-fa163e9c7d08 1Gi RWO Delete Bound jhub/claim-zonca standard 5m4s
This corresponds to Openstack volumes, automatically mounted onto the node that is executing the user pod, as shown by openstack volume list:
+--------------------------------------+-------------------------------------------------------------+-----------+------+----------------------------------------------+
| ID | Name | Status | Size | Attached to |
+--------------------------------------+-------------------------------------------------------------+-----------+------+----------------------------------------------+
| e6eddaaa-d40d-4832-addd-a05343ec3a80 | kubernetes-dynamic-pvc-c469967a-3968-11e9-aaad-fa163e9c7d08 | in-use | 1 | Attached to zonca-k8s-node-nf-1 on /dev/sdc |
| 00f1e822-8098-4633-804e-46ba44d7de7e | kubernetes-dynamic-pvc-353114a7-3968-11e9-aaad-fa163e9c7d08 | in-use | 1 | Attached to zonca-k8s-node-nf-1 on /dev/sdb |
If the user disconnects, the Openstack volume is detached from the instance but it is not deleted; it is mounted back, possibly on another instance, when the user logs back in.
Delete and reinstall JupyterHub
The Helm release can be deleted with:
helm delete --purge jhub
As long as you do not delete the whole namespace, the volumes are not deleted, therefore you can re-deploy the same version or a newer version of JupyterHub using helm and the same volume is mounted back for the user.
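You can confirm that the user volumes survived the helm delete; the claims should still be listed as Bound:
kubectl get pvc -n jhub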
Delete and recreate Openstack instances
When we run terraform to delete all Openstack resources:
bash terraform_destroy.sh
the Openstack volumes created by the Kubernetes persistent volume provisioner are not deleted.
In case we are interested in keeping the same IP address, run instead:
bash terraform_destroy_keep_floatingip.sh
The problem is that if we recreate the Kubernetes cluster, it doesn't know how to link the Openstack volumes to the Persistent Volumes of the users. Therefore we need to back up the Persistent Volume and Persistent Volume Claim resources before tearing Kubernetes down:
kubectl get pvc -n jhub -o yaml > pvc.yaml
kubectl get pv -n jhub -o yaml > pv.yaml
I always recommend running kubectl on your local machine instead of the master node, because if you delete the master instance you lose any temporary modification to your scripts. In this case, even more importantly, if you are running on the master node please back up pvc.yaml and pv.yaml locally before running terraform_destroy.sh or they will be wiped out.
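For example, one possible way to copy the backups to your local machine before destroying the cluster (the SSH user and hostname below are placeholders, adjust them to your setup):
scp ubuntu@js-XXX-YYY.jetstream-cloud.org:pvc.yaml .
scp ubuntu@js-XXX-YYY.jetstream-cloud.org:pv.yaml .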
Then open the files with a text editor and delete the Persistent Volume and the Persistent Volume Claim related to hub-db-dir.
Edit pv.yaml and set:
persistentVolumeReclaimPolicy: Retain
Otherwise, if you create the PV first, it is deleted because there is no PVC bound to it.
Also remove the claimRef section of all the volumes in pv.yaml, otherwise you get the error "two claims are bound to the same volume, this one is bound incorrectly" on the PVC.
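After editing, a quick grep can confirm that the reclaim policy is now Retain and that no claimRef sections are left in the file:
grep -n "persistentVolumeReclaimPolicy" pv.yaml
grep -n "claimRef" pv.yaml    # should print nothing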
Now we can proceed to create the cluster again and then restore the volumes with:
kubectl apply -f pv.yaml
kubectl apply -f pvc.yaml
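The restored volumes and claims should show up as Bound again:
kubectl get pv
kubectl get pvc -n jhub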
Feedback
Feedback on this is very welcome: please open an issue on the GitHub repository or email me at zonca on the domain of the San Diego Supercomputer Center (sdsc.edu).