This tutorial describes how to install a Jupyterhub instance on a single machine, suitable for hosting a workshop where people log in with training accounts and use Jupyter Notebooks running Python 2/3, R and Julia, with Terminal access, inside Docker containers. Details about the setup:

  • Jupyterhub is installed with Anaconda directly on the host and proxied by NGINX under HTTPS with a self-signed certificate
  • Users log in with Linux account credentials created beforehand by the administrator; data in /home persist across sessions
  • Each user runs in a separate Docker container with access to Python 2, Python 3, R and Julia kernels; they can also open the Notebook editor and the terminal
  • On a single machine the biggest constraint is going to be memory usage. As a rule of thumb, consider 100-200 MB/user plus 5x-10x the amount of data you are loading from disk, depending on the kind of analysis. For a multi-node setup you need to look into Docker Swarm.
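
As a rough illustration of the memory rule of thumb above, the following sketch estimates how much RAM a workshop might need. The function name, its parameters and the example workshop (30 users, 50 MB of data each) are assumptions made for illustration; only the 100-200 MB/user and 5x-10x ranges come from the text.

```python
def estimate_memory_gb(users, data_mb_per_user, base_mb_per_user=200, data_factor=10):
    """Rough estimate of total RAM (in GB) for a single-machine workshop.

    base_mb_per_user: per-user overhead (rule of thumb: 100-200 MB)
    data_factor: multiplier on the data each user loads from disk (rule of thumb: 5x-10x)
    """
    per_user_mb = base_mb_per_user + data_factor * data_mb_per_user
    return users * per_user_mb / 1024


# Hypothetical workshop: 30 users, each loading about 50 MB of data
low = estimate_memory_gb(30, 50, base_mb_per_user=100, data_factor=5)
high = estimate_memory_gb(30, 50, base_mb_per_user=200, data_factor=10)
print(f"Plan for roughly {low:.1f} to {high:.1f} GB of RAM")
```

Treat the result only as a starting point when choosing the instance size; the real usage depends on the kind of analysis the users run.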

I am using the OpenStack deployment at the San Diego Supercomputer Center, SDSC Cloud. For AWS deployments, just replace the first section on creating a VM and setting up networking; see the Jupyterhub wiki.

If you intend to run on SDSC Cloud, I have a pre-built image of this deployment that you can set up and run quickly; see my followup tutorial.

Create a Virtual Machine in OpenStack

First of all we need to launch a new Virtual Machine and configure the network.

  • Login to the SDSC Cloud OpenStack dashboard

Network setup

Jupyterhub will be proxied to the standard HTTPS port by NGINX, and we also want to redirect HTTP to HTTPS, so we open both of those ports. We also open SSH so that the administrators can log in. Finally, the Docker containers need to connect to the Jupyterhub hub running on port 8081, so we add a custom TCP rule that opens that port only to the subnet the Docker containers run on.

  • Compute -> Access & Security -> Security Groups -> Create Security Group and name it jupyterhubsecgroup
  • Click on Manage Rules
  • Click on add rule, choose the HTTP rule and click add
  • Repeat the last step with HTTPS and SSH
  • Click on add rule again, choose Custom TCP Rule, set port 8081 and set CIDR 172.17.0.0/24 (this is needed so that the containers can connect to the hub)
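
To see which container addresses the custom TCP rule above actually covers, here is a minimal sketch using Python's standard ipaddress module. The helper name and the sample container addresses are hypothetical. It also assumes that Docker assigns container addresses on its default bridge network; you can check the range in use on your host with docker network inspect bridge and adjust the CIDR if it differs.

```python
import ipaddress

# CIDR opened for port 8081 in the security group rule above
ALLOWED_SUBNET = ipaddress.ip_network("172.17.0.0/24")


def hub_reachable_from(container_ip):
    """Return True if the container address falls inside the allowed subnet."""
    return ipaddress.ip_address(container_ip) in ALLOWED_SUBNET


print(hub_reachable_from("172.17.0.5"))  # address inside 172.17.0.0/24
print(hub_reachable_from("172.17.1.5"))  # address outside 172.17.0.0/24
```

A /24 covers only the 172.17.0.x addresses, which is enough as long as the containers receive addresses in that range.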

Create a new Virtual Machine

We choose Ubuntu here; other distributions should work fine as well.

  • Compute -> Access & Security -> Key Pairs -> Create key pair, name it jupyterhub and download it to your local machine
  • Instances -> Launch Instance: choose a name, choose "Boot from image" as Boot Source and Ubuntu as Image name, then choose any size, depending on the number of users (TODO add link to Jupyterhub docs)
  • Under "Access & Security" choose Key Pair jupyterhub and Security Groups jupyterhubsecgroup
  • Click Launch to create the instance

Give public IP to the instance

By default in SDSC Cloud machines do not have a public IP.

  • Compute -> Access & Security -> Floating IPs -> Allocate IP To Project, "Allocate IP" to request a public IP
  • Click on the "Associate" button of the IP just requested and under "Port to be associated" choose the instance just created

Setup Jupyterhub in the Virtual Machine

In this section we will install and configure Jupyterhub and NGINX to run on the Virtual Machine.

  • Log in to the Virtual Machine with ssh -i jupyterhub.pem ubuntu@xxx.xxx.xxx.xxx using the key file and the public IP set up in the previous steps
  • Add the hostname of the machine (check it by running hostname) to /etc/hosts, i.e. the first line should become something like 127.0.0.1 localhost jupyterhub if jupyterhub is the hostname
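
The /etc/hosts change above can be sketched as a small text transformation. This is only an illustration under stated assumptions: the function name and the sample file content are hypothetical, and the function works on a string rather than writing to /etc/hosts, which requires root privileges.

```python
def add_hostname_to_loopback(hosts_text, hostname):
    """Append hostname to the 127.0.0.1 line of an /etc/hosts-style text, if missing."""
    lines = []
    for line in hosts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "127.0.0.1" and hostname not in fields[1:]:
            line = f"{line.rstrip()} {hostname}"
        lines.append(line)
    return "\n".join(lines) + "\n"


# Hypothetical file content, for illustration only
sample = "127.0.0.1 localhost\n::1 ip6-localhost\n"
print(add_hostname_to_loopback(sample, "jupyterhub"), end="")
```

Running the function twice leaves the text unchanged the second time, so it is safe to reapply.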

Setup Jupyterhub

wget --no-check-certificate https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Use all the defaults and answer "yes" when asked to modify PATH.

sudo apt-get install npm nodejs-legacy
sudo npm install -g configurable-http-proxy
conda install traitlets tornado jinja2 sqlalchemy 
pip install jupyterhub

For authentication to work, the ubuntu user needs to be able to read the /etc/shadow file:

sudo adduser ubuntu shadow

Setup the web server

We will use the NGINX web server to proxy Jupyterhub and handle HTTPS for us; this is recommended for deployments on the public internet.

sudo apt install nginx

SSL Certificate: For simplicity, here we use a self-signed certificate. Browsers will show a warning the first time users connect to the server, but the traffic is still encrypted. Optionally, once we have assigned a domain to the Virtual Machine, we can install letsencrypt and get a real certificate; see my followup tutorial.

sudo mkdir /etc/nginx/ssl
sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout /etc/nginx/ssl/nginx.key -out /etc/nginx/ssl/nginx.crt

Download nginx.conf from https://gist.github.com/zonca/08c413a37401bdc9d2a7f65a7af44462 and save it as /etc/nginx/nginx.conf

Setup Docker Spawner

By default Jupyterhub runs notebooks as processes owned by each system user. For more security and isolation, we want the notebooks to run in Docker containers, which are something like lightweight Virtual Machines running inside our server.

Install Docker

  • Source: https://docs.docker.com/engine/installation/linux/ubuntulinux/#prerequisites
sudo apt update
sudo apt install apt-transport-https ca-certificates
sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
echo "deb https://apt.dockerproject.org/repo ubuntu-trusty main" | sudo tee /etc/apt/sources.list.d/docker.list 
sudo apt update
sudo apt install docker-engine
sudo usermod -aG docker ubuntu

Log out and log in again for the group change to take effect.

Install and configure DockerSpawner

pip install dockerspawner
docker pull jupyter/systemuser
conda install ipython jupyter

Create jupyterhub_config.py in the home folder of the ubuntu user with this content:

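# Jupyterhub itself runs without SSL; NGINX handles HTTPS in front of it
c.JupyterHub.confirm_no_ssl = True
# Run each user's notebook server in a Docker container tied to their Linux account
c.JupyterHub.spawner_class = 'dockerspawner.SystemUserSpawner'

# The docker instances need access to the Hub, so the default loopback interface doesn't work:
from IPython.utils.localinterfaces import public_ips
c.JupyterHub.hub_ip = public_ips()[0]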
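
The configuration relies on the first address returned by public_ips() being one the containers can reach. As a minimal sketch of a more defensive selection (not the IPython implementation), the hypothetical helper below returns the first non-loopback address from a list and fails loudly if there is none. The helper name and the sample addresses are assumptions for illustration.

```python
import ipaddress


def pick_hub_ip(candidates):
    """Return the first address that is not loopback, or raise if there is none."""
    for ip in candidates:
        if not ipaddress.ip_address(ip).is_loopback:
            return ip
    raise ValueError("no non-loopback address found for the Hub")


# Hypothetical list of addresses found on the host
print(pick_hub_ip(["127.0.0.1", "10.0.0.12", "172.17.0.1"]))
```

In jupyterhub_config.py this could be used as c.JupyterHub.hub_ip = pick_hub_ip(public_ips()), so that a host with no usable address produces a clear error at startup.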

Connect to Jupyterhub

From the home folder of the ubuntu user, type jupyterhub to launch the Jupyterhub process. Use CTRL-C to stop it. See below for how to start it automatically at boot.

Open a browser and connect to the floating IP you set for your instance; this should redirect to HTTPS. Click "Advanced" in the safety warning caused by the self-signed SSL certificate and log in with the training credentials.

Instead of using the IP, you can use any domain that points to that same IP with a DNS record of type A, or get a dynamic DNS name for free on a website like http://noip.com. Once you have a custom domain, you can configure letsencrypt to have a proper HTTPS certificate so that users do not get any warning when connecting to the instance. I will add this to the optional steps below.

Optional: Automatically start jupyterhub at boot

Save https://gist.github.com/zonca/aaeaf3c4e7339127b482d759866e5f39 as /etc/init.d/jupyterhub

sudo chmod +x /etc/init.d/jupyterhub
sudo service jupyterhub start
sudo update-rc.d jupyterhub defaults

Optional: Create training user accounts

Add user accounts on Jupyterhub by creating standard Linux users, either interactively with adduser or with a batch script.

For example the following batch script creates 10 users all with the same password:

#!/bin/bash
# Create NUMBER_OF_USERS accounts named training01, training02, ... sharing one password
PASSWORD=samepasswordforallusers
NUMBER_OF_USERS=10
for n in $(seq -f "%02g" 1 "$NUMBER_OF_USERS")
do
    echo "creating user training$n"
    echo "training$n:$PASSWORD::::/home/training$n:/bin/bash" | sudo newusers
done
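
For reviewing the accounts before creating them, the same input lines for newusers can be generated with a short Python sketch. The function name and the prefix parameter are hypothetical; the line format (name:password:uid:gid:gecos:home:shell) mirrors the one used in the batch script above.

```python
def newusers_lines(password, number_of_users, prefix="training"):
    """Build one newusers input line per account: name:password:uid:gid:gecos:home:shell."""
    lines = []
    for n in range(1, number_of_users + 1):
        user = f"{prefix}{n:02d}"  # zero-padded, like seq -f "%02g"
        lines.append(f"{user}:{password}::::/home/{user}:/bin/bash")
    return lines


for line in newusers_lines("samepasswordforallusers", 10):
    print(line)
```

The output can be saved to a file, inspected, and then passed to sudo newusers as a file argument.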

Also add AllowUsers ubuntu to /etc/ssh/sshd_config and restart the SSH service, so that training users cannot SSH into the host machine.

Optional: Add the R and Julia kernels

  • SSH into the instance
  • git clone https://github.com/jupyter/dockerspawner
  • cd dockerspawner

Modify the file singleuser/Dockerfile, replacing FROM jupyter/scipy-notebook with FROM jupyter/datascience-notebook, then build the image:

docker build -t datascience-singleuser singleuser

Modify the file systemuser/Dockerfile, replacing FROM jupyter/singleuser with FROM datascience-singleuser, then build the image:

docker build -t datascience-systemuser systemuser

Finally in jupyterhub_config.py, select the new docker image:

c.DockerSpawner.container_image = "datascience-systemuser"
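
The two Dockerfile edits in this section can also be scripted instead of made by hand. The sketch below is one possible way to do it, assuming the base image appears on a plain FROM line; the function name and the sample Dockerfile content are hypothetical, and the function operates on text rather than on the files themselves.

```python
def replace_base_image(dockerfile_text, old_image, new_image):
    """Swap the image named on matching FROM lines, leaving every other line untouched."""
    out = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].upper() == "FROM" and parts[1] == old_image:
            line = line.replace(old_image, new_image, 1)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical Dockerfile content, for illustration only
sample = "FROM jupyter/scipy-notebook\nUSER root\n"
print(replace_base_image(sample, "jupyter/scipy-notebook", "jupyter/datascience-notebook"), end="")
```

If the FROM line does not name the expected image, the text is returned unchanged, which makes it easy to notice that the upstream Dockerfile has changed.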