Kubernetes monitoring with Prometheus and Grafana

In a production Kubernetes deployment it is necessary to make it easier to monitor the status of the cluster effectively. Kubernetes provides Prometheus to gather data from the different components of Kubernetes and Grafana to access those data and provide real-time plotting and inspection capability. Moreover, they both provide systems to send alerts in case some conditions on the state of the cluster are met, i.e. using more than 90% of RAM or CPU.

The only downside is that the pods that handle monitoring consume some resource themselves, so this could be significant for small clusters below 5 nodes or so, but shouldn't be a problem for typical larger production deployments.

Both Prometheus and Grafana can be installed separately with Helm recipes or using the Prometheus operator Helm recipe, however those deployments do not have any preconfigured dashboards, it is easier to get started thanks to the kube-prometheus project, which not only installs Prometheus and Grafana, but also preconfigures about 10 different Grafana dashboards to explore in depth the status of a Kubernetes cluster.

The main issue is that customizing it is really complicated, it requires modifying jsonnet templates and recompiling them with a jsonnet builder which requires go, however I don't foresee the need to do that for most users.

Unfortunately it is not based on Helm, so you need to first checkout the repository:

git clone https://github.com/coreos/kube-prometheus

and then follow the instructions in the documentation, copied here for convenience:

kubectl create -f manifests/

wait a moment, do not worry if some of the tasks fails, they should get fixed running:

kubectl apply -f manifests/

This creates several pods in the monitoring namespace:

kubectl get pods -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          13m
alertmanager-main-1                    2/2     Running   0          13m
alertmanager-main-2                    2/2     Running   0          13m
grafana-9d97dfdc7-zkfft                1/1     Running   0          14m
kube-state-metrics-7c7979b6bc-srcvk    4/4     Running   0          12m
node-exporter-b6n2w                    2/2     Running   0          14m
node-exporter-cgp46                    2/2     Running   0          14m
prometheus-adapter-b7d894c9c-z2ph7     1/1     Running   0          14m
prometheus-k8s-0                       3/3     Running   1          13m
prometheus-k8s-1                       3/3     Running   1          13m
prometheus-operator-65c44fb7b7-8ltzs   1/1     Running   0          14m

Then you can setup forwarding on your laptop to export grafana locally:

kubectl --namespace monitoring port-forward svc/grafana 3000

Access localhost:3000 with your browser and you should be able to navigate through all the statistics of your cluster, see for example this screenshot. The credentials are user admin and password admin.

Screenshot of the Grafana UI

Access the UI from a different machine

In case you are running the configuration on a remote server and you would like to access the Grafana UI (or any other service) from your laptop, you can install kubectl also your my laptop, then copy the .kube/config to the laptop with:

 scp -r KUBECTLMACHINE:~/.kube/config ~/.kube

and run:

 ssh ubuntu@$IP -f -L 6443:localhost:6443 sleep 3h &

from the laptop and then run the port-forward command locally on the laptop.

Monitor JupyterHub

Once we have deployed JupyterHub with Helm, we can pull up the "namespace" monitor and select the jhub namespace to visualize resource usage but also usage requests and limits of all pods created by JupyterHub and its users. See a screenshot below.

Screenshot of the Grafana namespace UI

Setup alerts

Grafana supports email alerts, but it needs a SMTP server, and it is not easy to setup and to avoid being filtered as spam. The easiest way is to setup an alert to Slack, and optionally be notified via email of Slack messages.

Follow the instructions for slack on the Grafana documentation

Create a Slack app, name it e.g. Grafana
Add feature "Incoming webhook"
Create a incoming webhook in the workspace and channel your prefer on Slack
In the Grafana Alerting menu, set the webhook incoming url, the channel name

Screenshot of the Grafana slack notification