Installation: JupyterHub
Requirements
- A server running Rocky Linux
- Knowledge of the command line and text editors
- Basic knowledge of installing and running web services and Python applications
Introduction
JupyterHub brings the power of notebooks to groups of users. It gives users access to computational environments and resources without burdening the users with installation and maintenance tasks. Users - including students, researchers, and data scientists - can get their work done in their own workspaces on shared resources which can be managed efficiently by system administrators. JupyterHub runs in the cloud or on your own hardware, and makes it possible to serve a pre-configured data science environment to any user in the world. It is customizable and scalable, and is suitable for small and large teams, academic courses, and large-scale infrastructure.1 2
This guide demonstrates the requirements and steps needed to install JupyterHub on Rocky Linux. Based on this firm's needs, multiple configuration steps are included in this document, and multiple external sources were used to develop it. Note that not all networking steps are outlined here: topics such as port forwarding, DNS, and DHCP are not discussed in this document.
Installation
Create a virtual environment under '/opt/jupyterhub'.
sudo python3 -m venv /opt/jupyterhub/
Install the following packages into the newly created virtual environment.
sudo /opt/jupyterhub/bin/python3 -m pip install wheel
sudo /opt/jupyterhub/bin/python3 -m pip install jupyterhub jupyterlab
sudo /opt/jupyterhub/bin/python3 -m pip install ipywidgets
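To confirm that the packages landed in the virtual environment rather than in the system Python, a short check like the one below can be run with /opt/jupyterhub/bin/python3. This is only a sketch: it assumes Python 3.8 or newer (for importlib.metadata), and the helper name installed_versions is made up for illustration and is not part of JupyterHub.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names taken from the pip commands in this guide.
    report = installed_versions(["jupyterhub", "jupyterlab", "ipywidgets"])
    for name, ver in report.items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Running the script with the system interpreter instead of the one in /opt/jupyterhub/bin would report on the wrong environment, so the interpreter path matters here.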
Install node and npm, which are requirements for configurable-http-proxy.
sudo dnf install nodejs npm
Install configurable-http-proxy
sudo npm install -g configurable-http-proxy
Basic Configuration
Create a folder location for the JupyterHub configuration
sudo mkdir -p /opt/jupyterhub/etc/jupyterhub/
cd /opt/jupyterhub/etc/jupyterhub/
Generate the default configuration file.
sudo /opt/jupyterhub/bin/jupyterhub --generate-config
This will produce the default configuration file /opt/jupyterhub/etc/jupyterhub/jupyterhub_config.py
Set the following configuration option in your jupyterhub_config.py file:
c.Spawner.default_url = '/lab'
JupyterHub Service
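Set up JupyterHub to run as a system service using systemd. Create a directory for the service file:
sudo mkdir -p /opt/jupyterhub/etc/systemd
Create the file /opt/jupyterhub/etc/systemd/jupyterhub.service in a text editor and paste the following service unit definition into it: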
[Unit]
Description=JupyterHub
After=syslog.target network.target

[Service]
User=root
Environment="PATH=/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/jupyterhub/bin"
ExecStart=/opt/jupyterhub/bin/jupyterhub -f /opt/jupyterhub/etc/jupyterhub/jupyterhub_config.py

[Install]
WantedBy=multi-user.target
Make systemd aware of the service file. Symlink the file into systemd's directory:
sudo ln -s /opt/jupyterhub/etc/systemd/jupyterhub.service /etc/systemd/system/jupyterhub.service
Reload the systemd configuration files.
sudo systemctl daemon-reload
Enable the jupyterhub service.
sudo systemctl enable jupyterhub
Start the jupyterhub service.
sudo systemctl start jupyterhub
Check the status of the jupyterhub service.
sudo systemctl status jupyterhub
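Beyond the systemd status, it can be useful to confirm that the hub actually answers HTTP requests. The sketch below queries the root of the JupyterHub REST API, which reports the hub version. The function names are hypothetical helpers, and the address passed in is an assumption: a default installation listens on port 8000, but this changes if bind_url is set.

```python
import json
import urllib.request


def hub_api_url(base_url):
    """Return the REST API root for a hub served at base_url."""
    return base_url.rstrip("/") + "/hub/api"


def hub_version(base_url, timeout=5.0):
    """Fetch the hub's version string from its REST API root.

    Raises urllib.error.URLError if the hub cannot be reached.
    """
    with urllib.request.urlopen(hub_api_url(base_url), timeout=timeout) as resp:
        return json.load(resp)["version"]
```

On the server itself, calling hub_version("http://127.0.0.1:8000") should return a version string if the service is up; a connection error suggests the service is not listening at that address.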
User Creation
Create system users who can log into the JupyterHub spawner.
sudo adduser <user>
sudo passwd <user>
To enable administrator privileges for the user, add the user to the wheel group.
sudo usermod -aG wheel <user>
Modify the JupyterHub configuration to allow the user to log into the application. For each line you uncomment, be sure to remove any leading spaces or indentation.
For regular users:
c.Authenticator.allowed_users = {'<user>'}
For admin users:
c.JupyterHub.admin_access = True
c.Authenticator.admin_users = {'<user>'}
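Because JupyterHub's default authenticator logs users in against local system accounts, a name listed in the configuration without a matching account cannot log in. The sketch below checks a set of names against the local account database. The function name and the example user names are hypothetical.

```python
import pwd


def missing_system_users(usernames):
    """Return the subset of usernames that have no local account."""
    missing = set()
    for name in usernames:
        try:
            pwd.getpwnam(name)
        except KeyError:
            # getpwnam raises KeyError when the account does not exist.
            missing.add(name)
    return missing


if __name__ == "__main__":
    # Hypothetical names; use the same names as in jupyterhub_config.py.
    configured = {"alice", "bob"}
    absent = missing_system_users(configured)
    print("missing accounts:", ", ".join(sorted(absent)) or "none")
```

Any name reported as missing would need to be created with adduser, as described above, before it can be used.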
Shared Directory
To allow users to share files on the platform, create a shared folder in the '/opt/jupyterhub' directory.
sudo mkdir /opt/jupyterhub/share
In each user's home directory, create a symbolic link to the shared folder.
sudo ln -s /opt/jupyterhub/share /home/<user>/share
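With more than a handful of users, repeating the ln command by hand becomes tedious. The following sketch creates the same link for a list of users. The function name link_shared_folder and its parameters are hypothetical, and it assumes home directories live under a single root such as /home.

```python
from pathlib import Path


def link_shared_folder(shared, home_root, users, link_name="share"):
    """Create <home_root>/<user>/<link_name> pointing at shared.

    Users without a home directory, and paths that already exist,
    are skipped. Returns the list of links that were created.
    """
    created = []
    for user in users:
        link = Path(home_root) / user / link_name
        if not link.parent.is_dir():
            continue  # no home directory for this user
        if link.is_symlink() or link.exists():
            continue  # leave anything already at that path alone
        link.symlink_to(shared, target_is_directory=True)
        created.append(link)
    return created
```

Run as root, a call such as link_shared_folder("/opt/jupyterhub/share", "/home", ["<user>"]) mirrors the ln command above. Because existing paths are skipped, running it again after adding new users does not disturb links that are already in place.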
Reverse Proxy
To access JupyterHub through a reverse proxy, add the code below to the Apache HTTP Server site configuration file and tailor it to fit your needs.
Rewrite Rule:
RewriteCond %{HTTP:Connection} Upgrade [NC]
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteRule /jupyterhub/(.*) ws://XXX.XXX.XXX.XXX:8888/jupyterhub/$1 [P,L]
RewriteRule /jupyterhub/(.*) http://XXX.XXX.XXX.XXX:8888/jupyterhub/$1 [P,L]
Reverse Proxy:
<Location "/jupyterhub/">
  # preserve Host header to avoid cross-origin problems
  ProxyPreserveHost on
  # proxy to JupyterHub
  ProxyPass http://XXX.XXX.XXX.XXX:8888/jupyterhub/
  ProxyPassReverse http://XXX.XXX.XXX.XXX:8888/jupyterhub/
</Location>
JupyterHub Configuration:
c.JupyterHub.bind_url = 'http://XXX.XXX.XXX.XXX:8888/jupyterhub/'
Restart the jupyterhub service.
sudo systemctl restart jupyterhub
Firewall Settings
Open the JupyterHub port in the firewall.
sudo firewall-cmd --permanent --add-port=8888/tcp
Reload the firewall.
sudo firewall-cmd --reload
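To check that the port is reachable after the firewall change, a plain TCP connection test from another machine on the network is usually enough. The sketch below uses only the standard library. The function name port_is_open is a hypothetical helper, and the host and port you pass should be the ones used in bind_url.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A result of False from a remote machine, combined with True when run on the server itself, points toward the firewall rather than the JupyterHub service.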
JupyterLab
JupyterLab is the latest web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning. A modular design invites extensions to expand and enrich functionality.3
JupyterLab also supports extensions. Several extensions that users may find useful include drawio, jupyterlab-sparkmonitor, jupyterlab-hide-code, @jupyterlab/celltags, @jupyterlab/git, and @jupyterlab/latex. Be aware that not all extensions are compatible with every version of JupyterHub or JupyterLab.
One piece of useful code to center align charts is below. It can be placed in the first code cell, run at the beginning of each kernel, and hidden with the tools from the jupyterlab-hide-code extension.
from IPython.display import HTML
HTML("""<style>img{ margin: auto; display: block;}</style>""")
If you have set up a Spark cluster, use the code below to connect to it and disconnect from it.
import findspark
findspark.init()
import pyspark
from pyspark.sql import SparkSession
sc = pyspark.SparkContext(master='spark://XXX.XXX.XXX.XXX:7077', appName='test')
sc.stop()
Conclusion
JupyterHub can be used to serve a variety of environments. It supports dozens of kernels with the Jupyter server, and can be used to serve a variety of user interfaces including the Jupyter Notebook, JupyterLab, RStudio, nteract, and more. The application can be configured with authentication in order to provide access to a subset of users. Authentication is pluggable, supporting a number of authentication protocols (such as OAuth and GitHub).
JupyterHub is container-friendly, and can be deployed with modern-day container technology. It also runs on Kubernetes, and can run with up to tens of thousands of users. JupyterHub is entirely open-source and designed to be run on a variety of infrastructure. This includes commercial cloud providers, virtual machines, or even your own laptop hardware. 4