Installation: JupyterHub

Requirements

  • A server running Rocky Linux
  • Knowledge of the command line and text editors
  • Basic knowledge of installing and running web services and Python applications

Introduction

JupyterHub brings the power of notebooks to groups of users. It gives users access to computational environments and resources without burdening the users with installation and maintenance tasks. Users - including students, researchers, and data scientists - can get their work done in their own workspaces on shared resources which can be managed efficiently by system administrators. JupyterHub runs in the cloud or on your own hardware, and makes it possible to serve a pre-configured data science environment to any user in the world. It is customizable and scalable, and is suitable for small and large teams, academic courses, and large-scale infrastructure. [1][2]

This guide describes the requirements and steps needed to install JupyterHub on Rocky Linux. Several of the configuration steps reflect the needs of the organization it was written for, and multiple external sources were used to develop it. Not every networking step is covered: topics such as port forwarding, DNS, and DHCP are outside the scope of this document.

Installation

Create a virtual environment under '/opt/jupyterhub'.

sudo python3 -m venv /opt/jupyterhub/

Install the following packages into the newly created virtual environment.

sudo /opt/jupyterhub/bin/python3 -m pip install wheel
sudo /opt/jupyterhub/bin/python3 -m pip install jupyterhub jupyterlab
sudo /opt/jupyterhub/bin/python3 -m pip install ipywidgets
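To confirm that the packages were installed into the virtual environment rather than the system Python, you can ask the environment's own interpreter which versions it sees. The script below is a minimal sketch using only the Python standard library; the file name check_venv.py and the function installed_versions are illustrative, not part of JupyterHub.

```python
# check_venv.py (illustrative name): run it with /opt/jupyterhub/bin/python3
import sys
from importlib.metadata import PackageNotFoundError, version

# Distributions installed with pip in the step above
REQUIRED = ["wheel", "jupyterhub", "jupyterlab", "ipywidgets"]


def installed_versions(names):
    """Hypothetical helper: map each distribution name to its version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # sys.prefix is the virtual environment's directory when its python3 runs this file
    print(f"interpreter prefix: {sys.prefix}")
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name:12} {ver or 'NOT INSTALLED'}")
```

Run it as sudo /opt/jupyterhub/bin/python3 check_venv.py; a NOT INSTALLED line means the corresponding pip command should be repeated.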

Install Node.js and npm, which configurable-http-proxy requires.

sudo dnf install nodejs npm

Install configurable-http-proxy.

sudo npm install -g configurable-http-proxy
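Before moving on, it can help to check that the commands JupyterHub depends on are on the PATH, since by default the hub starts its proxy by running the configurable-http-proxy command. This is a minimal standard-library sketch; locate_commands is a hypothetical helper. Passing an explicit path string lets you test the PATH that the systemd unit will use rather than your login shell's.

```python
import shutil

# Commands installed with dnf and npm in the steps above
REQUIRED_COMMANDS = ["node", "npm", "configurable-http-proxy"]


def locate_commands(names, path=None):
    """Hypothetical helper: return {command: absolute path or None}.

    `path` is a PATH-style string; None means the current $PATH.
    """
    return {name: shutil.which(name, path=path) for name in names}


if __name__ == "__main__":
    for name, where in locate_commands(REQUIRED_COMMANDS).items():
        print(f"{name:25} {where or 'NOT FOUND'}")
```

For example, locate_commands(REQUIRED_COMMANDS, path="/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/jupyterhub/bin") checks against the PATH that the service unit in this guide sets.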

Basic Configuration

Create a directory for the JupyterHub configuration and change into it.

sudo mkdir -p /opt/jupyterhub/etc/jupyterhub/
cd /opt/jupyterhub/etc/jupyterhub/

Generate the default configuration file.

sudo /opt/jupyterhub/bin/jupyterhub --generate-config

The command writes the file to the current directory, so this produces the default configuration file /opt/jupyterhub/etc/jupyterhub/jupyterhub_config.py.

To have users land in JupyterLab rather than the classic Notebook interface, set the following option in your jupyterhub_config.py file:

c.Spawner.default_url = '/lab'
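If you would rather script configuration changes than edit the file by hand, a small helper can replace the commented-out default for an option, or append the option if it is missing. This is a minimal sketch, not a JupyterHub tool: set_option is a hypothetical name, and it assumes each option sits on a single line in the c.Class.option = value form used by the generated file.

```python
import re
from pathlib import Path

# Configuration file generated in the step above
CONFIG = Path("/opt/jupyterhub/etc/jupyterhub/jupyterhub_config.py")


def set_option(config_path, option, value):
    """Hypothetical helper: write `c.<option> = repr(value)` into a config file.

    The first line that sets the option, active or commented out
    (e.g. "# c.Spawner.default_url = ''"), is replaced; otherwise the
    setting is appended at the end of the file.
    """
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    new_line = f"c.{option} = {value!r}"
    # Optional indentation, an optional '#', then c.<option> followed by '='
    pattern = re.compile(rf"^[ \t]*#?[ \t]*c\.{re.escape(option)}[ \t]*=.*$", re.MULTILINE)
    if pattern.search(text):
        # A function replacement stops re from interpreting backslashes in the value
        text = pattern.sub(lambda _match: new_line, text, count=1)
    else:
        if text and not text.endswith("\n"):
            text += "\n"
        text += new_line + "\n"
    path.write_text(text)
```

For example, running set_option(CONFIG, "Spawner.default_url", "/lab") as root applies the setting above. Values are written with repr(), so pass Python objects such as True or {'<user>'}, not pre-quoted strings.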

JupyterHub Service

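Set up JupyterHub to run as a system service using systemd. First, create a directory to hold the service unit file.

sudo mkdir -p /opt/jupyterhub/etc/systemd

Using a text editor, create the file /opt/jupyterhub/etc/systemd/jupyterhub.service and paste in the following service unit definition: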

[Unit]
Description=JupyterHub
After=syslog.target network.target

[Service]
User=root
Environment="PATH=/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/jupyterhub/bin"
ExecStart=/opt/jupyterhub/bin/jupyterhub -f /opt/jupyterhub/etc/jupyterhub/jupyterhub_config.py

[Install]
WantedBy=multi-user.target
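Before reloading systemd, you may want to confirm that the paths in the unit actually exist; a typo in ExecStart otherwise only shows up later as a failed service. The sketch below is one way to do that with the standard library. check_unit is a hypothetical helper, and reading the unit with configparser is an approximation: systemd's own parser accepts syntax (such as repeated keys) that configparser handles only loosely.

```python
import configparser
import os
import shlex

# Unit file location used in this guide
UNIT_PATH = "/opt/jupyterhub/etc/systemd/jupyterhub.service"


def check_unit(unit_path):
    """Hypothetical helper: list problems in the JupyterHub unit file (empty if none)."""
    if not os.path.isfile(unit_path):
        return [f"unit file not found: {unit_path}"]
    # Units are INI-like: keep key case, split only on '=', and disable
    # %-interpolation because systemd uses % for its own specifiers
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None, strict=False)
    parser.optionxform = str
    parser.read(unit_path)
    if not parser.has_option("Service", "ExecStart"):
        return ["no ExecStart= line in the [Service] section"]

    argv = shlex.split(parser.get("Service", "ExecStart"))
    if not argv:
        return ["ExecStart= is empty"]
    problems = []
    if not (os.path.isfile(argv[0]) and os.access(argv[0], os.X_OK)):
        problems.append(f"ExecStart program missing or not executable: {argv[0]}")
    # jupyterhub -f <file> names the configuration file to load
    if "-f" in argv:
        i = argv.index("-f")
        if i + 1 >= len(argv) or not os.path.isfile(argv[i + 1]):
            problems.append("configuration file given with -f does not exist")
    return problems


if __name__ == "__main__":
    for problem in check_unit(UNIT_PATH) or ["no problems found"]:
        print(problem)
```

systemd also ships systemd-analyze verify, which performs a more thorough check of unit files.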

Make systemd aware of the service file. Symlink the file into systemd's directory:

sudo ln -s /opt/jupyterhub/etc/systemd/jupyterhub.service /etc/systemd/system/jupyterhub.service

Reload the systemd configuration files.

sudo systemctl daemon-reload

Enable the jupyterhub service.

sudo systemctl enable jupyterhub

Start the jupyterhub service.

sudo systemctl start jupyterhub

Check the status of the jupyterhub service.

sudo systemctl status jupyterhub
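systemctl status shows whether the process is running, but not whether the hub answers HTTP requests. JupyterHub's REST API exposes an unauthenticated /hub/api endpoint that returns the running version, so a quick probe can confirm the hub is serving. This is a minimal standard-library sketch; hub_version is a hypothetical helper, and the default address assumes JupyterHub's stock port 8000, which applies until bind_url is changed in the reverse proxy section.

```python
import http.client
import json
import urllib.error
import urllib.request

# JupyterHub listens on port 8000 by default; once bind_url is changed,
# pass that URL instead (including any /jupyterhub/ prefix)
HUB_URL = "http://127.0.0.1:8000"


def hub_version(base_url, timeout=5.0):
    """Hypothetical helper: return the version from <base_url>/hub/api, or None."""
    url = base_url.rstrip("/") + "/hub/api"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = json.load(resp)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON body
        return None
    return data.get("version") if isinstance(data, dict) else None


if __name__ == "__main__":
    print(hub_version(HUB_URL) or f"no answer from {HUB_URL}")
```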

User Creation

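Create the system accounts that will log in to JupyterHub. By default, JupyterHub authenticates users against the server's local accounts through PAM.

sudo adduser <user>
sudo passwd <user>

To give the user administrative (sudo) privileges on the server, add the user to the wheel group.

sudo usermod -aG wheel <user>

Next, modify the JupyterHub configuration to allow the user to log in to the application. When uncommenting a line in jupyterhub_config.py, remove the leading # along with any leading spaces or indentation, because Python rejects unexpected indentation.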

For regular users:

c.Authenticator.allowed_users = {'<user>'}

For admin users (admin_access = True also lets admins access other users' servers):

c.JupyterHub.admin_access = True

c.Authenticator.admin_users = {'<user>'}
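Because the default authenticator checks passwords against local accounts, a name in allowed_users or admin_users with no matching system account cannot log in. The sketch below reads those two settings from jupyterhub_config.py without executing the file and reports names that have no local account. It is an illustration using only the standard library (configured_users and missing_accounts are hypothetical helpers), and it only understands literal sets, lists, or tuples of names.

```python
import ast
import pwd

CONFIG = "/opt/jupyterhub/etc/jupyterhub/jupyterhub_config.py"
USER_OPTIONS = ("allowed_users", "admin_users")


def configured_users(config_path):
    """Hypothetical helper: collect names set in c.Authenticator.allowed_users/admin_users."""
    with open(config_path) as fh:
        tree = ast.parse(fh.read(), filename=config_path)  # parsed, never executed
    names = set()
    for node in ast.walk(tree):
        if not isinstance(node, ast.Assign):
            continue
        for target in node.targets:
            # Match the attribute chain <obj>.Authenticator.<option>
            if (isinstance(target, ast.Attribute)
                    and target.attr in USER_OPTIONS
                    and isinstance(target.value, ast.Attribute)
                    and target.value.attr == "Authenticator"):
                try:
                    value = ast.literal_eval(node.value)
                except (ValueError, TypeError, SyntaxError):
                    continue  # not a literal (e.g. built at runtime); skip it
                if isinstance(value, (set, list, tuple)):
                    names.update(v for v in value if isinstance(v, str))
    return names


def missing_accounts(names):
    """Hypothetical helper: return, sorted, the names with no local system account."""
    missing = []
    for name in sorted(names):
        try:
            pwd.getpwnam(name)
        except KeyError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    try:
        print(missing_accounts(configured_users(CONFIG)) or "every configured user has an account")
    except OSError as exc:
        print(f"cannot read {CONFIG}: {exc}")
```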

Shared Directory

To let users share files on the platform, create a shared folder in the /opt/jupyterhub directory.

sudo mkdir /opt/jupyterhub/share

Adjust the folder's ownership and permissions so that your users can write to it. Then, in each user's home directory, create a symbolic link to the shared folder.

sudo ln -s /opt/jupyterhub/share /home/<user>/share
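With more than a handful of users, running ln -s once per account gets tedious. The sketch below creates the same link in each home directory you pass it and leaves existing files alone. link_shared is a hypothetical helper; it assumes the shared folder is /opt/jupyterhub/share and that the link is named share, as in the command above. Run it as root.

```python
import os
from pathlib import Path

SHARED = Path("/opt/jupyterhub/share")  # shared folder used in this section


def link_shared(home_dirs, shared=SHARED, link_name="share"):
    """Hypothetical helper: create <home>/<link_name> -> shared in each home.

    Returns (created, skipped): links made now, and paths left untouched
    because something other than the expected link already exists there.
    """
    created, skipped = [], []
    for home in map(Path, home_dirs):
        link = home / link_name
        if link.is_symlink() and os.readlink(link) == str(shared):
            continue  # the correct link is already in place
        if link.exists() or link.is_symlink():
            skipped.append(str(link))  # never overwrite existing files or links
            continue
        link.symlink_to(shared, target_is_directory=True)
        created.append(str(link))
    return created, skipped
```

For example, link_shared(["/home/alice", "/home/bob"]), where the two home directories are placeholders for your users' homes. Running it again is harmless, because links that already point at the shared folder are skipped.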

Reverse Proxy

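To access JupyterHub through a reverse proxy, add the configuration below to the Apache HTTP Server site configuration file and tailor it to your needs, replacing XXX.XXX.XXX.XXX with the JupyterHub server's IP address. These directives require the mod_rewrite, mod_proxy, mod_proxy_http, and mod_proxy_wstunnel modules.

Rewrite rules (WebSocket upgrade requests, which notebook kernels use, are proxied over ws://; other requests go over http://):

RewriteEngine On
RewriteCond %{HTTP:Connection} Upgrade [NC]
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteRule /jupyterhub/(.*) ws://XXX.XXX.XXX.XXX:8888/jupyterhub/$1 [P,L]
RewriteRule /jupyterhub/(.*) http://XXX.XXX.XXX.XXX:8888/jupyterhub/$1 [P,L]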

Reverse Proxy:

<Location "/jupyterhub/">
    # preserve Host header to avoid cross-origin problems
    ProxyPreserveHost on
    # proxy to JupyterHub
    ProxyPass         http://XXX.XXX.XXX.XXX:8888/jupyterhub/
    ProxyPassReverse  http://XXX.XXX.XXX.XXX:8888/jupyterhub/
</Location>

JupyterHub configuration (set bind_url in jupyterhub_config.py to the address and URL prefix that the proxy forwards to):

c.JupyterHub.bind_url = 'http://XXX.XXX.XXX.XXX:8888/jupyterhub/'
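The bind_url, the rewrite rule, and the ProxyPass lines all have to agree on the host, port, and URL prefix. One way to keep them in sync is to generate the Apache directives from bind_url. This is a minimal sketch: apache_proxy_config is a hypothetical helper, it assumes a plain http:// backend, and its output covers only the WebSocket rewrite (guarded by Upgrade-header conditions) and the <Location> block, so merge it into your own site configuration.

```python
from urllib.parse import urlsplit


def apache_proxy_config(bind_url):
    """Hypothetical helper: render Apache reverse-proxy directives for a bind_url."""
    parts = urlsplit(bind_url)
    prefix = parts.path if parts.path.endswith("/") else parts.path + "/"
    backend = f"{parts.netloc}{prefix}"  # host:port plus the URL prefix
    return "\n".join([
        "RewriteEngine On",
        "# WebSocket upgrades (kernel connections) go to the ws:// backend",
        "RewriteCond %{HTTP:Connection} Upgrade [NC]",
        "RewriteCond %{HTTP:Upgrade} websocket [NC]",
        f"RewriteRule {prefix}(.*) ws://{backend}$1 [P,L]",
        f'<Location "{prefix}">',
        "    # preserve Host header to avoid cross-origin problems",
        "    ProxyPreserveHost on",
        f"    ProxyPass         http://{backend}",
        f"    ProxyPassReverse  http://{backend}",
        "</Location>",
    ])


if __name__ == "__main__":
    print(apache_proxy_config("http://XXX.XXX.XXX.XXX:8888/jupyterhub/"))
```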

Restart the jupyterhub service.

sudo systemctl restart jupyterhub

Firewall Settings

Open JupyterHub's port (8888/tcp) in the firewall.

sudo firewall-cmd --permanent --add-port=8888/tcp

Reload the firewall.

sudo firewall-cmd --reload
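To confirm the rule took effect, try a plain TCP connection to port 8888 from another machine on the network; a test from the server itself usually does not exercise the firewall rule, because firewalld trusts loopback traffic. This is a minimal standard-library sketch; port_open is a hypothetical helper.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Hypothetical helper: True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered (timed out), or unresolvable host
        return False
```

For example, port_open("XXX.XXX.XXX.XXX", 8888), run from a client machine with the placeholder replaced by the server's address, should return True once the firewall has been reloaded and JupyterHub is running.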

JupyterLab

JupyterLab is the latest web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to configure and arrange workflows in data science, scientific computing, computational journalism, and machine learning. A modular design invites extensions to expand and enrich functionality. [3]

JupyterLab also supports extensions. Several that users may find useful include drawio, jupyterlab-sparkmonitor, jupyterlab-hide-code, @jupyterlab/celltags, @jupyterlab/git, and @jupyterlab/latex. Be aware that not all extensions are compatible with every version of JupyterHub or JupyterLab.

The following snippet center-aligns chart images in notebook output. It can be placed in the first code cell, run at the start of each kernel session, and hidden with the tools from the jupyterlab-hide-code extension.

from IPython.display import HTML

# Center every <img> in the notebook output (e.g. rendered charts)
HTML("""
<style>
img {
    margin: auto;
    display: block;
}
</style>
""")

If you have set up a Spark cluster, use the code below to connect to it and disconnect from it.

import findspark
findspark.init()  # locate the Spark installation (SPARK_HOME) and make pyspark importable
import pyspark
from pyspark.sql import SparkSession
# Connect to the standalone cluster master, then disconnect when finished
sc = pyspark.SparkContext(master='spark://XXX.XXX.XXX.XXX:7077', appName='test')
sc.stop()

Conclusion

JupyterHub can be used to serve a variety of environments. It supports dozens of kernels through the Jupyter server and can serve a variety of user interfaces, including the Jupyter Notebook, JupyterLab, RStudio, nteract, and more. The application can be configured with authentication to restrict access to a subset of users. Authentication is pluggable and supports a number of authentication protocols (such as OAuth and GitHub).

JupyterHub is container-friendly, and can be deployed with modern-day container technology. It also runs on Kubernetes, and can run with up to tens of thousands of users. JupyterHub is entirely open-source and designed to be run on a variety of infrastructure. This includes commercial cloud providers, virtual machines, or even your own laptop hardware. [4]


  1. Jupyter
  2. JupyterHub Installation Guide
  3. JupyterLab
  4. What is JupyterHub?