[ANN] JupyterHub on Hadoop Deployment Guide

jcrist · April 24, 2019, 8:04pm

I just released a new guide for deploying JupyterHub on a Hadoop cluster: https://jcrist.github.io/jupyterhub-on-hadoop/.

The guide is written in the spirit of zero-to-jupyterhub-k8s, but for deploying on Hadoop. It includes instructions on basic installation (currently only manual instructions, help wanted for more pre-packaged ways), as well as common customizations you’d want to do.

A benefit of this setup over running JupyterHub on the edge node is that users' sessions are distributed throughout the cluster, so resources scale with usage and no single node is under high load. It also integrates well with the rest of the Hadoop ecosystem: you can run Dask and Spark directly (no need for Livy), and have full login access to other resources like HDFS. Conda/virtual environments can be installed on every node, or (more typically) centrally managed as archives on HDFS (using conda-pack/venv-pack).
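To make the "archives on HDFS" option a bit more concrete, here is a minimal sketch of packing a conda environment and publishing it to HDFS. It assumes the `conda pack` and `hdfs dfs` command-line tools are on your `PATH`; the environment name `example`, the archive name, and the `/jupyterhub` directory are placeholders picked for illustration, not names required by the guide. It defaults to a dry run that only prints the commands.

```python
import os
import shlex
import subprocess
from typing import List


def build_commands(env_name: str, archive: str, hdfs_dir: str) -> List[List[str]]:
    """Return the commands that pack a conda env and upload it to HDFS.

    All three arguments are illustrative placeholders; adjust to your cluster.
    """
    # Destination path on HDFS: <hdfs_dir>/<archive file name>
    dest = f"{hdfs_dir.rstrip('/')}/{os.path.basename(archive)}"
    return [
        # Package the named conda environment into a relocatable archive
        ["conda", "pack", "-n", env_name, "-o", archive],
        # Make sure the target directory exists on HDFS
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # Upload the archive, overwriting any previous version
        ["hdfs", "dfs", "-put", "-f", archive, dest],
    ]


def publish_environment(env_name: str, archive: str, hdfs_dir: str,
                        dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    commands = build_commands(env_name, archive, hdfs_dir)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: shows what would be executed without touching conda or HDFS
    publish_environment("example", "example.tar.gz", "/jupyterhub")
```

Pass `dry_run=False` once the printed commands look right for your cluster. How the spawner then picks up the archive is covered in the guide itself.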

A walkthrough video and a docker-compose demo are both available in the docs. Hopefully this guide proves useful for others. If you have questions or are interested in contributing, feel free to reach out on GitHub: https://github.com/jcrist/jupyterhub-on-hadoop/
