"When to use JupyterHub?"

Continuing the discussion from “Is running JupyterHub as root a requirement for deployment?”:

This is a question that has come up in many contexts many times. I think it would be useful to have a page laying out ‘when to use JupyterHub’.

What do people think? What kinda stuff should be in this page?

I keep wondering if a cartoony flow chart might be useful?

Cartoony style would mean you could use informal language, express doubt on edges out of decision boxes, not be too serious / intimidating but still be useful?

My perspective on this is: JupyterHub is for when you have, or have access to, computational resources that you want to make available to some users via the Jupyter UI. It lets you take away burdens such as package installation for courses, researchers, etc.

The short answer: “I have computers and users, and I want to make it easy for those users to log in and use these computers.”

My perspective on the main reasons to use JupyterHub vs rolling your own, from that starting point:

  • JupyterHub implements login pages, authentication
  • JupyterHub tracks activity for shutting down unused resources, etc.
  • JupyterHub handles proxying lots of Jupyter servers under one URL
  • Lots of folks use JupyterHub already, so there’s a decent chance someone has seen the same problems you are facing
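To make the proxying and activity-tracking points concrete, here is a toy model in plain Python. It is not JupyterHub’s code: the `servers` table, the backend URLs, and both function names are made up for illustration. The only thing borrowed from JupyterHub is the `/user/<name>/` URL layout.

```python
from datetime import datetime, timedelta, timezone


def route_for(path, servers):
    """Map a request path like /user/alice/lab to that user's backend URL, or None."""
    parts = path.strip("/").split("/")
    if len(parts) >= 2 and parts[0] == "user" and parts[1] in servers:
        return servers[parts[1]][0]
    return None


def idle_users(servers, now, timeout):
    """Return the names of users whose last activity is older than `timeout`."""
    return sorted(name for name, (_, last) in servers.items() if now - last > timeout)


# Hypothetical state: user name -> (backend URL, time of last activity).
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
servers = {
    "alice": ("http://10.0.0.5:8888", now - timedelta(minutes=5)),
    "bob": ("http://10.0.0.6:8888", now - timedelta(hours=3)),
}

print(route_for("/user/alice/lab", servers))
print(idle_users(servers, now, timedelta(hours=1)))
```

If you roll your own, this bookkeeping (plus login, spawning, and cleanup) is roughly what you would end up maintaining yourself.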

The main reason to roll your own, to me, is if you already have existing solutions to some or all of these and JupyterHub’s implementation only gets in the way. I’d use Kubeflow as an example - they have their own systems for deploying resources for users and authenticating access, and are experts in deploying services on Kubernetes, so JupyterHub’s abstractions have ended up more in the way than helpful. I think Kubeflow is better off rolling their own notebook deployment implementation than using JupyterHub. Tmpnb is borderline: tmpnb itself is much simpler and much faster than an equivalent JupyterHub configuration.

Thanks @minrk!

A few follow-up questions:

“It enables things like taking away the burden of package installation”

Can users still override this trivially? Some users may want to install their own extensions, etc.

“JupyterHub handles proxying lots of Jupyter servers under one URL”

Can one user access another user’s notebooks trivially?

Re: package installation: as @yuvipanda pointed out, there are TLJH plugins, and the plugin docs include a pattern showing how to create a plugin that installs additional conda packages.

What I’m asking specifically is whether the user can customize their JupyterLab instance by persisting settings, editing the config, and installing labextensions. I assume that the first two are possible (as they can be stored in Jupyter’s user directory) and the third is probably not (because the lab application lives outside the user directory).

Great question.
There are so many alternatives. Perhaps we should build a matrix of the possibilities with their pros and cons, covering both the server side and the client side, since we have a spectrum of options. Below are some fairly unrefined ideas based on my experience trying to create or provide such services.

Candidates

  • TLJH
  • JupyterHub on various platforms
  • Binder - pretty awesome
  • personal JupyterLab/Notebook
    • local
    • VM
    • virtualenv

Barrier to entry

  • server side
    • security
    • resource management
    • education, marketing
  • client side
    • education - how to set up personal server securely
    • configuration
    • virtualenv on shared resources
    • anaconda vs crafted

Other dimensions

  • GPU
  • personal
  • private cloud
  • public cloud

Yes, users can generally still install their own packages. Most deployments are set up so that things like pip install --user can add packages chosen by the user. conda install also typically works in a container-based deployment (Docker, Kubernetes), but in a setup like TLJH users don’t usually have permission to install packages into the shared environment. This is all about permissions and how the user environment is specified. JupyterHub doesn’t do anything to prevent or enable users installing packages; all the standard mechanisms for users to install packages work (or don’t).
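A quick way to see which of these cases applies to you is to ask Python itself from inside a running server. This is only a sketch using the standard library; the `install_targets` name is made up here, and a writable `sys.prefix` is a rough hint rather than a guarantee that conda or pip will succeed.

```python
import os
import site
import sys


def install_targets():
    """Report where user-level and environment-level installs would land."""
    return {
        # Where `pip install --user` puts packages.
        "user_site": site.getusersitepackages(),
        # False in e.g. virtualenvs, where --user installs are disabled.
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        # The shared environment that conda install / plain pip install would modify.
        "env_prefix": sys.prefix,
        # Rough permission check: can this user write to the environment at all?
        "env_writable": os.access(sys.prefix, os.W_OK),
    }


if __name__ == "__main__":
    for key, value in install_targets().items():
        print(f"{key}: {value}")
```

On a TLJH-style setup you would expect `env_writable` to be False for ordinary users, while in many container images it will be True.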

Whether one user can access another user’s notebooks depends on storage and permissions. When users share a filesystem by default, as in TLJH, this works the same as it would for any shared file on that filesystem - shared directories and directory/file permissions govern who can read/write files.

For a container-based deployment, this can be more complicated.
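For the shared-filesystem case, the question comes down to ordinary Unix permission bits, which you can inspect directly. Below is a small sketch (the `readable_by` helper is made up for illustration) that assumes a POSIX filesystem and ignores ACLs and the permissions of parent directories, both of which can also block access.

```python
import os
import stat
import tempfile


def readable_by(path):
    """Return which permission classes have the read bit set on `path`."""
    mode = os.stat(path).st_mode
    return {
        "owner": bool(mode & stat.S_IRUSR),
        "group": bool(mode & stat.S_IRGRP),
        "other": bool(mode & stat.S_IROTH),
    }


# Demonstrate on a throwaway file standing in for a notebook.
with tempfile.NamedTemporaryFile(suffix=".ipynb") as f:
    os.chmod(f.name, 0o600)  # owner-only
    private = readable_by(f.name)
    os.chmod(f.name, 0o644)  # world-readable
    shared = readable_by(f.name)

print("0600:", private)
print("0644:", shared)
```

If "other" is readable on a user’s notebooks and home directory, any other user on the same machine can open them, with or without JupyterHub.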

Yes, persisting settings, editing the config, and installing labextensions should all be possible in ~all JupyterHub deployments. labextensions are the only thing that might be an issue, if jupyter lab build must write to $PREFIX instead of a user directory, but again this can be governed by permissions. If that’s the case, there should be an issue in jupyterlab to fix it: user-installed extensions should absolutely not need write permissions to sys.prefix. In a container-based deployment, users typically (but it is not a requirement) have permission to modify the env, so this works.
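To check this on a given deployment, you can test whether the relevant directories are writable by the current user. The sketch below assumes JupyterLab’s default layout (application directory under sys.prefix/share/jupyter/lab, user settings under ~/.jupyter/lab/user-settings, overridable via the JUPYTERLAB_DIR and JUPYTERLAB_SETTINGS_DIR environment variables); if your deployment configures these differently, adjust the paths. The helper names are made up for illustration.

```python
import os
import sys
from pathlib import Path


def nearest_existing(path):
    """Walk up from `path` until we reach a directory that actually exists."""
    path = Path(path)
    while not path.exists() and path != path.parent:
        path = path.parent
    return path


def can_write(path):
    """True if the current user could write to `path` (or create it under its parent)."""
    return os.access(nearest_existing(path), os.W_OK)


# Assumed default locations; both can be overridden by environment variables.
app_dir = Path(os.environ.get("JUPYTERLAB_DIR", Path(sys.prefix) / "share" / "jupyter" / "lab"))
settings_dir = Path(
    os.environ.get("JUPYTERLAB_SETTINGS_DIR", Path.home() / ".jupyter" / "lab" / "user-settings")
)

print(f"app dir      {app_dir}: writable={can_write(app_dir)}")
print(f"settings dir {settings_dir}: writable={can_write(settings_dir)}")
```

A writable settings directory with a read-only application directory is the situation described in the question: settings and config persist, but anything that needs to rebuild the lab application in place will fail.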
