In my higher ed institution, we are going from no hosted JupyterHub servers to at least three groups looking at them: central IT, School, and cottage industry.
All three are using Kubernetes to scale.
In terms of organisation, containers will be defined at the module (course) presentation level. Each module is presented once or twice a year, with 300-1500 students per presentation.
I was wondering what strategies other people are using to deploy module/course based Jupyter environments to students.
Note that the following includes wide-ranging questions that can be answered in general or specific technical terms. I'm just trying to make sense of what possible solutions there are, what folk are trying / have tried / use successfully / tried and will never return to again etc.
To try to categorise different sorts of approach, I can imagine the following sorts of deployment (there could well be others). They are likely to require different amounts of management / resource and reflect different institutional models for providing hosted lab services. There may well be very different costing models or consequences for background running costs etc, for different approaches:
- JupyterHub created for each presentation of each module; this is probably the smallest atomic level: you only need one Docker image defined and users are limited to students (and staff) registered on a specific presentation of the course; at the end of the presentation, you shut it down and throw everything away;
- JupyterHub created to cover all presentations of a single module; this might be appropriate if a module organiser or module team want to be able to manage the environment for their students, or someone wants to manage resource (or internal billing!) at the module level; this may require one Docker image per presentation, with students perhaps being trusted to select the image relevant to their start date; student user accounts may need to be cleared out between presentations;
- JupyterHub for several modules, perhaps in the same organisational unit (Faculty, School); this might appear if you have a unit-based IT team who look after the IT needs within the school. Users are perhaps everyone who has signed up to a module presented by the unit in a particular academic year; images are required for every module or module-presentation; users may be restricted (how?) in terms of the images they can see / launch;
- One JupyterHub to rule them all, centrally managed; maybe in excess of tens of thousands of registered users, with accounts that last the lifetime of the student's enrolment in the institution. Lots of images for lots of modules and/or module-presentations (so how do you limit which images which users can see, if only so as not to overwhelm them in the UI?).
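To make the "which images can which users see" question concrete, here is a rough sketch of the sort of thing I imagine, not something I have running. It assumes KubeSpawner's `profile_list` setting is given a callable that receives the spawner and returns a list of profiles. The module codes, image names and the `ENROLMENTS` lookup are all made up for illustration; in practice the enrolment data would presumably come from a student records system.

```python
from types import SimpleNamespace

# Hypothetical catalogue: one image per module-presentation.
PROFILES = {
    "MOD101-24A": {
        "display_name": "MOD101 (presentation 24A)",
        "kubespawner_override": {"image": "registry.example.org/mod101:24a"},
    },
    "MOD202-24B": {
        "display_name": "MOD202 (presentation 24B)",
        "kubespawner_override": {"image": "registry.example.org/mod202:24b"},
    },
}

# Hypothetical enrolment data: username -> module-presentations.
ENROLMENTS = {
    "alice": ["MOD101-24A"],
    "bob": ["MOD101-24A", "MOD202-24B"],
}


def profiles_for(username, enrolments=ENROLMENTS, profiles=PROFILES):
    """Return only the profiles for presentations the user is enrolled on."""
    return [profiles[key] for key in enrolments.get(username, []) if key in profiles]


def profile_list(spawner):
    """Callable in the shape I understand KubeSpawner accepts for profile_list."""
    return profiles_for(spawner.user.name)


# In jupyterhub_config.py this would be wired up as (assumption):
# c.KubeSpawner.profile_list = profile_list

# Stand-in spawner object, just to exercise the function outside a hub.
fake_spawner = SimpleNamespace(user=SimpleNamespace(name="alice"))
visible = [p["display_name"] for p in profile_list(fake_spawner)]
```

With this arrangement a student only ever sees the images for the presentations they are registered on, which would also keep the launch UI short on a large central hub.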
In each of the above cases, how do you go about:
- mounting persistent user volumes; for example:
  - do users have a volume per module-presentation and have to take their files away at the end of the presentation?
  - do users have a personal filestore that is mounted into whatever environment they use?
  - does each module-presentation have its own filepath to stash files to try to help manage them (e.g. `~/{MODULE_CODE}-{PRESENTATION}`)?
- enrolling users / managing permissions?
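As an illustration of the "personal filestore plus a per-presentation path" option, this is a minimal sketch of the volume definitions I have in mind. It only builds the Kubernetes-style `volumes` / `volume_mounts` dictionaries that KubeSpawner takes. The function name, the `claim-{username}` PVC naming and the `/home/jovyan` home directory are my assumptions, and it assumes usernames are already safe to use in a PVC name.

```python
def home_volume_config(username, module_code, presentation, home="/home/jovyan"):
    """Build volume settings that mount one sub-directory of a user's
    personal volume at ~/{MODULE_CODE}-{PRESENTATION}."""
    pvc_name = f"claim-{username}"  # hypothetical PVC naming scheme
    workdir = f"{module_code}-{presentation}"

    # One persistent volume claim per user, shared across all their modules.
    volumes = [
        {"name": "home", "persistentVolumeClaim": {"claimName": pvc_name}}
    ]
    # Only the sub-directory for this module-presentation is mounted.
    volume_mounts = [
        {
            "name": "home",
            "mountPath": f"{home}/{workdir}",
            "subPath": workdir,
        }
    ]
    return volumes, volume_mounts


volumes, volume_mounts = home_volume_config("alice", "MOD101", "24A")

# In a hub these might be applied per spawn, e.g. from a pre-spawn hook
# that sets spawner.volumes and spawner.volume_mounts (assumption).
```

The alternative, a volume per module-presentation, would presumably just change the claim name to include the module and presentation, at the cost of many more claims to create and clean up.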
Inside the image, do you always use the same user account name (e.g. the Jupyter default is `jovyan`), or another name (`user`, `student` etc.)? Or maybe you found a way to dynamically set a user account with a parameterised name when the container is launched (if so, how? And how is persistent volume mounting handled?)
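For the parameterised account name, the only route I am aware of is the one in the Jupyter Docker Stacks images, where starting the container as root with `NB_USER` (and `CHOWN_HOME`) set lets the start script rename the account. Below is a sketch of how the settings might be derived from a hub username. The helper name, the sanitising rule and the dictionary layout are my own assumptions; the keys are named after the KubeSpawner options I would expect to set (`uid`, `working_dir`, `environment`).

```python
import re


def account_settings(hub_username, prefix="u"):
    """Derive container account settings from a hub username (sketch)."""
    # Reduce the hub username to a conservative Linux account name.
    safe = re.sub(r"[^a-z0-9_-]", "-", hub_username.lower())
    if not re.match(r"[a-z_]", safe):
        # Account names should not start with a digit or a dash.
        safe = f"{prefix}{safe}"
    safe = safe[:32]

    return {
        # Start as root so the image's start script can rename the account.
        "uid": 0,
        "working_dir": f"/home/{safe}",
        "environment": {"NB_USER": safe, "CHOWN_HOME": "yes"},
    }


settings = account_settings("Alice.Smith@example.org")
```

If this works as I hope, the persistent volume would then need to be mounted at the renamed home directory (`/home/{name}`) rather than `/home/jovyan`, which is part of what I am unsure about.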