I’d love to hear people’s thoughts, experiences, and frustrations with deploying JupyterHub in an educational context. What is working, what is missing, what is annoying?
Here’s my (very positive) experience deploying JupyterHub using Docker (Compose) on a server at University of Versailles: https://opendreamkit.org/2018/10/17/jupyterhub-docker/.
I’ve been using JupyterHub for 3 semesters in a class of about 60 students. This is the first time the user experience has been really rock solid and I no longer feel the “bleeding” of bleeding edge. I’m running on Google Cloud Platform with GitHub authentication using the Zero to JupyterHub project (Kubernetes). The only issue has been that the cluster doesn’t automatically downsize. I got a suggestion for a fix but haven’t tried it.
Have also been working on a project called Carme (a moon of Jupiter) with some students from our center for open source that (among other things) makes the process of managing infrastructure easier. Here are some tutorials for deploying JupyterHub on the Google Cloud and Azure. We would love others to try it and provide feedback.
Great post @defeo. Thanks for posting it here
@choldgraf Does “Educational Context” include throwing together a TLJH for interactive demos, and tutorials for professional developers?
IMO yes, I think so!
We have used a cluster for a level 1 STEM course, where out of a cohort of over 3,000 students we had 350 use the cluster over a period of a couple of months, for a set of optional activities for the course. This was using Azure ACS, since AKS was still in preview at the time. But the platform was rock solid, with no support issues. We integrated the hub with our Moodle installation using the LTI authenticator. We are now looking at how we can use JupyterHub more widely across our curriculum.
Hey all! I’m at the University of Colorado, Boulder in Earth Lab. I direct the earth analytics education program there. I’ve found JupyterHub to be a great platform to avoid the setup frustrations and the computing limitations that my students often have. With that said, many students do still want a local setup. So we use both a local conda environment (with all of the packages that conflict with each other unless installed in the proper order) AND the hub. We are using JupyterHub for training and also for our earth analytics professional program, which consists of a suite of courses. We have 20-30 students now but do plan to scale. Working with Tim Head, we’ve built a super cool multi-hub setup where you can customize a new hub to meet your program demands. I can launch a new hub in an hour or two (the time it takes the Docker containers to build) and I can update a hub very easily. We have lots of data needs with our hub. And while the data we are using are small, we do intend to use larger and larger data in the future.
My largest frustration with the hub now is DATA. It’s really cumbersome for a student to upload data to the hub.
- It often fails (it could be an internet connection issue; I am not sure, but it’s related to…
- There is no progress indicator when you upload data, so it’s hard to tell when it fails.
You might think: well, why don’t you have them download the data?! This does work, but the reality of larger datasets, particularly in the earth sciences, is that you often have to order them and wait for them to process. The data doesn’t stay on a server indefinitely, so you need to get it and then get it into the hub. So a download process is not reproducible if you have to run it 2 weeks later to grade. We need a good way of adding “stuff” to the hub that is not on a server somewhere, be it data or files, etc. We aren’t just teaching data science with the hub; we are teaching students how to work with lots of different types of (large) data!
Ok, wish #2. I am so greedy: grading. I would really love a robust way to grade notebooks. Nothing out there quite does what I want yet.
And wish #3: I’d like a way for students to generate reports from Jupyter. This means an easy way to hide code and outputs as they wish, to create a beautiful HTML report from a notebook where data, inputs, processing and outputs are combined in a fully reproducible, open workflow. I know, bad Leah: this is about Jupyter, not necessarily hub-specific, but a girl can still dream.
I love jupyter btw, this is just my experience working with it in my classes over the past year.
@lawasser could you go into a bit more detail on what you’d like for a “report” in jupyterland? Does it need to be interactive? A PDF? Would students create it with a command line call? Or a UI?
I threw together a lightweight jupyter server plugin that simply lets you download the notebook as a single HTML file w/ embedded images after doing a little bit of cleanup. Would something like this workflow below suit you @lawasser?
We (this was done on time paid for by Leah) fixed this with https://github.com/jupyter/notebook/pull/4221. If someone has better JS/HTML skills diving into Min’s suggestion at the end of the PR would be great.
Sync’ing of directories between your JupyterHub storage space and your laptop with a user experience a bit like Dropbox’s is still on my wish list. In particular if the hub is deployed on kubernetes. https://github.com/betatim/binder-syncthing kinda works but gives an error message related to trying to upgrade the version of it (I think).
Does anyone know if other cloud storage providers have a good not-X11 client?
@choldgraf Yes… I’m running into so so so many issues with Jupyter reports now. Let me tell you what I want. And I know it can happen because I’ve seen the beginnings of this kind of report via Jupyter, which R does so nicely with knitr.
Chris have a look at this package:
Please note that while in theory this is a wonderful package, in application it is currently very very buggy. (I have an issue open and have found more issues working with it via my students.)
It does almost what i want. What i want is this (nbclean can so do this). And a template can be used to do what i also want which is an image caption.
Ok i want
- The ability to click on a cell and hide the code, hide the outputs, and hide the little number next to the code, just like the hide_code module does. Again, it doesn’t do it well (lots of bugs), but I love the interface so far. It’s easy and intuitive for me and students.
- I want to be able to then EXPORT that notebook as HTML or PDF with cell numbers, code, outputs etc. hidden, just like the hide_code module could do, but again it often fails in implementation across machines.
- I want the ability to add an image CAPTION to an output. I would want just one caption per cell. And i could see a “caption” box in the hide code toolbar that allowed you to type in caption text. That text would then go below the image and would be formatted slightly differently just to stand out a bit as a caption.
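As a rough illustration of the first two wishes (not how hide_code or nbclean work internally), here is a minimal sketch that pre-processes a notebook file before export. It relies only on the fact that a v4 `.ipynb` file is JSON with a `cells` list. The tag names `hide_input` and `hide_output` and the function name `strip_for_report` are assumptions made up for this example. Blanking a cell’s source still leaves an empty input area in the exported page, so nbconvert’s own options (for example `jupyter nbconvert --to html --no-input`) are probably the better route when everything should be hidden.

```python
import copy

# Hypothetical tag names: any strings work, as long as students
# apply them consistently through the cell tag editor.
HIDE_INPUT_TAG = "hide_input"
HIDE_OUTPUT_TAG = "hide_output"


def strip_for_report(notebook):
    """Return a cleaned copy of a v4 notebook dict; the original is untouched."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells pass through unchanged
        tags = cell.get("metadata", {}).get("tags", [])
        # Remove the "little number" next to the code and its results.
        cell["execution_count"] = None
        for output in cell.get("outputs", []):
            if "execution_count" in output:
                output["execution_count"] = None
        if HIDE_INPUT_TAG in tags:
            cell["source"] = []   # blank the code, keep the outputs
        if HIDE_OUTPUT_TAG in tags:
            cell["outputs"] = []  # keep the code, drop the outputs
    return cleaned
```

To use it, load the notebook with `json.load`, pass the dict through `strip_for_report`, write the result to a new file with `json.dump`, and export that copy to HTML or PDF.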
R knitr workflow does this. If this could be done easily then students could write reports AND papers AND blogs with jupyter easily. and they’d be so so happy and it would be a truly reproducible workflow with a report output connecting data, inputs and outputs. yay open science.
@choldgraf I’m so happy to test things if you are working on them, and please also look at the hide_code module as a nice example of where this could go with an interface. What I see in your graphic above is a really great start towards building reports. But we want to customize the cells and hide code too, and gosh, captions would be lovely.
Thank you so much for pinging me on this! I’m really enjoying Jupyter more and more. The community is wonderful!
It makes it really hard to keep track of what you have read/haven’t read if people start editing their posts for anything but typos. At least for me edited posts don’t show up again as unread. It also means that discussion that followed the original post might suddenly not make any sense any more.
Could we agree to not edit posts except for typos if they are more than a few minutes old/have been read by someone else?
Regarding #3, have you looked at RISE to present final results? Some of the things you mentioned (hiding) are in there.
Captions: one somewhat ugly (code-wise) way is this:
from IPython.display import HTML
HTML('<h1>This is a heading</h1>')
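A slightly less ugly variant of the same idea, as a sketch only: build the caption markup in a small helper so the styling lives in one place. The function name `caption_html`, the optional figure number, and the inline style are all made up for illustration; nothing here is a built-in Jupyter feature.

```python
from html import escape


def caption_html(text, number=None):
    """Build a small HTML snippet to show under a figure as its caption."""
    label = f"Figure {number}. " if number is not None else ""
    # escape() keeps characters like < and & from being read as markup.
    return (
        '<p style="font-style: italic; font-size: 90%;">'
        f"{escape(label + text)}"
        "</p>"
    )


# In a notebook cell, after the plotting code (needs IPython):
#   from IPython.display import HTML, display
#   display(HTML(caption_html("Example caption text", number=1)))
```

Because the caption is just another cell output, it travels with the notebook into an HTML export, but it is not linked to the image in any structured way.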
My 2 Cents:
- Must be able to manage whether students can upload or download data. There are instances where environments should not contain sensitive data or for control reasons (compute/storage capacity).
- Must be able to manage the user’s ability to run terminal sessions and/or somehow root jail their processes.
I think that Jupyter Hub/Lab are really great, but their openness also makes them very tricky to manage when running on a shared platform.
Having just played with tweaking a binderhub - not kubernetes - I really kind of wonder what its limitations are. One potential advantage is that you can start with a notebook and data. The fact that the storage is transitory seems like a plus from a storage management standpoint.
It also works with multiple notebooks. The user can always decide to download his/her notebook(s), data, modules, then store them in a github, or github enterprise, repo and “binderize” the repo.
In addition, executing a `!pip install --user module` works. So whatever additional resources are needed can be added to a
I am actually starting to wonder if I could not just replace the-littlest-jupyterhub with “the littlest binderhub”.
It seems like this approach also solves the “authentication” problem.
What might I be missing in this analysis from a small scale education perspective?
What do you mean by “tweaking a binderhub - not kubernetes”? How did you run BinderHub without kubernetes?
Two things I’d improve before using a BinderHub for courses/education:
- a better (read: any kind of story) story for pushing work to GitHub/some permanent storage
- an extension/notebook server tweak that lets users download a notebook they have open in their browser window even after the Binder session has expired (there is an issue somewhere but I can’t find it. TL;DR: all the information we need is in the browser, but the “save as” button tries to talk to the server, so that somehow needs re-routing or an additional button)
Otherwise I think a BinderHub is great for courses because it lets each instructor/course have exactly the environment they want, fully self-service, etc. You can do cool stuff like https://course.spacy.io/. Maybe if you need grading it gets a bit trickier?
This story that I have played with for persistence seems to work pretty well. After working in binder, if you want to create a derivative version:
- Create repo with GitHub or GitHub Enterprise UI.
- In Jupyter, download a copy of everything, including the notebook, possibly modified.
- In GH/GHE, Upload files with Explorer/Finder (Windows/Mac)
- Modify the binder badge in the readme to reflect the new location
- Update requirements.txt for any change of requirements
- You don’t even need to understand git to make this work.
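For the “modify the binder badge” step, here is a minimal sketch that generates the Markdown for the README. It assumes the repo is on github.com and is launched through the public mybinder.org service, whose launch URLs have the form `https://mybinder.org/v2/gh/<owner>/<repo>/<ref>`. The function name `binder_badge` and its parameters are made up for this example, and a GitHub Enterprise repo would need a BinderHub that is configured for it, with a possibly different URL prefix.

```python
from urllib.parse import quote


def binder_badge(owner, repo, ref="HEAD", hub_url="https://mybinder.org"):
    """Return README Markdown for a Binder launch badge."""
    hub = hub_url.rstrip("/")
    # safe="" also encodes "/" so a branch like "feature/x" stays one URL segment.
    launch = f"{hub}/v2/gh/{quote(owner)}/{quote(repo)}/{quote(ref, safe='')}"
    badge = f"{hub}/badge_logo.svg"
    return f"[![Binder]({badge})]({launch})"


if __name__ == "__main__":
    # "alice" and "my-course" are placeholder names.
    print(binder_badge("alice", "my-course"))
```

Pasting the printed line into the README of the new repo replaces the old badge, so the derivative version launches from its own location.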
A poor man’s “JupyterHub”
This is even lighter-weight than the-littlest-jupyterhub, basically because no authentication is used and there is no persistence of environment such as logins.
- An available BinderHub