Running repo2docker images on an HPC system with JupyterHub and JupyterLab

I am currently working on duplicating some of the functionality of BinderHub on an HPC system at NERSC using repo2docker. I’m starting with a git repo, pulling it locally, running repo2docker to build a Docker image, then using docker to push that to another image repository at NERSC, then running Shifter, the local HPC container tech, which pulls and converts the Docker image and loads the filesystem as read-only. There is a /tmp writable area in the container, and I can bind to other external writable volumes as well, similar to Docker.
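The steps above can be sketched as a list of commands. This is only an illustration, not the exact commands used at NERSC: the repository URL, image names, and registry host are placeholders, and the image reference format that `shifterimg pull` expects for a private registry is an assumption that should be checked against the local Shifter documentation.

```python
import shlex


def build_pipeline(repo_url, workdir, local_image, registry_image):
    """Assemble the clone -> build -> push -> pull commands.

    Nothing is executed here; the commands are only returned so they can be
    inspected or handed to subprocess.run() one at a time.
    """
    return [
        # Pull the git repo locally.
        ["git", "clone", repo_url, workdir],
        # Build a Docker image from the checkout without starting a server.
        ["jupyter-repo2docker", "--no-run", "--image-name", local_image, workdir],
        # Re-tag the image for the site registry and push it there.
        ["docker", "tag", local_image, registry_image],
        ["docker", "push", registry_image],
        # Ask Shifter to pull and convert the image (reference format assumed).
        ["shifterimg", "pull", registry_image],
    ]


# Placeholder names, for illustration only.
for cmd in build_pipeline(
    "https://github.com/binder-examples/requirements",
    "/tmp/r2d-checkout",
    "r2d-example:latest",
    "registry.example.org/myuser/r2d-example:latest",
):
    print(shlex.join(cmd))
```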

To provide a little more detail, I’m testing this at NERSC with a JupyterHub deployment that runs jupyterlab-hub from the spawner (in my case, jupyterlab-hub from inside a container). I bind an external volume when I start the container, and I use the ‘start’ script override from repo2docker to copy out files that need to be writable once the container is running and to adjust environment variables to prevent write errors. I did some testing without JupyterHub first, then moved to running under JupyterHub once standalone mode was working for a few sample Binder-compatible repos. Things are a bit brute force at the moment: I’m copying out a lot that may not be needed while I troubleshoot problems related to the read-only filesystem. I’ve been through the docs for repo2docker, Jupyter, JupyterLab, npm, conda, etc. to track down the relevant environment variables and config files.
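As a rough sketch of the copy-out and environment-variable approach: the helper names and directory layout below are hypothetical, and a real repo2docker `start` file would more typically be a short shell script that does the same thing and then runs `exec "$@"`. The variable names are the ones documented by Jupyter and JupyterLab, but whether a given release honors each of them should be checked.

```python
import shutil
from pathlib import Path


def writable_jupyter_env(writable_root):
    """Map Jupyter/JupyterLab directory variables to paths under a writable root."""
    root = Path(writable_root)
    return {
        "JUPYTER_CONFIG_DIR": str(root / "jupyter" / "config"),
        "JUPYTER_DATA_DIR": str(root / "jupyter" / "data"),
        "JUPYTER_RUNTIME_DIR": str(root / "jupyter" / "runtime"),
        "JUPYTERLAB_SETTINGS_DIR": str(root / "lab" / "user-settings"),
        "JUPYTERLAB_WORKSPACES_DIR": str(root / "lab" / "workspaces"),
    }


def prepare_writable_copy(src_dirs, writable_root):
    """Copy directories that must be writable out of the read-only image.

    Returns the destination paths that were created.
    """
    root = Path(writable_root)
    copied = []
    for src in map(Path, src_dirs):
        if not src.is_dir():
            continue  # skip anything the image does not actually contain
        dest = root / "copied" / src.name
        shutil.copytree(src, dest, dirs_exist_ok=True)  # needs Python 3.8+
        copied.append(dest)
    # Make sure every redirected directory exists before Jupyter starts.
    for path in writable_jupyter_env(root).values():
        Path(path).mkdir(parents=True, exist_ok=True)
    return copied
```

A caller would then apply the mapping with `os.environ.update(writable_jupyter_env(root))` before launching the server, with `root` pointing at the bound external volume or `/tmp`.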

I really started running into problems unrelated to the read-only filesystem when I was trying to test github repos that required JupyterLab extensions. After some debugging and digging, I eventually realized that the JupyterLab version from repo2docker was pinned to an older version of JupyterLab, 0.35.4 for Python 3.7, and that most of the binder repos and binder-examples seemed to be designed to run in the classic notebook interface. There is an open issue here about the JupyterLab version, repo2docker issue 724.

I did find repos that contain the code for a lab extension and work at mybinder.org, specifically jupytercalpoly/jupyterlab-richtext-mode, but I found others that were broken, such as binder-examples/appmode. The appmode one in particular was difficult to debug, but it looks like an incompatibility between the JupyterLab version from repo2docker and the extension versions that get installed. Notably, ipyvolume is not pinned to a version, and that seems to be the source of the breakage: at some point the installed version diverged from the one the binder-examples repo was built against.
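One cheap way to spot this kind of drift is to list the dependencies that have no exact pin. The sketch below handles only pip-style `requirements.txt` lines (a conda `environment.yml` uses a single `=` and would need different parsing), and the file contents are made up for illustration rather than taken from the appmode repo.

```python
import re

# Matches the package name at the start of a requirement line.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def unpinned_requirements(text):
    """Return names of requirements that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):
            continue  # skip blanks and pip options such as "-r other.txt"
        match = _NAME.match(line)
        if match and "==" not in line:
            unpinned.append(match.group(1))
    return unpinned


# Hypothetical file contents, for illustration only.
example = """
jupyterlab==0.35.4
ipywidgets==7.4.2
ipyvolume
"""
print(unpinned_requirements(example))  # ['ipyvolume']
```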

Partly this has to do with a repo needing to fully specify the dependencies, which here also includes specifying the JupyterLab version needed to work. However, it would also be very useful to have a recipe for getting a particular JupyterLab extension to install correctly in a binder-like context where you can’t assume that updating everything to the latest version is an available option. If I want to install extensions X, Y, Z, which versions are compatible with each other, and is there a common JupyterLab version they would all install and run properly under? Maybe some additional dependency detection specific to JupyterLab and extensions in repo2docker could help in this particular case?

Any advice on the read-only install or JupyterLab with extensions is appreciated!

Nice to see someone working on this and sharing! There are a few more people interested in having BinderHub-like functionality but without Kubernetes. Hopefully by posting here you can encourage people to join in with their expertise/ideas/work!

From your post I wasn’t sure whether you knew this, so I wanted to say it explicitly: you can specify a more modern version (or any version, really) of jupyterlab as a dependency, and repo2docker will install and use that one. If it doesn’t, that is a bug.

Maybe the time has come to update the default version of lab in repo2docker. We were being slow with this precisely because 1.0 changed a lot of things for extensions, so the idea was to stick with the old version to give extension authors time to make the switch.

Thanks, Tim.

Right, you can explicitly upgrade JupyterLab, but you do need to know which version to upgrade to. The example I cited of a working extension in binder from jupytercalpoly is a bit different because it also contains the extension code in the repo, but there is an explicit version number used for JupyterLab that upgrades the install. This is why that example works in binder, by upgrading JupyterLab. The appmode repo from binder-examples ‘knows’ what jupyterlab to expect from repo2docker and tries to pin things to that, which should work except there are unpinned things that have broken that now. Somehow those version numbers were knowable at the time, maybe they just happened to be the latest when that repo was last updated?

Bokeh has provided a table with their extension which is helpful in trying to figure the version matching out for that extension, but maybe there is also a check that can be run against a given extension to see what JupyterLab versions it is compatible with? The bokeh extension is here, https://github.com/bokeh/jupyter_bokeh, and the binder compatible repo that does not explicitly install the extension is here, https://github.com/bokeh/bokeh-notebooks.
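One partial check, sketched here under assumptions: a lab extension is an npm package, so its `package.json` normally declares version ranges for the `@jupyterlab/*` packages it builds against. The helper below just extracts those ranges. The manifest is a made-up example, and the version numbers of the `@jupyterlab/*` packages need not match the JupyterLab release number, so the ranges still have to be mapped to a release by hand.

```python
import json


def jupyterlab_dependency_ranges(package_json_text):
    """Return the @jupyterlab/* version ranges declared in a package.json."""
    manifest = json.loads(package_json_text)
    ranges = {}
    for section in ("dependencies", "peerDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if name.startswith("@jupyterlab/"):
                ranges.setdefault(name, spec)  # keep the first declaration seen
    return ranges


# Hypothetical manifest, for illustration only.
example = """
{
  "name": "hypothetical-labextension",
  "version": "0.1.0",
  "dependencies": {
    "@jupyterlab/application": "^1.0.0",
    "@jupyterlab/notebook": "^1.0.0",
    "some-other-package": "^2.3.0"
  }
}
"""
print(jupyterlab_dependency_ranges(example))
```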

I can certainly fork github repos and start patching them to work with JupyterLab and repo2docker using trial and error on version numbers, but it would be very useful to have a guide or recipe/formula on doing that without guessing or exhaustively testing all versions. Or maybe there are some other example repos that are meant to run with binder and JupyterLab and extensions and I just haven’t found them yet? Entirely possible. I could also be missing something else that is obvious to the JupyterLab developers or others.

Also, I know that JupyterLab 2.0 is in progress and coming out soon, but I’m not sure how that will affect extensions.

Just to be clear, I’m not suggesting that the Jupyter team is or should be responsible for every git repo with Notebooks, that would be absurd. Instead, knowing that we (NERSC/Berkeley Lab) will want to have some curated git repos with Notebooks that users can spin up and try out and we know will work, it would be very helpful to have some guidance on how to create a git repo that uses one or more lab extensions and that will work with repo2docker. Maybe there should be some encouragement of extensions that are expected to be commonly installed to have a compatible binder repo for running in Lab 0.x, 1.x, 2.x.

Without going into details on how the NERSC JupyterHub is deployed, users can’t easily try out extensions at the moment, and it would be very useful for us to be able to provide examples of extensions to try out, or a recipe they could follow to create a git repo that would allow them to play with certain extensions. I would hope this could also be useful for others.

I’m forging ahead, but I think I could benefit from some more info or advice, if available. Even just pointing me at more places to check could be helpful.

I think the best long term solution is for extension packagers to define the versions of jupyterlab they support in the package metadata, and to use a package manager that understands it.

For example if you’re using conda you can specify a version range for each dependency. If you install another extension conda will figure out whether there’s a version of jupyterlab or any other dependency that works with both.
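To make the idea concrete, here is a toy version of what such a solver does for a single shared dependency: intersect the ranges each extension declares. This is not conda's algorithm, the extension names and ranges are hypothetical, and the version parsing only handles plain dotted numbers with the same number of components.

```python
def parse_version(text):
    """Turn '1.2.3' into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def common_range(requirements):
    """Intersect half-open [lower, upper) version ranges.

    `requirements` maps an extension name to a (lower, upper) pair of version
    strings. Returns the shared (lower, upper) pair, or None if the ranges do
    not overlap.
    """
    lower = max(parse_version(lo) for lo, _ in requirements.values())
    upper = min(parse_version(hi) for _, hi in requirements.values())
    if lower >= upper:
        return None
    return ".".join(map(str, lower)), ".".join(map(str, upper))


# Hypothetical extensions and the jupyterlab ranges they declare.
declared = {
    "extension-x": ("1.0", "2.0"),
    "extension-y": ("1.1", "3.0"),
    "extension-z": ("0.35", "1.2"),
}
print(common_range(declared))  # ('1.1', '1.2')
```

If the function returns None, no single jupyterlab version satisfies all three extensions, which is the situation a package manager would report as an unsatisfiable environment.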

The other big advantage is you won’t be coupled to the version of repo2docker.
