This is a quick jotting down of ideas to see if they make sense to others and to have as a reference in case we wanna try doing any of this work in the future.
Binder is an excellent way to serve arbitrary interactive computing environments defined via the repo2docker spec. However, Binder environments are relatively slow and require extra resources, for a few main reasons:
They require building an image from scratch
They require spinning up a Kubernetes pod / container once the image is built
They require cloud infrastructure to be present, which is costly and takes expertise to maintain
Idea: A Binder landing page that serves JupyterLite
It would be useful if we made the BinderHub landing page serve a JupyterLite environment. In this case, the repository that the user pointed to would be used only for a filesystem, not for the environment (at least until JupyterLite has a clear way to install new packages into it).
In my mind, the UX would be something like:
User hits a BinderHub launch UI
They click “JupyterLite” from a dropdown list somewhere
They type in the repository name and click “launch”
They’re taken to a JupyterLite app
That app is given the repository URL that the person has given.
And it clones / downloads the files in that repository, and opens them in the UI
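As a rough sketch of the "given the repository URL, download the files" step above, here is one way a launcher could turn the repository name a user types into a downloadable archive URL. It assumes GitHub as the provider and GitHub's public archive URL pattern; the function name `github_archive_url` is hypothetical and not part of BinderHub or JupyterLite.

```python
from urllib.parse import quote


def github_archive_url(repo: str, ref: str = "HEAD") -> str:
    """Return a tarball URL for a GitHub 'owner/name' repo at the given ref.

    Hypothetical helper: assumes GitHub's public archive URL layout.
    """
    owner, sep, name = repo.strip().strip("/").partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/name', got {repo!r}")
    # quote() keeps '/' so refs such as 'feature/x' stay intact
    return f"https://github.com/{quote(owner)}/{quote(name)}/archive/{quote(ref)}.tar.gz"


print(github_archive_url("jupyterlite/demo", "main"))
```

Other providers (gists, GitLab, plain URLs) would each need their own resolver, much like the repo providers BinderHub already has.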
I think something like this would be quite useful as a lightweight “sandbox” for content that works with the environment that JupyterLite bundles with it! Over time, you could also imagine some lightweight package installation in addition to this, so the repositories could also specify some other dependencies.
We’re still very much in the early days of reproducible, fully-browser-based compute. The subset of repos which will work, unmodified, is pretty small today. And, as we have learned, when they don’t work perfectly the first time, people get really upset.
But yes, for showing-before-or-while-building: a more-or-less stock lite site could certainly render a large number of notebooks, images, markdown files, etc.
An (alternate, opt-in) binderhub/nbviewer UI could be a jupyterlite app, with all the link builders, repo explorers, log viewers (whoops, need issue), etc.
lightweight package installation
A lot of things will have to improve substantially before this gets noticeably better. We basically need a mamba-grade SAT solver in the browser, and a much more robust way to cache downloaded files. If lite.mybinder.org became a real thing, that would actually help a lot as they could all share the same “heavy duty” service worker cache.
Do you see this as something that could be added to BinderHub (maybe as a plugin or extension)? Which means BinderHub would also be responsible for caching the built JupyterLite website? (similar to the cached / pre-built Docker images)
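If built sites were cached the way Binder caches Docker images, one option would be to key each build on the provider, repository and resolved commit. The sketch below is only an illustration of that idea; `site_cache_key` and its inputs are hypothetical and not an existing BinderHub API.

```python
import hashlib


def site_cache_key(provider: str, repo: str, resolved_ref: str) -> str:
    """Derive a short, stable key for a built JupyterLite site.

    Hypothetical helper: resolved_ref is assumed to be a full commit SHA,
    so a new commit always yields a new key and a fresh build.
    """
    payload = "\n".join([provider, repo, resolved_ref])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


key = site_cache_key("gh", "jupyterlite/demo", "0123abcd")
print(key)
```

A lookup that misses on this key would trigger a build; a hit would serve the already-built static files.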
An alternative would be to build a static JupyterLite website whenever there are new changes to the repository. For example via a GitHub Action when pushing to the main branch… Although this would not be as convenient as pasting a link to a repo, and might be challenging to support other providers like gists.
Good question. I am not really sure what the right development UX needs to be; I was mostly focusing on the user experience. I think it might take some experimenting to figure out what has the least amount of complexity/maintenance overhead.
So I just bought jupyterlite.app! Here are some ideas on how I think something like this can work.
User goes to jupyterlite.app, it provides UI similar to mybinder.org - insert git repo, etc.
We have a serverside process that git clones it (or just URL fetches it), runs jupyter lite build. @jptio does this allow users to run arbitrary code? I hope not, because that will let us run this with super minimal resources! If it does, it becomes a lot more heavy weight.
We serve the built assets statically.
This allows for not just single URLs but entire repos / directories to be served with JupyterLite.
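The steps above can be sketched as a small server-side builder. This is a minimal sketch, assuming `git` and the `jupyter lite` CLI are installed on the build machine; the function names, directory layout and timeout are hypothetical, and whether the build step can be made safe against untrusted repositories is exactly the open question raised above.

```python
import subprocess
from pathlib import Path


def build_commands(repo_url: str, workdir: Path) -> list[list[str]]:
    """Return the commands for one build: shallow clone, then jupyter lite build."""
    src = workdir / "src"    # checkout of the user's repository
    out = workdir / "site"   # static assets to serve afterwards
    return [
        ["git", "clone", "--depth", "1", "--", repo_url, str(src)],
        ["jupyter", "lite", "build", "--contents", str(src), "--output-dir", str(out)],
    ]


def run_build(repo_url: str, workdir: Path) -> Path:
    """Run each command in order; raises if any step fails or times out."""
    for cmd in build_commands(repo_url, workdir):
        subprocess.run(cmd, check=True, timeout=600)  # timeout is an arbitrary choice
    return workdir / "site"


for cmd in build_commands("https://github.com/jupyterlite/demo", Path("/tmp/job")):
    print(" ".join(cmd))
```

A real builder would likely also want to exclude the `.git` directory from the contents and run the whole thing in an isolated environment.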
Works out nicely as two parts:
The ‘builder’, which checks out source (or downloads from URLs?), runs lab build (isolated if necessary), and uploads static assets somewhere
the build environment already containing all the extensions in $PREFIX/share/jupyter/labextensions/
this can bring in unwanted stuff
fully specifying the relative/absolute paths or URLs to pip .whl files or conda .tar.bz2 (as these have predictable locations in /share/jupyter/labextensions/)
calculating these is a huge pain… at least conda has --dry-run, but you still have to know what packages are extensions
getting it wrong is worse, as there is often a very tight compatibility window between “client” and “server”
it is still a traitlets app under the hood, so the config loading process might find a jupyter_config.py and do something with it
there are a number of places where the build can ask for absolute paths, so it could ship /etc/passwd from the build machine, nicely base64-encoded for the browser.
this could probably be overridden
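One way to guard against the absolute-path problem described above is to reject any path a repository's configuration asks for unless it resolves inside the checkout. The following is a sketch of that check only, not something JupyterLite does today; `is_inside` is a hypothetical helper.

```python
from pathlib import Path


def is_inside(root: Path, candidate: str) -> bool:
    """Return True only if candidate resolves to a location under root.

    Absolute paths and '..' escapes both end up outside root after
    resolution, so they are rejected.
    """
    root = root.resolve()
    target = (root / candidate).resolve()  # an absolute candidate replaces root here
    return target == root or root in target.parents


checkout = Path("/tmp/checkout")
print(is_inside(checkout, "notebooks/intro.ipynb"))
print(is_inside(checkout, "/etc/passwd"))
print(is_inside(checkout, "../../etc/passwd"))
```

Because the paths are resolved first, symlinks inside the checkout that point elsewhere on the build machine are rejected as well.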
So I think you’d need to basically inherit the binder builder opinions (if not the software), but have a concept of a static folder that it was trying to build. Building static docs sites is actually a pretty sweet idea, anyway.
If you don’t mind going fully cloud native, the builder could actually be a serverless process, e.g. AWS Lambda or Google Cloud Functions? You get build isolation, no need to run/manage auto-scaling VMs, and a limit on runtime. Then copy the built assets to AWS S3 or Google Cloud Storage with a TTL so they’re auto-deleted, and serve them as a website?
A potential snag is managing dependencies (e.g. Python modules from requirements.txt). JupyterLite supports downloading wheels and making them available for runtime installation as part of the build, but it doesn’t install them. If I understand correctly, that either has to be done in the notebook using micropip, or by building a custom Pyolite distribution. In the latter case no manual steps are required by the user, but if every repo has its own unique Pyolite build, that’s a lot of disk space and bandwidth required to download it.
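For the "in the notebook using micropip" route, a first cell could read the repository's requirements.txt and hand the entries to micropip. Here is a minimal sketch of the parsing half, assuming a simple requirements file; `parse_requirements` is a hypothetical helper, it skips pip options such as `-r` or `-e`, and the micropip call is shown only as a comment because it runs only inside a Pyodide kernel.

```python
def parse_requirements(text: str) -> list[str]:
    """Extract plain requirement specifiers from requirements.txt content."""
    specs = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # whole-line comment
        line = line.split(" #", 1)[0].strip()  # drop trailing comment
        if not line or line.startswith("-"):
            continue  # blank line or pip option (-r, -e, --index-url, ...)
        specs.append(line)
    return specs


example = "numpy\n# plotting\nmatplotlib>=3.5  # charts\n-r extra.txt\n"
print(parse_requirements(example))

# In a JupyterLite (Pyodide) notebook cell, the result could then be used as:
#   import micropip
#   await micropip.install(parse_requirements(open("requirements.txt").read()))
```

Packages without a pure-Python wheel or a Pyodide build would still fail to install, so this only covers part of what a repo might ask for.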
@manics That looks like the sort of thing that would make for an official JupyterLite action, though as you hint at, it would need some quite explicit docs regarding the limitations of what can be built into the distribution from the repo.