Idea: Binder + JupyterLite = BinderLite?

This is a quick jotting down of ideas to see if they make sense to others, and to have as a reference in case we want to try doing any of this work in the future.

Quick context

Binder is an excellent way to serve arbitrary interactive computing environments defined via the repo2docker spec. However, Binder environments are relatively slow to start and require extra resources, for a few main reasons:

  • They require building an image from scratch
  • They require spinning up a Kubernetes pod / container once the image is built
  • They require cloud infrastructure to be present, which is costly and takes expertise to maintain

JupyterLite is a lightweight way of bundling a Jupyter interface along with Python packages that are pre-compiled to WebAssembly, so that the whole thing can be served and run on the client side. However, as far as I know, it is currently bundled as a standalone app, with files etc. served along with the JupyterLite package itself.

Idea: A Binder landing page that serves JupyterLite

It would be useful if the BinderHub landing page could serve a JupyterLite environment. In this case, the repository the user points to would be used only as a filesystem, not for the environment (at least until JupyterLite has a clear way to install new packages into it).

In my mind, the UX would be something like:

  • User hits a BinderHub launch UI
  • They click “JupyterLite” from a dropdown list somewhere
  • They type in the repository name and click “launch”
  • They’re taken to a JupyterLite app
  • That app is given the repository URL the user entered
  • It clones / downloads the files in that repository and opens them in the UI
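One small piece of this flow is turning the repository the user typed into files the app can open. The sketch below assumes a GitHub repository given BinderHub-style as `owner/repo/ref` and GitHub's `https://github.com/<owner>/<repo>/archive/<ref>.zip` download URL. Both helper names are hypothetical, and refs containing "/" are deliberately not handled here.

```python
import io
import zipfile
from urllib.parse import quote


def repo_spec_to_archive_url(spec: str) -> str:
    """Hypothetical helper: "owner/repo/ref" -> GitHub zip archive URL."""
    parts = spec.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'owner/repo/ref', got {spec!r}")
    owner, repo, ref = (quote(p, safe="") for p in parts)
    return f"https://github.com/{owner}/{repo}/archive/{ref}.zip"


def archive_to_files(data: bytes) -> dict[str, bytes]:
    """Map each file in a GitHub-style zip to its contents.

    GitHub archives wrap everything in one top-level "<repo>-<ref>/" folder,
    which is dropped so paths are relative to the repository root.
    """
    files = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            _, _, rel = info.filename.partition("/")
            if rel:
                files[rel] = zf.read(info)
    return files
```

The resulting path-to-bytes mapping is what the JupyterLite app would then write into its in-browser filesystem. How it does that is left open here.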

I think something like this would be quite useful as a lightweight “sandbox” for content that worked with the environment that JupyterLite bundles with it! Over time, you could also imagine some lightweight package installation in addition to this, so the repositories could also specify some other dependencies.

Some related issues on the JupyterLite repo:

We’re still very much in the early days of reproducible, fully-browser-based compute. The subset of repos that will work unmodified is pretty small today. And, as we have learned, when they don’t work perfectly the first time, people get really upset.

But yes, for showing-before-or-while-building: a more-or-less stock lite site could certainly render a large number of notebooks, images, markdown files, etc.

An (alternate, opt-in) binderhub/nbviewer UI could be a jupyterlite app, with all the link builders, repo explorers, log viewers (whoops, need issue), etc.

> lightweight package installation

A lot of things will have to improve substantially before this gets noticeably better. We basically need a mamba-grade SAT solver in the browser, and a much more robust way to cache downloaded files. If lite.mybinder.org became a real thing, that would actually help a lot, as every site launched from it could share the same “heavy duty” service worker cache.

Do you see this as something that could be added to BinderHub (maybe as a plugin or extension)? Which means BinderHub would also be responsible for caching the built JupyterLite website? (similar to the cached / pre-built Docker images)

An alternative would be to build a static JupyterLite website whenever there are new changes to the repository. For example via a GitHub Action when pushing to the main branch… Although this would not be as convenient as pasting a link to a repo, and might be challenging to support other providers like gists.

Good question, I am not really sure what the right development UX needs to be; I was mostly focusing on the user UX. I think it might take some experimenting to figure out what has the least complexity/maintenance overhead.

So I just bought jupyterlite.app! Here are some ideas on how I think something like this could work.

  1. User goes to jupyterlite.app, it provides UI similar to mybinder.org - insert git repo, etc.
  2. We have a server-side process that git clones it (or just fetches it by URL) and runs jupyter lite build. @jptio, does this allow users to run arbitrary code? I hope not, because that would let us run this with super minimal resources! If it does, it becomes a lot more heavyweight.
  3. We serve the built assets statically.
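Steps 2 and 3 could be sketched as a small wrapper around two commands: a shallow `git clone` and `jupyter lite build` with the clone passed as `--contents`. This is a minimal sketch, not a hardened builder. It does nothing about the arbitrary-code question, and the `src`/`site` folder names and helper names are hypothetical. Running the build from the work directory, rather than from inside the clone, only reduces the chance of picking up repo-provided config.

```python
import subprocess
from pathlib import Path


def build_commands(repo_url: str, workdir: Path) -> list[list[str]]:
    """Commands for one build: shallow clone, then a static JupyterLite build.

    The repo is used only as contents (notebooks, data, ...).
    """
    src = workdir / "src"
    out = workdir / "site"
    return [
        ["git", "clone", "--depth", "1", repo_url, str(src)],
        ["jupyter", "lite", "build", "--contents", str(src), "--output-dir", str(out)],
    ]


def run_build(repo_url: str, workdir: Path) -> Path:
    """Run the build and return the folder of static assets to serve."""
    workdir.mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(repo_url, workdir):
        # check=True stops at the first failing step (bad URL, build error, ...).
        subprocess.run(cmd, check=True, cwd=workdir)
    return workdir / "site"
```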

This allows for not just single URLs but entire repos / directories to be served with JupyterLite.

Works out nicely as two parts:

  1. The ‘builder’, which checks out source (or downloads from URLs?), runs lab build (isolated if necessary), and uploads static assets somewhere
  2. Something serving the static assets!
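Part (2) really is "any static file host". As a local stand-in for testing a built site, Python's standard library server is enough. The one detail worth copying to a real host is serving `.wasm` files as `application/wasm`, which browsers require for streaming WebAssembly compilation. This is a sketch for local use only; the handler and function names are hypothetical.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path


class StaticSiteHandler(SimpleHTTPRequestHandler):
    # Browsers only stream-compile .wasm files served as application/wasm.
    extensions_map = {**SimpleHTTPRequestHandler.extensions_map, ".wasm": "application/wasm"}


def serve_static(site_dir: Path, port: int = 0) -> ThreadingHTTPServer:
    """Serve a built JupyterLite folder from a background thread.

    port=0 lets the OS pick a free port; read it from server.server_address.
    Call server.shutdown() to stop it.
    """
    handler = functools.partial(StaticSiteHandler, directory=str(site_dir))
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```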

(2) could perhaps be Cloudflare + cheap R2 storage.

> allow users to run arbitrary code

The good:

  • there are no extra python-side hooks that can be extended by files-on-disk
  • npm/yarn and webpack are never invoked

The bad:

  • the set of extension files (and browser kernel packages) to serve relies either on:
    • the build environment already containing all the extensions in $PREFIX/share/jupyter/labextensions/
      • this can bring in unwanted stuff
    • fully specifying the relative/absolute paths or URLs to pip .whl files or conda .tar.bz2 (as these have predictable locations in /share/jupyter/labextensions/)
      • calculating these is a huge pain… at least conda has --dry-run, but you still have to know what packages are extensions
      • getting it wrong is worse, as there is often a very tight compatibility window between “client” and “server”
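The "unwanted stuff" problem can at least be made visible before building. The sketch below lists every prebuilt extension in a prefix's `share/jupyter/labextensions/` folder by reading each `package.json`, and diffs that against an allow-list. It assumes the usual layout (`<name>/package.json`, or `@scope/<name>/package.json` for scoped packages). The function names are hypothetical, and this does not solve the harder compatibility-window problem.

```python
import json
import sys
from pathlib import Path


def installed_labextensions(prefix: str = sys.prefix) -> dict[str, str]:
    """Map package name -> version for every prebuilt extension under
    <prefix>/share/jupyter/labextensions/."""
    root = Path(prefix) / "share" / "jupyter" / "labextensions"
    found = {}
    # Unscoped packages: <name>/package.json; scoped: @scope/<name>/package.json.
    for pkg_json in [*root.glob("*/package.json"), *root.glob("@*/*/package.json")]:
        meta = json.loads(pkg_json.read_text(encoding="utf-8"))
        found[meta.get("name", pkg_json.parent.name)] = meta.get("version", "unknown")
    return found


def unwanted_labextensions(wanted: set[str], prefix: str = sys.prefix) -> set[str]:
    """Extensions present in the environment that nobody asked for."""
    return set(installed_labextensions(prefix)) - wanted
```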

The ugly:

  • it is still a traitlets app under the hood, so the config loading process might find a jupyter_config.py and do something with it
  • there are a number of places where the build can ask for absolute paths, so it could ship /etc/passwd from the build machine, nicely base64-encoded for the browser.
    • this could probably be overridden
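Both "ugly" points suggest cheap pre-build checks. The sketch below has two parts. First, a path guard that rejects anything resolving outside the repository (absolute paths like `/etc/passwd`, `../` escapes, and symlinks out of the tree). Second, a heuristic listing of Jupyter-style config files at the repo's top level that a traitlets app could load if started there. These are generic checks, not JupyterLite options; the names and glob patterns are assumptions.

```python
from pathlib import Path


def stays_inside(root: Path, candidate: str) -> bool:
    """True if `candidate`, resolved against `root`, is `root` or below it.

    An absolute candidate replaces `root` entirely when joined, so "/etc/passwd"
    resolves outside; so do "../" escapes and symlinks pointing out of the tree.
    """
    base = root.resolve()
    target = (base / candidate).resolve()
    return target == base or base in target.parents


def config_files_at_top_level(repo: Path) -> list[Path]:
    """Heuristic: Jupyter-style config files a traitlets app might load if the
    build were started with the repo as its working directory."""
    patterns = ("jupyter_*config.py", "jupyter_*config.json")
    return sorted(p for pattern in patterns for p in repo.glob(pattern))
```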

So I think you’d need to basically inherit the binder builder opinions (if not the software), but have a concept of a static folder that it was trying to build. Building static docs sites is actually a pretty sweet idea, anyway.

If you don’t mind going fully cloud native, the builder could actually be a serverless process, e.g. AWS Lambda or Google Cloud Functions. You get build isolation, no need to run/manage auto-scaling VMs, and a limit on runtime. Then copy the built assets to AWS S3 or Google Cloud Storage with a TTL so they’re auto-deleted, and serve them as a website?
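On S3, the "TTL" part would be a bucket lifecycle rule rather than a per-object setting. Expiration is counted in whole days after an object is created. The sketch below builds that configuration as a plain dict in the shape S3's lifecycle API expects. The `builds/` prefix, rule ID, and bucket name are hypothetical. Applying it needs boto3 and credentials, so that call is shown only as a comment.

```python
def expiring_builds_lifecycle(prefix: str = "builds/", days: int = 7) -> dict:
    """S3 lifecycle configuration that expires built sites under `prefix`
    roughly `days` days after upload (S3 expiration is in whole days)."""
    if days < 1:
        raise ValueError("S3 expiration must be at least 1 day")
    return {
        "Rules": [
            {
                "ID": "expire-binderlite-builds",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


# Applying it (needs AWS credentials; the bucket name is hypothetical):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="binderlite-sites",
#       LifecycleConfiguration=expiring_builds_lifecycle(),
#   )
```

Uploading each build under its own key prefix (e.g. `builds/<build-id>/`) lets one rule cover every build.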

A potential snag is managing dependencies (e.g. Python modules from requirements.txt). JupyterLite supports downloading wheels and making them available for runtime installation as part of the build, but it doesn’t install them. If I understand correctly, that has to be done either in the notebook using micropip, or by building a custom Pyolite distribution. In the latter case no manual steps are required of the user, but if every repo has its own unique Pyolite build, that’s a lot of disk space, and a lot of bandwidth to download it.
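The first option (installing in the notebook with micropip) could be automated by the builder. It would prepend a bootstrap cell to each notebook instead of shipping a custom Pyolite build per repo. The sketch below works on the raw `.ipynb` JSON. It assumes a Pyodide-based kernel, where `await micropip.install([...])` at top level is the usual install idiom. The helper names and the cell id are hypothetical.

```python
def micropip_bootstrap_cell(requirements: list[str]) -> dict:
    """An nbformat-4 code cell that installs `requirements` with micropip."""
    names = ", ".join(repr(r) for r in requirements)
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": f"import micropip\nawait micropip.install([{names}])",
    }


def with_bootstrap(notebook: dict, requirements: list[str]) -> dict:
    """Return a copy of a parsed notebook with the install cell first."""
    cell = micropip_bootstrap_cell(requirements)
    if notebook.get("nbformat_minor", 0) >= 5:
        cell["id"] = "micropip-bootstrap"  # cell ids are required from nbformat 4.5
    return {**notebook, "cells": [cell, *notebook.get("cells", [])]}
```

The user still pays the download cost on every launch, but the per-repo artifact stays as small as a notebook cell.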

I just found jupyterlite-xeus-python which supports pre-installing packages, so I threw together a Docker container:

It only works with repositories that have a very simple requirements.txt; for example, binder-examples/requirements (“Simple requirements.txt based example”) doesn’t work. With this very minimal example repository for testing, https://github.com/manics/binderlite-builder, the build takes up:

/home/mambauser/build/jl: 863 files, 82.144266 MB

The browser has to download around 50 MB.
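Since the container only copes with very simple requirements.txt files, a builder could triage the file up front and report what it can't handle, instead of failing partway through. The sketch below is a heuristic, not pip's parser. It drops comments and blank lines, and treats option lines (`-r`, `-e`, `--index-url`, ...), URLs/VCS links, and direct references (`pkg @ url`) as unsupported. It makes no claim about why any particular repository fails; the function name is hypothetical.

```python
import re


def split_requirements(text: str) -> tuple[list[str], list[str]]:
    """Split requirements.txt text into (simple, unsupported) lines."""
    simple, unsupported = [], []
    for raw in text.splitlines():
        # A "#" at line start or after whitespace starts a comment.
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        if not line:
            continue
        if line.startswith("-") or "://" in line or "@" in line:
            unsupported.append(line)
        else:
            simple.append(line)
    return simple, unsupported
```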
