Jupyterhub + jupyterlite

Hello all,

I am considering whether it is possible to provide a “low cost” jupyterhub for education. At a basic level, jupyterhub provides several features:

  • Authentication
  • Managed persistent storage
  • Controlled environment (correct packages installed)
  • Providing computational resources

Educational needs often differ from research needs. Specifically, the environment is relatively simple, often just the scipy stack, and the need for computational resources is low.

Therefore, jupyterlite-based installations seem extremely appealing for educational needs. They use computational resources that the users have anyway, and jupyterlite itself handles distributing the packages.

The main open problem with using jupyterlite for courses is authentication and persistent storage. I therefore wonder if it is possible to provide jupyterhub with something like a “jupyterlite spawner”, so that it would only serve the files, but not actually provide relatively expensive computational resources.

I have looked around at various jupyterlab and jupyterlite extensions, and I didn’t quite seem to find the right thing.

That’s a very interesting idea!

I definitely agree that jupyter-lite is super compelling for education, and I have generally assumed that JupyterHub would not be part of such a deployment, but maybe it would be useful?

Someone from QuantStack, who I believe are working on exactly this kind of use case, might be best positioned to describe the state of the art for authenticated access to persistent storage with JupyterLite.

I believe QuantStack/jupydrive-s3 (a JupyterLab extension which enables client-side S3 file access) is part of it, but I’m not sure what the options are for the ‘logging in’ part.

Indeed, another alternative would be to have a completely different service that handles authentication and provisions files. There seems to be relatively little existing in that area so far.

A jupyterhub seems to be relatively close to what would be needed. It already has authentication, and jupyter server already provides the correct API for file access. Would it be feasible to make a LiteSpawner that opens a jupyterlite window and launches single-user jupyter servers that do not allow spawning kernels? Or perhaps only configure the jupyter servers to serve jupyterlite as static assets and only allow the filesystem access API?

If that worked, a relatively small server could work for a lot of users.

EDIT: I see that modifying storage API is an open issue in jupyterlite

The main open problem with using jupyterlite for courses is authentication

JupyterLite is effectively serving a static website. If you don’t need any fancy auth, you can simply use basic auth over HTTPS and define a set of usernames and passwords in the basic auth config. I don’t really see the need for JupyterHub in this case.
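
To make the basic auth idea concrete, here is a minimal sketch using only the Python standard library. It assumes the built JupyterLite site sits in a local directory and that HTTPS is terminated by a reverse proxy in front of it. The `USERS` table, the realm name and the helper names are placeholders for illustration, and in practice the web server's own basic auth support would probably be the simpler choice.

```python
import base64
import hmac
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical credentials; a real deployment would load these from a file.
USERS = {"alice": "correct horse", "bob": "battery staple"}


def check_basic_auth(header, users):
    """Return the username if the Authorization header is valid, else None."""
    if not header or not header.startswith("Basic "):
        return None
    try:
        decoded = base64.b64decode(header[len("Basic "):]).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    username, sep, password = decoded.partition(":")
    expected = users.get(username)
    if not sep or expected is None:
        return None
    # Constant-time comparison avoids leaking information through timing.
    if hmac.compare_digest(password.encode(), expected.encode()):
        return username
    return None


class BasicAuthHandler(SimpleHTTPRequestHandler):
    """Serve static files only to requests carrying valid credentials."""

    def _authorized(self):
        if check_basic_auth(self.headers.get("Authorization"), USERS):
            return True
        self.send_response(401)
        self.send_header("WWW-Authenticate", 'Basic realm="JupyterLite"')
        self.end_headers()
        return False

    def do_GET(self):
        if self._authorized():
            super().do_GET()

    def do_HEAD(self):
        if self._authorized():
            super().do_HEAD()


def make_server(directory, port=8000):
    """Create a server for the static site in `directory` (not started yet)."""
    handler = partial(BasicAuthHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# Example (assumes the site was built into ./_output):
# make_server("_output").serve_forever()
```

Note that this only gates access to the site; it does nothing for per-user storage.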

For persistent storage, as @minrk suggested, the extension that handles client-side S3 access could be a way to go.

If you absolutely want to protect the JupyterLite with JupyterHub “auth”, you can set up a JupyterHub service that serves the static JupyterLite website to your end users. For instance, here is an example service that serves a static readthedocs website as a JupyterHub service.
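
As a rough sketch of the Hub side of such a setup: a Hub-managed service is declared in `jupyterhub_config.py` and becomes reachable under `/services/<name>/`. The script path, the port and the helper function names below are assumptions for illustration; the script itself (which would serve the built site and check the visitor's Hub login) is not shown.

```python
import sys


def lite_service(script_path, port=10101):
    """Build a Hub-managed service entry for a static JupyterLite server.

    `script_path` is a hypothetical script that serves the built site
    and verifies that the visitor is logged in to the Hub.
    """
    return {
        "name": "lite",
        "url": f"http://127.0.0.1:{port}",
        "command": [sys.executable, script_path],
    }


def user_role_with_service_access():
    """Extend the default user role so users may access Hub services."""
    return {"name": "user", "scopes": ["self", "access:services"]}


# In jupyterhub_config.py, where `c` is provided by JupyterHub:
# c.JupyterHub.services = [lite_service("/srv/lite/serve_lite.py")]
# c.JupyterHub.load_roles = [user_role_with_service_access()]
```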

Thank you for the pointers, but unfortunately they don’t seem to address the use case that I have in mind.

Specifically, I am searching for the most resource-efficient way to let jupyterlite users have persistent centrally managed storage linked to their identity (efficient both in terms of human work and server resources).

These two things seem hard to combine. Client-side lab extensions don’t seem to offer an easy way to manage user identity, do they? Asking every student in a course to keep track of a bucket API key is not practical.

If you absolutely want to protect the JupyterLite with JupyterHub “auth”, you can set up a JupyterHub service that serves the static JupyterLite website to your end users.

I understand that, but that doesn’t address the need to provide users access to persistent storage.

Welp… been thinking about this for a spell. The right answer depends entirely on what an educational organization already has, and on minimizing the number of additional moving parts.

If they have a static HTTPS host, without authentication, things are a little better, as one person, on one computer, can push a lite site with content, and out-of-band means (such as Download from here and upload there, mailto: links, etc) could be enabled at basically a documentation level.

If they already have an LMS, using the LMS-specific storage backend is no doubt the right way to go, requiring some novel HTML/TS/labextension work to build either a new contents backend which exposes it natively or an iframe/postMessage technique that allows for the “outer” application to push content into the lite application, and get it back out.

If they have nothing, I don’t think standing up a JupyterHub, S3, etc. as the first toe in the water would be very feasible (no shade intended to all the work done to make that as easy as possible).

An area I’ve yet to fully explore is wrapping an existing, single-binary software forge, with the most remarkable of them being fossil, the tool which the sqlite team uses: the most-deployed software in the galaxy can’t be entirely wrong, and basically having “GitHub in a 2mb binary” is… insane. Indeed, the most ridiculous extreme would be to build a truly cross-platform binary that used the cosmopolitan-c/hermit stack.

“They” in this case could be :raising_hand_man: and I can rather easily run a Jupyterhub, e.g. a tljh.

I also want to do better than email attachments in the year 2025 :smiley:

I believe that if we’re only talking about storage, a reasonable server could accommodate around a couple of thousand students, while the same server running a full jupyterhub could handle only around a hundred.

LMS-specific storage seems unlikely to work because I don’t imagine it’s designed for low latency filesystem-like access. Not to mention that developing a novel lab extension is quite a bit of work :woozy_face:

If you are comfortable with running JupyterHub and if S3 works for your use case, I can think of the following workflow:

  • Use the native authenticator with a post-auth hook to create a bucket for each authenticated user. You can store bucket IDs and API tokens in auth_state.
  • Run JupyterLite as a Hub service and inject the bucket credentials into the static HTML page for each user. When you serve the index page you will have access to the user context and hopefully (I am not sure) to auth_state.
  • Finally, it will be a question of how to pre-populate the frontend JupyterLab extension with these bucket credentials so that the whole thing becomes completely transparent to the end user.
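
The first step of this workflow could be sketched roughly as follows. `post_auth_hook` and `enable_auth_state` are JupyterHub Authenticator options; `create_bucket_for` is a hypothetical stand-in for whatever call the storage provider needs, and the key names stored in auth_state are made up for illustration.

```python
async def create_bucket_for(username):
    """Hypothetical stand-in for a call to the storage provider."""
    return {"bucket": f"course-{username}", "access_token": "replace-me"}


async def add_bucket_to_auth_state(authenticator, handler, auth_model):
    """post_auth_hook: attach per-user bucket details to auth_state."""
    auth_state = dict(auth_model.get("auth_state") or {})
    # The hook runs on every login, so the provisioning call is assumed
    # to be idempotent (return the existing bucket if there is one).
    auth_state.update(await create_bucket_for(auth_model["name"]))
    auth_model["auth_state"] = auth_state
    return auth_model


# In jupyterhub_config.py, where `c` is provided by JupyterHub:
# c.Authenticator.enable_auth_state = True  # needs JUPYTERHUB_CRYPT_KEY set
# c.Authenticator.post_auth_hook = add_bucket_to_auth_state
```

How the service then reads auth_state and hands the credentials to the frontend extension is the open part, so it is left out here.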

If you’re planning to use Jupyter server to access your storage, you’ll be using the Contents Manager REST API:
https://jupyter-server.readthedocs.io/en/latest/developers/contents.html
so unless your LMS is really bad I wouldn’t have thought latency would be a problem.
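
For reference, here is a small sketch of what reading and saving a text file through that REST API looks like from a client, using only the standard library. The base URL, token and file path are placeholders; behind JupyterHub the base URL would also include the `/user/<name>/` prefix.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def contents_url(base_url, path):
    """URL of a file or directory in the Contents API."""
    return f"{base_url.rstrip('/')}/api/contents/{quote(path.lstrip('/'))}"


def build_get(base_url, token, path):
    """Request that fetches the contents model for `path`."""
    headers = {"Authorization": f"token {token}"}
    return Request(contents_url(base_url, path), headers=headers, method="GET")


def build_put(base_url, token, path, text):
    """Request that saves `text` as a plain-text file at `path`."""
    body = json.dumps({"type": "file", "format": "text", "content": text})
    headers = {
        "Authorization": f"token {token}",
        "Content-Type": "application/json",
    }
    return Request(
        contents_url(base_url, path),
        data=body.encode("utf-8"),
        headers=headers,
        method="PUT",
    )


def read_text(base_url, token, path):
    """Fetch a text file; the model's "content" field holds the text."""
    with urlopen(build_get(base_url, token, path)) as response:
        return json.load(response)["content"]


def write_text(base_url, token, path, text):
    """Save a text file and return the model the server sends back."""
    with urlopen(build_put(base_url, token, path, text)) as response:
        return json.load(response)
```

Each read or save is one HTTP round trip, which is why this is closer to "open and save a document" than to filesystem access.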

If you’re actually after real filesystem access (e.g. open("/storage/file.txt")) then even if you run JupyterHub I can’t think of how you could make it work: the kernel needs direct access to the storage, so a kernel in the browser won’t have access. You’d need to replace all file access in your notebooks with an API call, and if you’re doing that you might as well skip jupyter-server and make your S3 (or any other storage API) calls directly.

JupyterLite + auth + persistence would indeed cover a wide ground for educational purposes. And indeed the concept was proven at a large scale by the Capytale project in France: since 2020, they have been serving Jupyter notebooks (and other applications) to all high school students (and more) in France, reaching 500k+ unique users with minimal physical resources. The project was presented at JupyterCon 2023:

The Capytale sources are available there:

Since Capytale was started before JupyterLite, it’s based on an earlier instance of the concept, namely basthon. Some work still needs to be done to bring the functionality to JupyterLite itself.

In the Candyce proposal (candyce.org), one task – to be implemented by QuantStack – was about just that, as a step toward offering lightweight Jupyter services to all universities in France. Alas, Candyce has not been funded (yet), but the desire is still there: with QuantStack, we are actively searching for alternative funding sources for that task. The order of magnitude is 100k€. Suggestions welcome. We are open to partnerships too.

Cheers,
Nicolas

I was led to this thread by Nicolas, from my post over on a GitHub issue asking about nbgrader with JupyterLite.

It’s not the same question, but it is related, in the sense that we are all looking for scalable and low-cost solutions that build on Jupyter for educational purposes.

In the auto-grading scope, I am imagining two use cases: 1) Suppose you have teaching materials published as a static website (or even better, a JupyterBook), with built-in exercises for the learner. Wouldn’t it be nice if the exercises could be done on JupyterLite (so, no server) and be auto-graded, showing the learner immediate feedback on their work? 2) In a more complex/advanced setup, suppose you serve learners exercise sets as Jupyter notebooks instrumented by nbgrader, and they “submit” them for auto-grading somehow, where the grader runs in the user’s browser via JupyterLite and the score is captured at the end. In an ideal setting this is integrated with the LMS, and the score is written to the grade book.

One of the disadvantages of nbgrader, as originally designed, is that it relies on a shared file system where learners deposit their notebooks to be graded. The instructor needs read access to the shared files to run the grader. A few years back, I worked with a team to develop a plug-in for the open source LMS used by edX; it ran nbgrader in a Docker container and wrote the score to the grade book. The code is open. This solution bypassed the need for a shared file system, which was great. (For several reasons, however, I decided to retire the self-hosted instance of Open edX running this.)

My university uses Blackboard as its LMS. I suppose developing an LTI tool that has the functionality of my Jupyter grader Xblock is too much to ask? I found a commercial product called CodeGrade that allows auto-graded Python code assignments to be inserted in a Blackboard course via LTI, and I asked for a quote… it’s kind of expensive, at USD 8,000 for 250 students in one course. Obviously an open source solution would be better.

Ah, @labarba brings up a good point! A real time filesystem-like API is definitely hard, but maybe a more lightweight component that would only do two things—download all files and upload all files—would be within reach? That would cover most exam-like use cases.
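
To illustrate how small the "download all / upload all" piece could be on the storage side, here is a sketch that packs a user's directory into a single zip archive and unpacks it again, using only the Python standard library. The function names and the one-directory-per-user layout are assumptions; the browser side and the authentication around it are not covered.

```python
import io
import zipfile
from pathlib import Path


def pack_directory(root):
    """Return the bytes of a zip archive holding every file under `root`."""
    root = Path(root)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the user's directory.
                archive.write(path, path.relative_to(root).as_posix())
    return buffer.getvalue()


def unpack_archive(data, root):
    """Extract an uploaded archive into `root` and list the stored names."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(root)
        return sorted(archive.namelist())
```

With this shape, a session would need just two requests per user: one download at the start and one upload at the end.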