Jupyterhub + jupyterlite

Anton_Akhmerov · December 20, 2024, 9:41am

Hello all,

I am considering whether it is possible to provide a “low cost” jupyterhub for education. At a basic level, jupyterhub provides several features:

Authentication
Managed persistent storage
Controlled environment (correct packages installed)
Providing computational resources

Educational needs often differ from research needs. Specifically, the environment is relatively simple (often just the scipy stack), and the need for computational resources is low.

Therefore, jupyterlite-based installations seem extremely appealing for educational needs. They use computational resources that the users have anyway, and jupyterlite itself handles distributing the packages.

The main open problem with using jupyterlite for courses is authentication and persistent storage. I therefore wonder if it is possible to provide jupyterhub with something like a “jupyterlite spawner”, so that it would only serve the files, but not actually provide relatively expensive computational resources.

I have looked around at various jupyterlab and jupyterlite extensions, and I didn’t quite seem to find the right thing.

minrk · December 20, 2024, 9:53am

That’s a very interesting idea!

I definitely agree that jupyter-lite is super compelling for education, and I have generally assumed that JupyterHub would not be part of such a deployment, but maybe it would be useful?

Someone from QuantStack, who are working on exactly this kind of use case, I believe, might be best positioned to describe the state of the art for authenticated access to persistent storage with JupyterLite.

I believe QuantStack/jupydrive-s3 (a JupyterLab extension which enables client-side S3 file access) is part of it, but I’m not sure what the options are for the ‘logging in’ part.

Anton_Akhmerov · December 20, 2024, 11:42am

Indeed, another alternative would be to have a completely different service that handles authentication and provisions files. There seems to be relatively little existing in that area so far.

A jupyterhub seems to be relatively close to what would be needed. It already has authentication, and jupyter server already provides the correct API for file access. Would it be feasible to make a LiteSpawner that opens a jupyterlite window and launches single-user jupyter servers that do not allow spawning kernels? Or perhaps only configure the jupyter servers to serve jupyterlite as static assets and only allow the filesystem access API?

If that worked, a relatively small server could work for a lot of users.

EDIT: I see that modifying the storage API is an open issue in jupyterlite:

github.com/jupyterlite/jupyterlite

Allow for remote content storage

opened 02:48PM - 10 Mar 22 UTC

larsbonczek

enhancement extension idea

I'd like to use a regular jupyter server as a content storage backend for my jupyterlite instance.

### Problem

I like the concept of the client-side python kernels that jupyterlite provides. My problem is that I still want my notebooks and other files to be stored in the cloud. This way, I can work on my project from different machines. My goal is to set up a binderhub that supports both jupyterlite and regular jupyter servers, depending on the resource/kernel requirements of the individual projects. There should be as little difference for users between the two as possible. My binderhub already supports persistent storage across binder sessions, so the jupyterlite instances need to support cloud storage as well.

### Proposed Solution

My proposed solution is to develop a way to specify a remote content storage API. I would like to be able to tell jupyterlite to use the api/contents/* routes from another jupyter server as a content storage API, not its own local storage API. I could then just give the jupyterlite instances in my binderhub the url of a jupyter server running in the background of the binder instance, whose only task is providing a storage backend. Is it as easy as that, or am I missing something?

An even better solution for my scenario would be a centralized storage backend that doesn't require an individual binder instance per user. But this would create many new problems, mainly concerning authentication and inclusion in the binderhub user flow. I'm not sure how to tackle those. Using the proposed solution, I could at least reduce the resource usage of some binder instances by running the kernels in the browser.

### Additional context

I'm willing to (try to) implement this myself. I'm mainly asking for feedback on this idea and any tips for getting me started.

mahendrapaipuri · December 20, 2024, 2:24pm

The main open problem with using jupyterlite for courses is authentication

JupyterLite is effectively serving a static website. If you don’t need any fancy auth, you can simply use basic auth over HTTPS and define a set of usernames and passwords in the basic auth config. I don’t really see the need for JupyterHub in this case.
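To make the basic auth suggestion concrete, here is a minimal sketch of the check a server performs on each request. In practice this is usually a few lines of web server configuration rather than custom code; the function name `check_basic_auth` and the plain-text `users` dictionary are assumptions for illustration only (a real deployment would store password hashes).

```python
import base64
import hmac


def check_basic_auth(header, users):
    """Return the username if the Authorization header matches a known user, else None.

    `users` maps username -> password. Plain-text passwords are used here
    purely for illustration.
    """
    if not header or not header.startswith("Basic "):
        return None
    try:
        # The header carries base64("username:password").
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    expected = users.get(username)
    if expected is None:
        return None
    # Constant-time comparison avoids leaking how much of the password matched.
    if hmac.compare_digest(password.encode("utf-8"), expected.encode("utf-8")):
        return username
    return None
```

This only identifies the user; it does nothing by itself to give that user a place to store files.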

For the case of persistent storage, as @minrk suggested the extension that deals with S3 client side access can be a way to go.

If you absolutely want to protect the JupyterLite with JupyterHub “auth”, you can set up a JupyterHub service that serves the static JupyterLite website to your end users. For instance, here is an example service that serves a static readthedocs website as a JupyterHub service.

Anton_Akhmerov · December 20, 2024, 3:32pm

Thank you for the pointers, but unfortunately they don’t seem to address the use case that I have in mind.

Specifically, I am searching for the most resource-efficient way to let jupyterlite users have persistent centrally managed storage linked to their identity (efficient both in terms of human work and server resources).

These two things seem hard to combine. Various client-side lab extensions don’t make it easy to manage the user identity, do they? Asking students in a course to please keep a bucket API key is not practical.

If you absolutely want to protect the JupyterLite with JupyterHub “auth”, you can set up a JupyterHub service that serves the static JupyterLite website to your end users.

I understand that, but that doesn’t address the need to provide users access to persistent storage.

bollwyvl · December 20, 2024, 4:11pm

Welp… been thinking about this for a spell. The right answer depends entirely on what an educational organization already has, and minimizing the number of additional moving parts.

If they have a static HTTPS host, without authentication, things are a little better, as one person, on one computer, can push a lite site with content, and out-of-band means (such as Download from here and upload there, mailto: links, etc) could be enabled at basically a documentation level.

If they already have an LMS, using the LMS-specific storage backend is no doubt the right way to go, requiring some novel HTML/TS/labextension work to build either a new contents backend which exposes it natively or an iframe/postMessage technique that allows for the “outer” application to push content into the lite application, and get it back out.

If they have nothing, I don’t think standing up a JupyterHub, S3, etc. as the first toe in the water would be very feasible (no shade intended to all the work done to make that as easy as possible).

An area I’ve yet to fully explore is wrapping an existing, single-binary software forge, with the most remarkable of them being fossil, the tool which the sqlite team uses: the most-deployed software in the galaxy can’t be entirely wrong, and basically having “GitHub in a 2mb binary” is… insane. Indeed, the most ridiculous extreme would be to build a truly cross-platform binary that used the cosmopolitan-c/hermit stack.

Anton_Akhmerov · December 20, 2024, 5:44pm

“They” in this case could be me, and I can rather easily run a Jupyterhub, e.g. a tljh.

I also want to do better than email attachments in the year 2025.

I believe that if we’re only talking about storage, a reasonable server could accommodate around a couple of thousand students, while the same server running a full jupyterhub could handle only around a hundred.

LMS-specific storage seems unlikely to work because I don’t imagine it’s designed for low-latency filesystem-like access. Not to mention that developing a novel lab extension is quite a bit of work.

mahendrapaipuri · December 21, 2024, 11:07am

If you are comfortable with running JupyterHub and if S3 works for your use case, I can think of the following workflow:

Use the native authenticator and a post auth hook to create a bucket for each authenticated user. You can store bucket IDs and API tokens in auth_state.
Run JupyterLite as a Hub service and inject the bucket credentials into the static HTML page for each user. When you serve the index page you will have access to the user context and hopefully (I am not sure) you will also have access to auth_state.
Finally, it will be a question of how to pre-populate the frontend JupyterLab extension with these bucket credentials so that the whole thing becomes completely transparent to the end user.
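The first step of this workflow could be sketched roughly as follows. `post_auth_hook` and `enable_auth_state` are existing JupyterHub authenticator options; everything else is an assumption for illustration: `provision_bucket` is a hypothetical stand-in for a call to your storage provider, and the `"storage"` key inside auth_state is an arbitrary name.

```python
import secrets


def provision_bucket(username):
    """Hypothetical stand-in for creating a per-user bucket with a storage provider.

    A real implementation would call the provider's API and sanitize the
    username into a valid bucket name.
    """
    return {"bucket": f"course-{username}", "token": secrets.token_urlsafe(16)}


def attach_bucket(authenticator, handler, auth_model):
    """Post auth hook: record bucket credentials in the user's auth_state."""
    auth_state = auth_model.get("auth_state") or {}
    # Only provision once; keep existing credentials if the model already has them.
    auth_state.setdefault("storage", provision_bucket(auth_model["name"]))
    auth_model["auth_state"] = auth_state
    return auth_model


# In jupyterhub_config.py (where `c` is provided by JupyterHub):
# c.Authenticator.post_auth_hook = attach_bucket
# c.Authenticator.enable_auth_state = True
```

Persisting auth_state also requires an encryption key to be configured for the Hub. How the credentials then reach the JupyterLite frontend is the open part of the workflow and is not covered by this sketch.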

manics · December 21, 2024, 9:53pm

If you’re planning to use Jupyter server to access your storage you’ll be using the Contents Manager REST API:
https://jupyter-server.readthedocs.io/en/latest/developers/contents.html
so unless your LMS is really bad I wouldn’t have thought latency would be a problem.

If you’re actually after real filesystem access (e.g. open("/storage/file.txt")) then even if you run JupyterHub I can’t think of how you could make it work: the kernel needs direct access to the storage, so a kernel in the browser won’t have access. You’d need to replace all file access in your notebooks with an API call, and if you’re doing that you might as well skip jupyter-server and make your S3 (or any other storage API) calls directly.
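For reference, reading and saving a file through that API amounts to a GET or PUT on `/api/contents/<path>`. Below is a minimal sketch that only builds the requests, using the standard library; the base URL, path and token are placeholders, and the helper names are made up for this example.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def contents_url(base_url, path):
    """Build the contents API URL for a file path on a Jupyter server."""
    return f"{base_url.rstrip('/')}/api/contents/{quote(path.lstrip('/'))}"


def build_read_request(base_url, path, token):
    """GET request; the JSON response carries the file under the "content" key."""
    return Request(
        contents_url(base_url, path),
        headers={"Authorization": f"token {token}"},
        method="GET",
    )


def build_save_request(base_url, path, text, token):
    """PUT request that saves `text` as a plain text file at `path`."""
    body = json.dumps({"type": "file", "format": "text", "content": text}).encode("utf-8")
    return Request(
        contents_url(base_url, path),
        data=body,
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )
```

Either request can be sent with `urllib.request.urlopen(request)`. A JupyterLite frontend would make the equivalent calls from the browser, which additionally depends on the server allowing cross-origin requests.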

nthiery · December 30, 2024, 6:15pm

JupyterLite + auth + persistence would indeed cover a wide ground for educational purposes. And indeed the concept was proven at a large scale by the Capytale project in France: since 2020, they have been serving Jupyter notebooks (and other applications) to all high school students (and more) in France, reaching 500k+ unique users with minimal physical resources. The project was presented at JupyterCon 2023:

The Capytale sources are available there:

Since Capytale was started before JupyterLite, it’s based on an earlier instance of the concept, namely basthon. Some work still needs to be done to bring the functionality to JupyterLite itself.

In the Candyce proposal (candyce.org), one task – to be implemented by QuantStack – was about just that, as a step toward offering lightweight Jupyter services to all universities in France. Alas, Candyce has not been funded (yet), but the desire is still there: with QuantStack, we are actively searching for alternative funding sources for that task. The order of magnitude is 100k€. Suggestions welcome. We are open to partnerships too.

Cheers,
Nicolas

labarba · January 20, 2025, 1:52pm

I was led to this thread by Nicolas, from my post over on a GitHub issue asking about nbgrader with JupyterLite.

It’s not the same question, but it is related, in the sense that we are all looking for scalable and low-cost solutions that build on Jupyter for educational purposes.

In the auto-grading scope, I am imagining two use cases:

1) Suppose you have teaching materials published as a static website (or even better, a JupyterBook), with built-in exercises for the learner. Wouldn’t it be nice if the exercises could be done on JupyterLite (so, no server) and be auto-graded, showing the learner immediate feedback on their work?

2) In a more complex/advanced setup, suppose you serve learners exercise sets as Jupyter notebooks instrumented by nbgrader, and they “submit” them for auto-grading somehow, where the grader runs in the user’s browser via JupyterLite and the score is captured at the end. In an ideal setting this is integrated with the LMS, and the score is written to the grade book.

One of the disadvantages of nbgrader, as originally designed, is that it relies on a shared file system where learners deposit their notebooks to be graded. The instructor needs read access to the shared files to run the grader. A few years back, I worked with a team to develop a plug-in for the open source LMS that is used by edX, which ran nbgrader in a Docker container and wrote the score to the grade book. The code is open. This solution bypassed the need for a shared file system, which was great. (For several reasons, however, I decided to retire the self-hosted instance of Open edX running this.)

My university uses Blackboard as its LMS. I suppose developing an LTI tool that has the functionality of my Jupyter grader Xblock is too much to ask? I found a commercial product called CodeGrade that allows auto-graded Python code assignments to be inserted in a Blackboard course via LTI, and I asked for a quote… it’s kind of expensive, at USD 8,000 for 250 students in one course. Obviously an open source solution would be better.

Anton_Akhmerov · January 21, 2025, 9:39am

Ah, @labarba brings up a good point! A real time filesystem-like API is definitely hard, but maybe a more lightweight component that would only do two things—download all files and upload all files—would be within reach? That would cover most exam-like use cases.
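One way to picture such a component: the whole workspace travels as a single archive, so the server only needs one "download" and one "upload" endpoint per user. This is a rough sketch of the packing and unpacking halves only, with made-up function names; it is not an existing JupyterLite or JupyterHub feature.

```python
import io
import zipfile


def pack_files(files):
    """Bundle a {relative_path: bytes} mapping into a single zip archive (bytes)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path, data in sorted(files.items()):
            archive.writestr(path, data)
    return buffer.getvalue()


def unpack_files(blob):
    """Inverse of pack_files: turn archive bytes back into {relative_path: bytes}."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()
        }
```

A server handling uploads would also need to tie each archive to an authenticated user and limit its size, which this sketch leaves out.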
