Feature idea: Jupyter Books as a landing page for hub log-ins

Background

Currently, most JupyterHubs go from login → some data science user interface (e.g., JupyterLab, RStudio, or Jupyter Notebook). However, many hub use cases also come with documentation of some form. For example, a classroom might have a syllabus landing page with course updates.

In those cases, the JupyterLab interface might not be the greatest “landing spot” for a new log-in. Perhaps we could find a way to present the users with a different “landing page” in a lightweight and flexible manner.

Idea

It would be useful if hub administrators could do two things:

  1. Bundle a static website along with their hub. This could live at myhub.org/docs or myhub.org/home.
  2. Direct users to that website after hub log-in.

That way, people could use this static website as a dynamic “home page” for the JupyterHub - they could populate it with useful information, like:

  • updated information and notices for users
  • links to tutorials and guides
  • IFrame embeds of other services like Discourse, Twitter, or Slack
  • Jupyter Books
  • Lightweight UIs via MyST Markdown (e.g., the Jupyter Book gallery or a list of buttons to launch into different UIs).

Just throwing this idea out here in case anybody else is excited by it. cc also @colliand who riffed with me about this a bit and @yuvipanda who suggested I write it up here.

I also wonder if this is the kinda thing that @rabernat and @jhamman in Pangeo-land would find interesting?


One thing I’ve used before is jupyter-server-proxy to proxy a simple Flask app that serves a homepage, e.g. on /web or /home.

It has the advantage of then being able to act as a jump off point to various other services / URLs, as well as to /lab, /tree etc.
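In case it helps, here is roughly what that setup could look like as a jupyter-server-proxy config fragment. This is only a sketch: the entry name "home", the path /srv/homepage/app.py, and the launcher title are hypothetical, and it assumes Flask and jupyter-server-proxy are installed in the user environment.

```python
# jupyter_notebook_config.py in the user environment (config fragment).
c = get_config()  # noqa: F821 (injected by Jupyter when it loads this file)

c.ServerProxy.servers = {
    # The entry name becomes the URL prefix, so this app is reachable at
    # /user/<name>/home/ (hypothetical name).
    "home": {
        # jupyter-server-proxy substitutes {port} with a free port it picks.
        "command": ["flask", "run", "--port", "{port}"],
        # Hypothetical location of the Flask homepage app.
        "environment": {"FLASK_APP": "/srv/homepage/app.py"},
        # Optional launcher tile so the page is also reachable from JupyterLab.
        "launcher_entry": {"title": "Hub home"},
    }
}
```

The Flask app itself can then link out to /lab, /tree, or any other service on the same user server.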


Noodling on this:

  • It is less complex to support a Jupyter Book than an open-ended website. If Jupyter Book adds a handful of plugins and themes, then you don’t need fancy outside content.

  • Is the root of the problem that JupyterLab itself needs to support more types of content, like books and MyST? Then it would just be a matter of setting the hub entry point to the documentation.

    • On the other hand, I do love ReadTheDocs’ /docs GitHub hooks.

Although if you can serve just a bunch of HTML files as a static website, that’d probably be the simplest of all options.

I think the reason not to use JupyterLab is just because of the complexity required in updating the interface. Something like Hugo, or Jupyter Book, or Sphinx, etc could be edited and updated much more easily than something like JupyterLab.

Could you go into more detail on the /docs thing?

Thanks for starting this discussion.

I, for one, am interested in (and am in the process of pitching internally) a full Lumino/Lab3-based “skin” of the hub UI. While not every hub/team would benefit from this, there are a lot of use cases where it would make sense.

My motivation is that I’ve found a number of bespoke hubs end up having to do a lot of skinning in tornado/jinja/bootstrap-land that can make it harder to upgrade when something changes upstream. The template/handler hacks get propagated by cargo-culting, and generally don’t compose very well.

I imagine being able to log into my hub, and without starting any kernels, have a quick look at composable UI provided by different pip/conda-installable modules on the head node, tying into hub-managed services, or external things, either before or after log-in:

  • the current State of the Hub, either with a heavyweight solution like grafana, or something lighter-weight with vega
  • for calendar-aware hubs, upcoming meetings, etc. that could be backed by jupyter-videochat, which is already pretty hub-focused
  • for binder-aware hubs, something that lets you get a quick look at what are the current popular deploys, as well as context-aware embeddings of the binder/nbgitpuller generators, etc.
  • for slurm/dask/prefect/dagster/papermill-aware hubs, see the state of long-running processes, and potentially authoring new ones with LSP support (ideally in-browser, as we haven’t even begun to think about what that would mean)
    • though something like sourcegraph would play extremely well… code search across my whole hub? mmmmm
  • for dataset-aware hubs, some skin on CKAN/whatever to see hot-and-fresh data sets, maybe an already-logged-in REST/query explorer
  • for package manager/CI-aware hubs, the hottest in-house packages, installers, docker images, PDFs, whatever it is you are building
  • for VCS-integrated hubs, a quick look at my projects/PRs, etc
  • for LMS-integrated hubs, a quick look at the assignments, comments I’ve got coming up, or steps needed to complete assessment
  • for chat-enabled hubs (e.g. zulip, matrix, mm), a quick look at my recent topics
  • …and yes, static docs! but ideally something even smarter, e.g. something that can spin up thebelab/voila apps, can use these other concepts to Go to definition, offer rich commands, etc.

Across all of that (and likely guiding it), a faceted search capability that combines results from whatever search APIs those services provide natively into something more harmonious.

A big win here would be the further ability to re-use some of these components within a then-spawned Lab, and similarly to use Lab themes and translations at the “door handle” of the interactive computing experience. Bringing some of the concepts from Lab (like commands, keyboard shortcuts, user preferences, etc.) would further reduce the dissonance between “I’m about to interactively compute” and “I am interactively computing”.


The admin controls of ReadTheDocs.org allow you to watch a repo for changes. Then it builds whatever is in /docs. See “Incoming Webhooks and Automation” in the Read the Docs 5.8.5 documentation.


ahhh yes - that is indeed awesome :)

This should be the start! Someone should build jupyter-static-server-proxy. We can then combine that with nbgitpuller to serve pre-built static Jupyter Book or Sphinx sites. And that will be the ‘landing page’ for users, since you can have them click the nbgitpuller link or set that as your default hub URL.


Jupyter already serves static files. If they are in the user’s launch directory, you can get a lot of this with:

c.Spawner.default_url = "files/path/to/index.html"

If they can’t be, you should be able to use extra_static_paths config (in the user env) and default_url = "static/path/to/index.html".
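To make the second option concrete, here is a rough sketch of how the two pieces of config might fit together. It assumes the classic notebook server (NotebookApp) in the user environment, and /srv/landing-site is a hypothetical directory that contains path/to/index.html. Note that these are two separate config files shown in one block.

```python
# --- jupyterhub_config.py (hub side) ---
c = get_config()  # noqa: F821 (injected by Jupyter when it loads a config file)

# Send users to the static landing page instead of the default app.
c.Spawner.default_url = "static/path/to/index.html"

# --- jupyter_notebook_config.py (user environment, a separate file) ---
c = get_config()  # noqa: F821

# Extra directories searched when serving URLs under static/.
# "/srv/landing-site" is a hypothetical location.
c.NotebookApp.extra_static_paths = ["/srv/landing-site"]
```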

But if you do want jupyter-static-server-proxy as its own thing, it should be a single server extension that registers a single StaticFileHandler for your directory (or directories) to serve. And again, you would use the config c.Spawner.default_url = "some-prefix-set-in-extension/some/relative/path/index.html" to start users looking at your new handler instead of the default app.
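A minimal sketch of such an extension, assuming the classic notebook server's extension hook and tornado's built-in StaticFileHandler. The directory /srv/landing-site and the prefix "landing" are hypothetical placeholders, and this is untested against a real hub.

```python
from tornado.web import StaticFileHandler

STATIC_ROOT = "/srv/landing-site"  # hypothetical directory of pre-built HTML
URL_PREFIX = "landing"             # hypothetical prefix under the user's base URL


def load_jupyter_server_extension(nbapp):
    """Register a single StaticFileHandler on the running notebook server."""
    web_app = nbapp.web_app
    base_url = web_app.settings.get("base_url", "/")
    # e.g. "/user/alice/" + "landing" -> "/user/alice/landing/(.*)"
    pattern = base_url.rstrip("/") + "/" + URL_PREFIX + "/(.*)"
    web_app.add_handlers(
        ".*$",
        [
            (
                pattern,
                StaticFileHandler,
                # Serve index.html when a directory is requested.
                {"path": STATIC_ROOT, "default_filename": "index.html"},
            )
        ],
    )
```

With this in place, the matching hub config would be something like c.Spawner.default_url = "landing/index.html". Tornado's StaticFileHandler does not check login on its own, so this only suits files that are fine to serve without authentication.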


If I go to user/yuvipanda/files/quantecon-mini-example/index.html, I get:

(the file exists - I pulled it in via nbgitpuller)

Some sleuthing in the logs shows me that the issue is that Referer is not set, rather than Origin. Thoughts on how to get around that? Also not sure what this is protecting against - CSRF?

The redirect that’s a result of login from default_url should mean this works. Links from other websites won’t, but the actual login process and visiting /user/name/ redirect should still be okay. Typing the url by hand won’t work, though.
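To illustrate why those cases differ, here is a deliberately simplified sketch of a Referer check. It is not the notebook server's actual code; the function name and the exact matching rule (same host, path under the user's base URL) are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def referer_allowed(request_host, base_url, referer):
    """Return True if the Referer header points back into this user's server."""
    if not referer:
        # Typing the URL by hand sends no Referer, so the request is rejected.
        return False
    parts = urlsplit(referer)
    # Accept only requests that come from the same host and from a page
    # under the user's own base URL.
    return parts.netloc == request_host and parts.path.startswith(base_url)


# Redirect after login via /user/alice/: accepted.
print(referer_allowed("myhub.org", "/user/alice/", "https://myhub.org/user/alice/"))
# URL typed by hand (no Referer): rejected.
print(referer_allowed("myhub.org", "/user/alice/", None))
# Link from another website: rejected.
print(referer_allowed("myhub.org", "/user/alice/", "https://example.com/page"))
```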

It was XSSI, I believe, and so frustrating. I hate that we do this, but couldn’t find a better way. I just got an idea that a weird redirect loop might work for ‘real’ browsers but not the XSSI case…

@minrk aaah, damn. Yeah, then we will need a different solution for a static proxy - but wouldn’t it have the same issues that require the origin / referer check?

Also do you have a link to the PR / issue that introduced this check? Couldn’t really find it

What case wouldn’t work? Links from anywhere on the internet to /hub/user-redirect/ would work since the referer would be /user/name/.

PR is here (private, since it was for a reported vulnerability, CVE-2019-9644). The commit is here, release note here.

If I recall correctly, SameSite cookies could also fix the issue in a better way, but aren’t easy to use with Python < 3.8, and would stop working in iframes.

Web security is my least favorite thing in the world.

“wouldn’t it have the same issues that require the origin / referer check?”

Not necessarily. If it’s not an authenticated handler (e.g. using extra_static_paths would not have this issue), it shouldn’t need this check. This check is only for resources you care about leaking. If you don’t care about XSSI for the resources you are serving, it’s not a necessary or helpful check.

So I just tried https://datahub.berkeley.edu/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fexecutablebooks%2Fquantecon-mini-example&urlpath=files/quantecon-mini-example%2Findex.html&branch=gh-pages - it’s an nbgitpuller link to a static HTML page, and it gave me the same error. My hope is to use nbgitpuller to distribute content, rather than default_url, since nbgitpuller lets you set that dynamically on a per-user basis.

I’m silly. We have /view/ for this, /files/ is meant to be only for downloads. So the URL should be /view/ not /files/ I think.
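For anyone who wants to try the /view/ variant, here is a small helper that builds the same kind of nbgitpuller link with the urlpath swapped. The function name is made up for illustration; the hub URL, repo, and branch are the ones from the link above, and whether /view/ actually renders the page is still unverified.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def nbgitpuller_link(hub_url, repo, branch, urlpath):
    """Build a /hub/user-redirect/git-pull link (hypothetical helper)."""
    query = urlencode({"repo": repo, "urlpath": urlpath, "branch": branch})
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


link = nbgitpuller_link(
    hub_url="https://datahub.berkeley.edu",
    repo="https://github.com/executablebooks/quantecon-mini-example",
    branch="gh-pages",
    # /view/ instead of /files/, as suggested above.
    urlpath="view/quantecon-mini-example/index.html",
)
print(link)

# Round-trip check: the urlpath survives URL encoding unchanged.
print(parse_qs(urlsplit(link).query)["urlpath"][0])
```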

I’m not sure if that will work for you, either, though, due to iframe navigation weirdness.

You might give the extra_static_paths approach a try. That’s not authenticated since it assumes the files it serves are not sensitive, which I imagine is the case.

I worked with @fperez for students to be able to run sphinx on their hubs and see the output. He graciously wrote up “Some tips for HW06” (section 7 of Collaborative and Reproducible Data Science), which looks sort-of-simple-enough maybe.


Quick note that the above docs referred to by @yuvipanda now live here. I had to do some reorg of our homeworks and moved them, so wanted to update in case anyone finds a broken link in the future…