Deploying JupyterHub at your institution

Many folks are at institutions that are (or are considering) deploying Jupyter environments on shared infrastructure (for example, in the cloud, or on-prem hardware). JupyterHub is a tool that can enable this relatively flexibly.

In these cases, many are also fighting institutional battles, trying to get buy-in from administrations, and trying to figure out the best way to secure resources / expertise for deploying and maintaining the infrastructure that serves a shared environment.

This is a thread for folks in this situation who’d like to discuss their efforts along these lines, ask questions, and help one another.

(inspired by a recent chat between @psychemedia and @fmaussion in the “introduce yourself” thread)

I just blogged about my solution for internal use in a dev/qa/ops enterprise setting – it must be secure and cater to a diverse set of teams too, so it should map to an academic setting of researchers.

Regarding the admin challenge, if you appear with a package instead of a stack of docs to follow, more than half the battle is won already – unless your orga employs Mordac, the Preventer of Information Services.

Start with that, and once you have gained some experience and see that you need more compute power, only then attack things like Docker Swarm / K8s scaling of workloads, or using your existing PySpark or other big data infrastructure.

Some observations, part I:

  • expecting people to install software is a blocker:
    • institutional machines may be locked down and prevent package installation;
    • ports may be blocked, preventing packages from being downloaded;
    • most people aren’t sysadmins;
    • installing s/w on a ‘personal’ machine can break other things;
    • if I have half an hour to try something, I don’t have 2 hours to spend trying to install it and get it running;
    • nobody has Docker installed; and if they want to try it, odds on they’re on a Windows box with Hyper-V enabled by default. Asking them to pop into the BIOS to fix some settings so they can run Docker via VirtualBox is not really a reasonable thing to ask.

What is the minimum kind of requirements you’d have for a JupyterHub that is useful as a demo?
On a spectrum from “vanilla JupyterHub with username/password login, you install the packages you need yourself” to “custom themed JupyterHub, integrated with our (bespoke) auth, all the packages we need are pre-installed”.

What kind of kernels do you need in addition to a Python one?

https://hubhero.net/trial/ will give you a vanilla JupyterHub: HTTPS (of course, it is 2019!), three users (one admin, two “students”), password-based auth, Python and R kernels (not RStudio), no sharing of work (you could set it up with your admin account though), a minimal set of Python packages (the admin can install whatever they like), and limited CPU and RAM (this is a free demo after all). I am happy to tweak the free demo offer to make it more useful; all it needs is input.

@betatim

I think I need to properly try to figure out what barriers I keep coming across. (I like the Hubhero idea but need to think about how that would work for advocacy… Which means I need to clarify what issues / obstacles are…)

The issues are all muddled atm (please bear in mind that my focus is primarily how we can use Jupyter notebooks for creating and distributing teaching materials to distance education students).

  • individual academics don’t see the benefit, and don’t have the time/machine/skills to install a local Jupyter setup; they don’t like to be seen to fail in public, so they want to have a private space to practice in;
  • what’s the best way to demo a notebook setup? On the one hand, there’s just a basic run through of core features (how md works; how you can execute code, etc). I’ve seen taster events where folk very quickly go reactionary and negative (md is too hard (need a WYSIWYG editor), there is no spell check etc etc; and yes, bickering does start at that level; repeatedly…). One workaround for that is to demo a customised environment that does have those features via extensions, but the problem with that approach is that if someone does try their own setup, it isn’t customised in that way and they get frustrated. There is also the risk that if you do demo a customised environment it is seen as too complicated / feature rich / confusing / bewildering.

That’s trying to advocate to users. Trying to get demo servers internally is another thing:

  • no budget code;
  • no business case: very few others are asking for a server because very few others are trying to use notebooks; chicken-and-egg; no-one is asking for this, therefore waste of internal resource putting up such a service; I had hoped that using Azure Notebooks may help build a case but that’s not had much success so far.

The fact that Jupyter is a live project also causes issues. Folk get twitchy because it gets updates: it takes us two years to produce a course that then runs for five, so change is dangerous… Folk see the fact that the application gets updates as proof that it’s unstable, at least on our timescales, or too undeveloped (counter to that, the course I’m on adopted IPython notebooks, as was, at a time when they were still a bit unstable; but we were convinced the project had legs and knew there were likely two years of development before we’d have to go live with it to students, given our course production timescales).

The fact that I can just launch a server on my own Digital Ocean box also proves it’s not a proper piece of production software. Real software requires lots of folk in IT to do lots of things. (If I take the route that IT are helping me deploy via Azure/Kubernetes, then it’s obviously too complicated, or too expensive because it has to be commercially hosted, or too risky because it’s a community project with only a couple of core developers, not properly supported commercial software, etc etc.)

This has got me thinking… Our course has a “How to Fail This Course” Guide (“step 1: spend all day playing Fortnite and ignore the course altogether; alternatively, set aside two evenings a week and half a day at a week to concentrate on your studies” etc). Maybe one way to think about advocacy development is a set of flippant reasons why it doesn’t make sense to use Jupyter stuff, then provide counters to those.

One of the issues I do have with demos is that some of the demos we need to put together are of containers with quite complex internals (lots of enabled third party extensions, or with other services running and possibly exposed by using jupyter-server-proxy). Which is to say, the demos are of environments rather than just notebooks, per se. Eg a ‘chemistry course’ environment with 3D molecule rendering extensions pre-installed etc and demoed in demo notebooks.

So as Jupyterhub demoserver admin, I guess one thing I need to be able to do is add links to additional docker images that can be launched by jupyterhub dockerspawner. (I haven’t checked recent JupyterHub to know what’s in the admin panel at the mo…)
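
For what it’s worth, here is a minimal sketch of how that might look in a jupyterhub_config.py, assuming a reasonably recent DockerSpawner that has the allowed_images setting. The image names are made-up placeholders, and the configure helper is just a name chosen for this illustration, not part of JupyterHub.

```python
# Sketch of a jupyterhub_config.py fragment.
# The image names and tags below are hypothetical placeholders.
DEMO_IMAGES = {
    "Basic notebook": "example/basic-notebook:latest",
    "Chemistry course": "example/chemistry-notebook:latest",
}


def configure(c):
    """Apply DockerSpawner settings to a JupyterHub config object `c`."""
    c.JupyterHub.spawner_class = "dockerspawner.DockerSpawner"
    # Images that users are allowed to launch, keyed by a display name.
    c.DockerSpawner.allowed_images = DEMO_IMAGES
    # Image used when no explicit choice is made.
    c.DockerSpawner.image = DEMO_IMAGES["Basic notebook"]
    # Remove stopped containers so demo sessions do not pile up.
    c.DockerSpawner.remove = True
    return c


# In a real jupyterhub_config.py, JupyterHub provides `c`, so you would call:
#     configure(c)
```

Adding another demo environment is then a one-line change to the dictionary followed by a hub restart, rather than something done through the admin panel.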

Thinking back a long long time when I first encountered C2 wiki, I initially did not get it at all (“what is this strange thing – editable website, you must be crazy?!”).

So part of success might be to bring the unique advantage of notebooks across to people who know nothing about them. “How” is of course the big challenge here.

@psychemedia your point regarding an easy path from demo to personal use is a great one – the shiniest demo has no effect (or even a negative one) if there is no follow-up.

@psychemedia a quick thought (I’ll try to post more details about how the Berkeley JupyterHub deployments work soon): I’ve found that it is really helpful to find allies within the institution early on, and then leverage those people to convince others of the value of things like notebooks, hub, etc. In my experience, professors tend to listen far more to other professors than they do to anybody else, and it’s helped us here to have a few that were clear advocates at the institution. That said, there are also some people that will never jump on board the train, and I think it’s important not to spend too much time trying to convince them. No idea if this maps onto your situation/experience at all, but it’s stuff that we’ve found helpful.

A few thoughts from UC Berkeley…

Here are a few points about the UC Berkeley Jupyter/JupyterHub deployments. We’ve been deploying (and building) Jupyter tech here for a few years now, have fought many institutional battles, and hopefully some of these will be helpful to others.

As a quick aside, our setup is that we have a single, large JupyterHub running on Kubernetes (at datahub.berkeley.edu). It serves a ~1,500 student data science course, and is run by 1-2 skilled sysadmins and a team of undergrads. There are also a number of hubs that smaller groups on campus have set up for their own uses. The sysadmin for the datahub also administers at least one other k8s JupyterHub for a more advanced data course, and other units like the business school have deployed JupyterHubs on bare metal for their own purposes.

(here’s a link that describes a lot of the jupyter tech at Berkeley)

A few things we’ve learned:

  • Education is usually the entry-point to people using notebooks and JupyterHub. There is absolutely a research use-case, but in my experience, for people who are inherently skeptical, the faster “sell” is showing how notebooks/JupyterHub are used in a classroom.
  • Find advocates at the organization and build good relationships with them, tell them what you want to accomplish. Most orgs will have a few people who are more interested in this kind of thing than others, we’ve had a few over the years, and have found that they were more effective at convincing their colleagues etc to try out the technology than we would have been (in our case, as non-professors).
    • At a university, IME these folks are often newer faculty with more to prove and an interest in making “modern day tech-y kind of person” a part of their professional identity. The challenge is that these folks often don’t have a lot of institutional leverage, though they’re also the ones often tasked with teaching larger classes where JupyterHub can be helpful.
  • Try to find one person that’s skilled in dev-ops that can bootstrap a community of users around a jupyterhub deployment. Give that person secondary help on managing/maintaining/etc the clusters and try to keep their role at the “building and deploying” level. We have a couple people (@yuvipanda and @ryan) that are really skilled at deployments, but that have limited time because there are always a million things to do. The Data 8 course uses teams of people (often undergrads) to try and spot-check and resolve student issues, and only escalates problems to the dev-ops folks when we can’t find any way to solve things.
  • For professors who are interested but hesitant to adopt the stack, don’t make it an all-or-nothing proposition. Berkeley has a team of students/staff that help create data science modules for instructors. These are short, quick Jupyter Notebooks that highlight a particular idea as a part of a class, and that’s it. They often run on the institutional datahub and are a nice way to get people to try interactive computing in a notebook in a low-risk way.
  • When it comes to installing on your own machine, don’t tell people to do this unless you know they’re already quite interested in the Jupyter stack. As you say, it can be frustrating/confusing to navigate the huge number of plugins etc. We try to have people use the DataHub first, and once they’re comfortable with the general pattern, then explicitly cover how to move the workflow onto your own laptop. E.g. the big introduction course that uses the datahub never has students work on their own machines (always via datahub.berkeley.edu). However almost all subsequent courses do make you do work on your machine (though now you’ve got a lot more context for how to do it).
  • Students are usually really enthusiastic about Jupyter stuff and are less-resistant to trying it out. Find ways to highlight their awesome work and it’ll help build and grow a community of practitioners.
  • When you’re talking to campus IT, assure them that you’re not talking about replacing any existing tools / stack, but that Jupyter(Hub) can serve as a great complement to it. Many faculty and IT people feel threatened when someone shows up and says they need to learn something new. I think it’s important to make clear you’re not trying to “replace” them, just trying to bring something new to the table.
  • That said, I think in particular the JupyterHub Kubernetes deployment has been a great way to convince our university they need to invest more in Kubernetes. They were really resistant to this at first, but there’s been a sea-change recently for some reason, maybe because folks are confident that K8S will be around long enough to be able to depend on now. Treat it as an opportunity to skill-up their workforce and provide a useful service at the same time (to make a case that it is a useful service see the earlier point about finding faculty allies, their voices will be more influential in general)
  • Maybe a common thread throughout all these: be patient. Changing institutions, especially universities which tend to be conservative, under-resourced, and laden with bureaucracy, takes a really long time. Try to find small wins and gain allies, and once you’ve got enough of a core group of support/interest, then start escalating up to decision-makers at the university. At Berkeley we’d been in bootstrap mode for like 2-3 years before we finally got buy-in from the university admin. And even then there’s still a lot more work to do.

That’s all I can think up for now, if I think of some other points I’ll try to add them here!

That’s really useful, thanks…

Our situation is that we, as a course team, adopted notebooks back in 2014 for a course that started in 2016, if memory serves me correctly. Independently of the institution, we developed a VirtualBox VM that ran a notebook server and some other things; students run that on their own machine, started via vagrant (there were various reasons I wanted to use vagrant, not least because it gave us a flexible way to post fixes via a new Vagrantfile if we needed to; we didn’t, but it was there as a fallback). Course numbers there are 400 or so students a year, with the module lasting 9 months and requiring about 4 hrs a week on notebook activities.

We also ran a course using notebooks on a FutureLearn MOOC (Learn to Code for Data Analysis) but learners there were encouraged to use Anaconda, though we also started to mention Azure Notebooks and CoCalc.

Via a collaborator in IT, we managed to get a simple JupyterHub temporary notebook server accessed from the Moodle VLE and running on Azure Kubernetes (we’ll be posting our deployment notes at some point; 1k students on course, but as a 1-2 hour optional activity we get about 10-20% adoption(?)). We got away with this as much as anything because the institution was looking the other way at the time. Now we have a precedent, we need to keep the server running for each new presentation of the course :wink:

As far as battles go, I think I weakly tried to open too many fronts at once: I think an internal JupyterHub server would be useful, but I also think BinderHub and Docker servers would be useful too. They each serve different needs, and can act as gateway drugs to each other. But all three technologies are alien to pretty much everyone, so it’s hard to make a case for how they complement each other.

As you mentioned, students who’ve used the tech are our best allies and I should make more use of student feedback (positive and negative).

As far as adoption goes, I mentioned our production model briefly: two years to design and present a course for the first time and then it’s pretty much supposed to go unchanged for another five, partly because the presentation materials are handed over from the module team to the VLE folk. We’ve twisted that with our VM course because we control that, and the notebooks that students use within it. This has allowed us to tweak and fettle the VM and notebook material for each course, in part in light of student comments, in part as a result of updates to the pandas / scipy stack etc. Module teams also tend to work independently, rather than working together around a particular technology other than ones forced on us (like the VLE).

This production model is one reason there is so much internal friction; but at the same time, the nature of the notebooks could, I believe, be transformative to how we develop and deliver (interactive) materials to our remote students. As much as anything, we are a publishing house. I really should pay more attention to how O’Reilly work. (And I also need to do some demos around Jupyter Book :wink:)

The research rationale for running notebook servers is another front that could be opened, and I suspect the best way to do that would be by running workshops for postgrad research students. (This is the only class of student we actually see face to face; we have a few hundred internal postgrad students on campus; our undergrad population is completely remote.)

Being a remote worker myself, my opportunity for daily face to face lobbying is also severely limited. If you’re on the end of a Skype call rather than in the room, it’s harder to make a point and gauge how well it’s gone down.

Re: patience… yes… I know… and: breathe… :wink: I also need to get better at drip feeding the story over a long campaign rather than: this, this, this, oh, and this, this, this… etc. (I’m pretty sure I’ve drunk too much of the big picture Kool Aid…!)

Going slightly off topic here: has anyone experimented with using nteract for courses? It doesn’t solve the distribution of homework problem but in some courses (for grown ups) I have run, people have started arriving with nteract as their “Jupyter notebook setup”. People love that they can now double click notebooks and do many other things you can only do if you are a desktop app.

One idea I started several years ago and then never finished (the problem didn’t cause enough pain…) was to bundle a ready to go Python environment with nteract. It does drive me mad that you (used to) get a JS kernel as default with nteract, not Python. We started https://github.com/nteract/snakestagram to create the environments for Windows, Mac and linux. I’ve used this approach of bundling conda environments inside electron apps for client work -> it works (some “definitely not tech” people at a UN agency were running tensorflow models to classify images on their work laptops without ever knowing that they were using Python and tensorflow, etc they just clicked buttons in a GUI). However I’ve never had enough time to port it all back to nteract proper :frowning:

The end result is that you have a “proper desktop app installer”: you run it, it installs a “proper app”, and that app contains all the complicated Python stuff that is otherwise hard to explain how to install. I think this would be a pretty slick experience.

I think having a desktop app which has the right Python (or what have you) environment bundled with it would be great. The downside is someone has to actually build a first working version of this … in that copious spare time we all have.

I think “easy + simple technical ways to distribute stuff” is very on-topic. Regarding the “right Python”, you can use pyenv’s builder to get any version you like at any place you like (only tried it on Linux), which is a great building block.
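
As a rough sketch of that building block: pyenv’s builder is the python-build plugin, which takes a version definition and a target prefix. The version number and prefix below are arbitrary examples, the helper names are made up for this post, and the sketch assumes python-build is available on your PATH (it can be installed standalone; with a plain pyenv install it lives inside pyenv’s plugins directory).

```python
import shutil
import subprocess


def python_build_command(version, prefix):
    """Build the command line: python-build <definition> <prefix>."""
    return ["python-build", version, str(prefix)]


def build_python(version, prefix):
    """Compile and install the given Python version under `prefix`."""
    if shutil.which("python-build") is None:
        raise RuntimeError("python-build not found on PATH")
    subprocess.check_call(python_build_command(version, prefix))


# Example values only; nothing is built until build_python() is called.
print(python_build_command("3.7.2", "/opt/demo-python-3.7.2"))
```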

The nteract+bundled python possibility (it used to be on the roadmap?) was really powerful, I think. It would essentially provide an equivalent of a tmpnbserver in the form of a self-contained, cross platform app.

As nteract currently stands, there’s the blocker on the requirement of having an IPython kernel that’s installed and discoverable.

One way to do this is to install Anaconda, but that can be a pain for folk who aren’t in the habit of installing apps on their machine. (One recent helpdesk issue I saw came from someone who’d failed to get s/thing working on a PC and borrowed a Mac. They followed instructions to download an app but then didn’t realise you had to double click on the downloaded package to go through an install process, and when that was pointed out didn’t know where or how to find the download.)

I’ve also seen folk get in a real muddle with conda, esp when it comes to folk trying to install packages via a notebook and it not working because they’ve actually installed into the wrong py environment. (I think this is easier now in notebooks with %pip magic?) Something like RStudio has a panel specifically for loading packages and that might be something worth implementing as a dialogue / wizard in nteract, and perhaps even an extension in notebooks or JupyterLab. To a certain extent this makes the UI more complex, although it does have the benefit of separating concerns. The simplicity and power of the command line stands in contrast with “simple” walk-you-through-it UIs that are actually far more complex to design, build and navigate. Such a trade-off was ever thus. (We also need to remember many people have never seen the command-line/terminal/command prompt and have no idea how to drive it. The Jupyter notebook code cell ! is really useful here…)
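
A small sketch of the “wrong environment” problem and one common way round it: run pip with the same interpreter the kernel is using, rather than whatever pip happens to be first on the PATH. The function names here are made up for illustration; as far as I know, the %pip magic in recent IPython versions does roughly the same job.

```python
import subprocess
import sys


def kernel_environment():
    """Report which interpreter and environment this kernel is running in."""
    return {"executable": sys.executable, "prefix": sys.prefix}


def pip_install_command(package):
    """Build a pip command bound to the running kernel's interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


def install_into_kernel_env(package):
    """Run pip with the kernel's own interpreter and return its exit code."""
    return subprocess.call(pip_install_command(package))


# Show where packages would end up, without installing anything.
print(kernel_environment())
print(pip_install_command("pandas"))
```

Printing the environment first is also a cheap way of signalling to a novice which environment they are actually in.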

pyenv is really interesting… but confusing to a novice? I guess one trick is for us to distribute things that way and to try to provide self-contained envts that work? But things become problematic if folk end up in another env? So we need good ways of signalling that to them?

By the by, this whole UI thing reminds me of this thread on a desktop Binderhub / repo2docker GUI: What would a repo2docker GUI look like?

That can provide another way of getting an environment onto a desktop, although with the Docker requirement (and again: for users on Windows 8, that is problematic…)

I agree that nteract could be a great UI for pedagogy. The main blockers that I’ve run into are:

  1. Not having jupyter widget support is a total non-starter for us :-/
  2. It’s double-click-launchable, but you need to install kernels “the old fashioned way”. You can use something like Anaconda GUI, but then the “double click to launch nteract” benefit isn’t as unique
  3. I’ve found that folks think of nteract as a weird middle-ground between the “traditional” notebook interface and JupyterLab. They see JupyterLab as a “different” kind of interface from notebook/nteract, and because most of the instructors have started off by learning the “classic” interface, it’s hard to convince them why they need to switch if they want a “notebook-first” UI…that’s one I’m still trying to figure out, because I think nteract could make for a much cleaner reading experience than the notebook interface or jupyterlab (if nteract ever picked up widgets support)

I wondered whether desktop apps might become a contested space, eg nteract vs stencila.

I asked the Stencila folk a few weeks ago if the Stencila app had been deprecated. Response was:

In hiatus, but not yet deprecated. Since the article editor in the Desktop is built on top of Texture, the plan is to have Stencila act as a plug-in providing the reproducible parts alongside plug-ins for other frameworks like Jupyter.

If the Texture desktop with Stencila plug-in meets users’ needs then we will probably deprecate the Stencila Desktop.

Thanks for the info! I am so confused about the plan for the stencila/substance/etc stack haha

I think I saw it on the roadmap recently but without a human who is keen on (facilitating the implementation of) this feature I think it will linger :-/

This is why we need snakestagram so that the environment/kernel you want is bundled as part of nteract. You install nteract and it has a Python kernel + packages already there. Totally separate from what you already have installed on your machine. The files are somewhere inside the .app (on OSX) bundle that people install.
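
To make the “kernel is bundled” idea a bit more concrete, here is a hedged sketch of the piece that makes a bundled interpreter discoverable: a kernelspec (kernel.json) whose argv points at the Python inside the app bundle. This is not how snakestagram actually does it; the paths and the function name are invented for illustration, and it assumes ipykernel is installed in the bundled environment.

```python
import json
import tempfile
from pathlib import Path


def write_bundled_kernelspec(kernels_dir, bundled_python, name="bundled-python"):
    """Write a kernel.json that launches ipykernel with a specific interpreter."""
    spec = {
        # {connection_file} is filled in by whatever launches the kernel.
        "argv": [str(bundled_python), "-m", "ipykernel_launcher",
                 "-f", "{connection_file}"],
        "display_name": "Python (bundled)",
        "language": "python",
    }
    spec_dir = Path(kernels_dir) / name
    spec_dir.mkdir(parents=True, exist_ok=True)
    spec_path = spec_dir / "kernel.json"
    spec_path.write_text(json.dumps(spec, indent=2))
    return spec_path


# Demo in a throwaway directory; the interpreter path is a hypothetical example.
with tempfile.TemporaryDirectory() as tmp:
    path = write_bundled_kernelspec(Path(tmp) / "kernels",
                                    "/path/inside/app/bundle/bin/python")
    print(path.read_text())
```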

Do you know if there is a technical reason that makes it impossible to have widgets or requires a huge change in how things are done? My guess has always been that the reason it doesn’t exist is that there isn’t anyone who needs this badly enough to step up and build it (or pay someone to build it).

The main reason nteract has lacked widget support is the lack of a spec for widget messages. In the Jupyter spec, widgets sit (deliberately) on a black-box extension channel (comms) where extension authors write both sides of a custom communication protocol (with kernel-specific code and matching frontend-specific code, e.g. for nbclassic or JupyterLab). As a baseline, nteract would need to implement the widget management (open/close/etc.). Subsequent to that, nteract would need an implementation of the frontend side of each widget it wants to support. Finally, for custom widgets (beyond e.g. what’s in ipywidgets), nteract would need an extension mechanism for registering new frontend widgets. nteract has never had extensions (unless this has changed recently), so this is probably the biggest hurdle.

To summarize, the work that needs to be done:

  • jupyter-widgets needs to thoroughly document the spec of baseline widget messages, management, and behavior
  • nteract needs to implement this (analogous to WidgetManager in JupyterLab, I think)
  • every individual widget to be supported needs to document its spec (model at least and any custom messages)
  • nteract needs to implement the frontend spec of each widget to be supported
  • finally, nteract would need an extension mechanism for registering third-party widgets, and authors would need to take this up. This is I think the least likely to happen.

I don’t think it’s likely nteract will pick up extensible widgets, but support for core ought to be doable and should begin with a spec/documentation effort on the Jupyter-widgets side.
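
To illustrate the baseline “widget management” piece listed above, here is a toy sketch (in Python for readability; a real frontend would do this in JavaScript). It tracks widget models across comm_open / comm_msg / comm_close messages and hands the state to a per-widget renderer. The message shapes follow my understanding of the comm messages and the ipywidgets “update” method, and should be treated as assumptions rather than a spec; ToyWidgetManager and its methods are invented names.

```python
WIDGET_TARGET = "jupyter.widget"  # comm target name assumed for widgets


class ToyWidgetManager:
    """Tracks widget state per comm and dispatches to registered renderers."""

    def __init__(self):
        self.models = {}     # comm_id -> state dict
        self.renderers = {}  # model name -> callable(state)

    def register_renderer(self, model_name, render):
        """The 'extension mechanism': one frontend implementation per widget."""
        self.renderers[model_name] = render

    def handle(self, msg):
        """Process one comm message and return the rendered output, if any."""
        msg_type = msg["header"]["msg_type"]
        content = msg["content"]
        comm_id = content.get("comm_id")
        if msg_type == "comm_open":
            if content.get("target_name") != WIDGET_TARGET:
                return None  # some other comm-based extension, not a widget
            self.models[comm_id] = dict(content["data"].get("state", {}))
        elif msg_type == "comm_msg":
            data = content["data"]
            if comm_id in self.models and data.get("method") == "update":
                self.models[comm_id].update(data.get("state", {}))
        elif msg_type == "comm_close":
            self.models.pop(comm_id, None)
            return None
        else:
            return None
        return self.render(comm_id)

    def render(self, comm_id):
        """Render a model with its registered renderer, or None if unsupported."""
        state = self.models.get(comm_id)
        if state is None:
            return None
        renderer = self.renderers.get(state.get("_model_name"))
        return renderer(state) if renderer else None
```

The manager itself is small; the cost is in writing a renderer for every widget and in letting third parties register their own, which is the extension mechanism nteract lacks.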
