Running arbitrary services alongside Jupyter notebooks in BinderHub


#1

The original MyBinder service included an optional PostgreSQL service running inside a MyBinder container. As a New Year’s resolution, I thought I’d have a look at some simple demos for running arbitrary services, such as Postgres, inside a Binder container.

The jupyter-server-proxy package allows you to start a service from the New menu on the Jupyter notebook homepage, so that’s one possibility: create a menu option to start a service and then connect to it. But I’m looking more for a recipe for creating auto-starting, free-running services.

An alternative approach is to start a service from the repo2docker start config file. I’ve popped up a minimal example here for autostarting a headless OpenRefine service in a MyBinder repo and connecting to it using an OpenRefine python client in a notebook.
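For anyone wanting the general shape of that approach, here is a minimal sketch of a start file that launches a background service before handing over to the normal Binder command. It is not the contents of the example repo: the start_service helper is a hypothetical name, and the refine flags and paths in the usage comment are assumptions on my part.

```shell
#!/bin/bash
# Sketch of a repo2docker `start` file that launches a background service.

# start_service LOGFILE COMMAND [ARGS...]
# Runs COMMAND detached from the terminal, appends its output to LOGFILE,
# and prints the PID of the background process.
start_service() {
    logfile="$1"
    shift
    nohup "$@" >>"$logfile" 2>&1 &
    echo "$!"
}

# Hypothetical usage inside `start` (the refine flags and paths are assumptions):
#   start_service "$HOME/openrefine.log" refine -p 3333 -d "$HOME/refine-data"
#
# Then do the normal Binder start thing, which must come last:
#   exec "$@"
```

Sending the service output to a log file rather than /dev/null makes it much easier to see why a service failed to come up.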

I’ve spent much of the afternoon engaged in false starts, mainly (I suspect) because my knowledge of how all the pieces fit together is a bit ropey, and I haven’t been keeping up with reading the docs.

So here are some things I’ve been flummoxed by - pointers would be appreciated:

  • installing Linux apt packages from non-standard apt repositories: my first thought was that, rather than trying to figure out how to get umpteen different services running, if I could get one running via a docker container running inside a Binder container, that would generalise to every other dockerised application by using an appropriate docker run command inside the start file. BUT running apt-get install -y docker-ce seems to require adding an additional apt repository, and I don’t know how to do that. Trying it in postBuild doesn’t seem to work. So my attempt at docker generality falls at the first hurdle.

  • installing a service that is packaged to run under a particular user account: my thought here was if I could get a service running that runs under a particular user different to the jovyan/default user, it may generalise to working with other, similarly distributed applications. I’ve gone round the houses a few times trying to get postgresql installed in this way but with no success. For example, setting a start file to something like:

    #!/bin/bash

    export PGUSER=$NB_USER
    nohup /usr/lib/postgresql/10/bin/pg_ctl -D /var/lib/postgresql/10/main -l logfile start > /dev/null 2>&1 &

    #Do the normal Binder start thing here…
    exec "$@"

which I thought would run the service under the default jovyan user rather than the postgres user, doesn’t seem to work; running that command explicitly in a terminal throws an error: pg_ctl: could not open PID file "/var/lib/postgresql/10/main/postmaster.pid": Permission denied

In addition, trying to do things like:

sudo su - postgres <<-EOF
       psql -f ${HOME}/db/config.sql
EOF

fails without access to sudo, su, etc. But then again, seeding default directories is not really relevant if the PostgreSQL server doesn’t even start.

One other thing I wondered (but haven’t tried yet) was whether running repo2docker locally with user postgres might help in this case, although that doesn’t generalise to running things on MyBinder, where the user is defaulted to something else (typically, jovyan, I think?).


#2

Installing apt packages from non-standard apt repositories is currently impossible with repo2docker. We have https://github.com/jupyter/repo2docker/issues/402 (and more) to discuss it. I’m not sure if there is consensus on whether this should be added or is out of scope.

I think this should work “in principle”, but in practice it is probably pretty tedious, because all the defaults (like PID file directories) also need changing to somewhere jovyan can read and write. I’d not be mad keen on letting repo2docker users run things as multiple different users (the same goes for running things as root). I think that to keep repo2docker the simple, approachable tool that it is, we should have some amount of resistance to moving ever closer to feature parity with a Dockerfile. More docs for Dockerfiles seems like a better solution.

Sorry that this reply isn’t more helpful. I think exploring how to do this kind of stuff is worth doing, though I am not holding my breath that we will discover a general solution :-/

These guys do seem to have a setup with PostgreSQL and Binder that works, at the cost of having a Dockerfile. Maybe a place to crib from.


#3

@betatim Thanks for the reply… the “it’s really hard / impossible / way out of scope / has lots of serious issues” feedback is really useful.

I use my tinkering to try to explore use cases born from naivety and open-mindedness (“oh, so does that mean I can try this…?” etc) so identifying dead ends / red herrings is a great help:-)

–tony


#4

I’m also reminded of this Dockerfile from way back that seemed to have postgres running in a BinderHub environment: https://github.com/dchud/datamanagement-notebook/issues/7

I keep forgetting that one…


#5

I think we can add features to jupyter-server-proxy to:

  1. Autostart services on startup rather than on demand
  2. Customize the readiness function, so we can support more things than just HTTP services

Postgres would be a great use case.
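To illustrate what a non-HTTP readiness check might look like, here is a rough shell sketch that polls an arbitrary command until it succeeds. This is not a jupyter-server-proxy API; wait_until is a hypothetical helper, and the socket directory and port in the usage comment are assumptions.

```shell
# wait_until TIMEOUT COMMAND [ARGS...]
# Re-runs COMMAND about once per second until it succeeds or roughly
# TIMEOUT seconds have passed. Returns 0 on success, 1 on timeout.
wait_until() {
    remaining="$1"
    shift
    while ! "$@" >/dev/null 2>&1; do
        remaining=$((remaining - 1))
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Hypothetical usage for Postgres (socket directory and port are assumptions):
#   wait_until 30 pg_isready -q -h /tmp -p 5432
```

Because the check is just a command, the same loop would work for any service that ships a ping-style tool.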


#6

@yuvipanda I’m happy to help try to roadtest this stuff, esp. with Postgres… We run a distance ed course that uses Jupyter+Postgres+Mongo+OpenRefine inside a student run VM atm, so I’m ever keen for exploring new ways we might be able to package and distribute this sort of environment.

It also fits nicely as a way of widening participation into opendata stuff if we can find easier ways of setting up environments so folk can actually work with data in powerful, custom environments without getting bogged down in sysadmin voodoo.

–tony


#7

Re: autostarting postgres, a handy thing here would be having access to the port number (if dynamically allocated) in an env var or similar so that things like sql magic could be autoconfigured.
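As a sketch of what I mean, the start file could export the port and a ready-made connection URL. The pg_url helper is a hypothetical name and the defaults are assumptions; as far as I know, ipython-sql falls back to a DATABASE_URL environment variable when no connection string is given, but treat that as something to check.

```shell
# pg_url USER HOST PORT DBNAME
# Prints a SQLAlchemy-style Postgres connection URL.
pg_url() {
    printf 'postgresql://%s@%s:%s/%s\n' "$1" "$2" "$3" "$4"
}

# PGPORT is read by libpq clients such as psql; 5432 is only a fallback.
export PGPORT="${PGPORT:-5432}"

# Assumed defaults: user jovyan, database postgres, server on localhost.
export DATABASE_URL="$(pg_url "${NB_USER:-jovyan}" localhost "$PGPORT" "${PGDATABASE:-postgres}")"
```

These exports only reach the notebook kernels if they happen in the start file before the final exec.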

An obvious route for setting up db users and seeding the db on start-up would also be handy, steps which may require commands executing under different user ids / groups associated with the service rather than the default jovyan user, for example.
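One partial workaround, sketched below: if the database cluster was created by the notebook user with initdb, that user is the cluster superuser by default, so seeding needs no sudo or su. The seed_db helper is a hypothetical name, and the database name in the usage comment is an assumption.

```shell
# seed_db DBNAME SQLFILE
# Creates DBNAME if needed and runs SQLFILE against it as the current user.
seed_db() {
    dbname="$1"
    sqlfile="$2"
    if [ ! -r "$sqlfile" ]; then
        echo "seed_db: cannot read $sqlfile" >&2
        return 1
    fi
    # createdb fails if the database already exists; ignore that case.
    createdb "$dbname" 2>/dev/null || true
    # ON_ERROR_STOP makes psql exit non-zero on the first failing statement.
    psql -v ON_ERROR_STOP=1 -d "$dbname" -f "$sqlfile"
}

# Hypothetical usage once the server is accepting connections:
#   seed_db demo "$HOME/db/config.sql"
```

This does not help with services that insist on a dedicated system account, only with ones that can be run entirely as the default user.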


#8

FWIW, I’d be interested in seeing how this might work in a Littlest Jupyterhub context too…


#9

I got a couple of Dockerfile examples working that start postgres on load:


#10

Most distributions package postgres to be run as a system service, so the user permissions are locked down. Postgres is also available in Anaconda though. It defaults to creating the Postgres socket in /tmp instead of /run/postgresql so it should “just work” without a Dockerfile. Example using environment.yml, postBuild and start:
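A minimal sketch of how the postBuild and start steps might be split, assuming postgresql is listed as a dependency in environment.yml so that initdb and pg_ctl are on the PATH. This is not the linked example; the function names and the PGDATA and PGLOG locations are hypothetical.

```shell
#!/bin/bash
# Assumed locations; they only need to be writable by the notebook user.
PGDATA="${PGDATA:-$HOME/pgdata}"
PGLOG="${PGLOG:-$HOME/postgres.log}"

# postBuild step: create the cluster once, at image build time.
init_cluster() {
    if [ ! -f "$PGDATA/PG_VERSION" ]; then
        initdb -D "$PGDATA"
    fi
}

# start step: pg_ctl launches the server in the background itself;
# -w waits until the server accepts connections, -l sets the log file.
start_cluster() {
    pg_ctl -D "$PGDATA" -l "$PGLOG" -w start
}

if command -v pg_ctl >/dev/null 2>&1; then
    init_cluster && start_cluster
else
    echo "postgres binaries not found; skipping" >&2
fi

# In a real `start` file, hand control back to the notebook server last:
#   exec "$@"
```

Because the data directory and PID file live under $HOME, the permission error from the apt-packaged layout should not arise.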


#11

Thanks for that suggestion…

I’ve been trying to avoid Anaconda (it just introduces yet another set of different dependencies for my use case), but this does seem to simplify things a lot…