Hi, I was wondering if there's a way to launch additional containers from within a Jupyter notebook. I'm guessing it would need access to a Docker CLI or API to do this. Has anybody done anything like this before?
I think you could do it when you spawn the notebook container by mounting /var/run/docker.sock and making sure the container is on a manager node.
Option 1: mount the docker socket
I haven't seen an example, but @markperri's suggestion would work.
Assuming you are using DockerSpawner, you can grant users access to the docker API with configuration like:
# mount the docker socket into user containers
c.DockerSpawner.volumes = {
    "/var/run/docker.sock": "/var/run/docker.sock",
}
# ensure the users have permission for the docker socket.
# this can also be done in the image
# gid may differ depending on host configuration,
# but this is what I need for a VM created by `docker-machine`
c.DockerSpawner.extra_host_config = {
    "group_add": [999],
}
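(If you need to find the right gid for your host, something like `stat -c '%g' /var/run/docker.sock` run on the docker host will print the group id that owns the socket.)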
Then you can install the docker CLI and/or `pip install docker` for a Python client, and you are off to the races.
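For example, from a notebook with the socket mounted, launching and cleaning up a sibling container with the Python client looks something like this (the image and command here are just placeholders):

import docker

# talks to the daemon via the mounted /var/run/docker.sock
client = docker.from_env()

# start a detached sibling container
container = client.containers.run(
    "alpine",     # placeholder image
    "sleep 300",  # placeholder command
    detach=True,
)
print(container.id)

# stop and remove it when finished
container.stop()
container.remove()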
However, granting users access to the docker socket like this can be a huge security issue, depending on the relationship between your users and the jupyterhub deployment, because it effectively gives them full admin access to the docker host. If jupyterhub itself runs in docker on the same host, that means every user has full admin access to jupyterhub itself and to all other users. If it's already a shared machine where everybody is an admin anyway, this doesn't change anything.
Option 2: jupyterhub-authenticated service
A more controlled approach, with a bit more work, is to run an additional service that can only do the specific things you want users to be able to do. Maybe you are talking about dask or spark workers in containers, etc.
For that, you can build a hub-authenticated service with a REST API that takes the specific actions you need (a minimal sketch follows the list below). In this case:
- only your service talks directly to docker
- it is authenticated with the Hub so you know which Hub user is making requests
- you can launch containers with `link` or `network` arguments so that the requesting user's container can talk to the containers it has spawned, but not to the containers it hasn't
- you can implement quotas/limits so that users don’t hog all the resources
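To make that concrete, here is a minimal sketch of such a service using tornado and the HubAuthenticated mixin from jupyterhub.services.auth; the handler name, endpoint, image, command, and `owner` label are all hypothetical placeholders for whatever your real API needs:

import json
import os
from urllib.parse import urlparse

import docker
from jupyterhub.services.auth import HubAuthenticated
from tornado import ioloop, web


class ContainerHandler(HubAuthenticated, web.RequestHandler):
    # hypothetical endpoint: POST launches a container for the requesting user
    @web.authenticated
    def post(self):
        user = self.get_current_user()  # Hub user model, e.g. {"name": ...}
        client = docker.from_env()
        container = client.containers.run(
            "alpine",      # placeholder image
            "sleep 3600",  # placeholder command
            detach=True,
            labels={"owner": user["name"]},  # tag for quotas/cleanup
        )
        self.write(json.dumps({"id": container.id}))


def main():
    # JupyterHub sets these environment variables for managed services
    prefix = os.environ["JUPYTERHUB_SERVICE_PREFIX"]
    app = web.Application([(prefix + "containers", ContainerHandler)])
    url = urlparse(os.environ["JUPYTERHUB_SERVICE_URL"])
    app.listen(url.port, url.hostname)
    ioloop.IOLoop.current().start()


if __name__ == "__main__":
    main()

Tagging every container with the requesting user's name is what makes quota enforcement and later cleanup straightforward.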
This is more work, because you need to:
- implement the service itself
- specify the REST API and/or implement and document a client installed in your user environments
- potentially add cleanup hooks to shut down sibling containers when the user's server stops (sketched below)
But at this point, you have clearly defined, and have control over, what users are able to do, and you don't need to worry about them having arbitrary access to docker itself.
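For the cleanup hooks, the `owner` label from the sketch above gives you a handle: a hypothetical cleanup function with the docker Python client could look like:

import docker


def cleanup_user_containers(username):
    # find every container (running or not) labeled as owned by this user
    client = docker.from_env()
    owned = client.containers.list(
        all=True,
        filters={"label": "owner=" + username},
    )
    for container in owned:
        container.stop()
        container.remove()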
Hi, thanks for the reply. As you say, Option 2 is probably the preferred way for us. Would you or anybody here be interested in helping us build this type of solution? The work would be paid for and the generic parts contributed to the OSS community.