Related GH issue: https://github.com/jupyter/repo2docker/issues/487
I would like it to be easy to start Binder and Repo2Docker from an arbitrary Docker image, i.e. not only for “expert users”.
I am a scientist, not a computer scientist.
I maintain a MyBinder repo which runs nicely (oggm-edu). But I’d like to build on this in order to offer a real computing environment, i.e. by setting up our own hub with more resources and customizing it following the recommendations of the Pangeo project.
As a first step, I would like to start MyBinder from an existing Docker image (that I am happy to modify in order to fit Repo2Docker’s needs). This Docker image (and its daily tags) provides all the packages I need to run my glacier model in a reproducible way. We use these Docker images intensively for CI and testing, but most importantly on our cluster via Singularity. We can now provide a DockerHub image tag alongside our scientific publications, which is great.
What I expect from this
There are several things I’d like to improve by making repo2docker start from our own image:
- currently, our environment build is complex and large (many dependencies). The conda environment became so large and was breaking so often that I now install everything via pip instead (I went from a ~4.5 GB image to 3.5 GB, which could be reduced further if Repo2Docker didn’t install conda by default). This results in messy build files and is silly, because we already have a working environment that we control on DockerHub.
- the default behavior of MyBinder is to rebuild everything after each commit to the repo. In practice, we change the content (the notebooks) very often, but almost never change the computing environment. Each time we update the notebooks, I anxiously watch the logs, expecting that something will fail to install. This is silly, because we already have an environment on DockerHub that we know is working.
- in order to achieve reproducibility, I would have to pin all the packages in apt, requirements, etc. in the GitHub repository. I know that MyBinder allows opening previously built images via commit hashes, but what if I’d like to run the latest notebook changes on the latest computing environment that worked? Here again, building from a pinned, existing Docker image would allow changing the analysis workflow (notebooks) while still keeping a frozen computing environment in the background.
- since we would build from a fully functional Docker image, the Repo2Docker build would be much quicker (I assume).
Isn’t this possible already?
I guess so: (minimal-dockerfile example). However, this example does not show how to add the repository files to the image, so I can’t apply it myself (or maybe I missed something). So possibly this requires some documentation changes on the Binder side, or some help for people like me.
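To make the request concrete, here is a rough, untested sketch of the kind of Dockerfile I imagine putting in the repository. The image name and tag (`example/glacier-env:20190101`) are placeholders and not our real image. I am also assuming a Debian/Ubuntu-based image that ships pip, and that the build passes `NB_USER` and `NB_UID` as build arguments, as far as I understand the Binder documentation.

```dockerfile
# Hypothetical image name and dated tag: replace with the real DockerHub image.
# Pinning a tag keeps the computing environment frozen between notebook commits.
FROM example/glacier-env:20190101

# Make sure the following steps run as root, whatever the base image sets.
USER root

# Assumption: pip is available in the image. Binder needs the notebook server.
RUN pip install --no-cache-dir notebook

# Build arguments expected from Binder; the defaults below are my own choice
# and only matter when building locally with plain `docker build`.
ARG NB_USER=jovyan
ARG NB_UID=1000
ENV USER=${NB_USER} \
    HOME=/home/${NB_USER}

# Assumption: Debian/Ubuntu `adduser`, and a user that does not exist yet.
RUN adduser --disabled-password \
    --gecos "Default user" \
    --uid ${NB_UID} \
    ${NB_USER}

# Copy the repository contents (the notebooks) into the home directory
# and hand ownership to the notebook user.
COPY . ${HOME}
RUN chown -R ${NB_UID} ${HOME}

# Run the notebook server as the unprivileged user, from the home directory.
USER ${NB_USER}
WORKDIR ${HOME}
```

The `COPY . ${HOME}` step is the part I could not find in the minimal example: it is what would bring the notebooks into the image, while everything else comes from the pinned base image.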
Thanks for making Jupyter, and thanks for your help on this!