I hope to get some insights from you regarding a situation common for JupyterHubs that use Docker images.
Usage situation
A JupyterHub starts Docker containers from an image that has conda packages preinstalled in /opt/conda, owned by the user. The JupyterHub mounts user storage at /home/jovyan in the container. Everything outside this user storage is reset when the container restarts, so packages installed to /opt/conda need to be installed again. The same scenario arises for pip, but it can be resolved quite neatly with pip install --user <package>, which stores only the additional package in the user’s personal storage.
Goals
End user installation experience
(Awesome) Make the end user not worry about this, making installed packages persist without any additional flags to conda install and pip install.
(Nice) The end user only needs to know that, to make installed packages persist between container restarts, they add a single flag or similar to the install command.
(Unacceptable) Require the user to create new environments etc., or to duplicate already installed packages.
Other
(Unacceptable) User storage fills up with lots of duplicate installations of what is already available in /opt/conda, for example because the installer ignored what is already there.
Solution ideas
If I can create a new environment that augments /opt/conda then I’ll be fine, but how do I do that? I’ve heard about nested (aka stacked) environments, but I never got them to work as expected. Perhaps that is the way to go… Hmm…
Maybe more of a workaround than a real solution: can you configure default options/flags for pip so that pip install --user is the default even though users type pip install?
I have failed to get stacked environments working, though I tried for some hours ^^. I’ll look into whether you can change the default behavior of pip. I don’t think I want to set up bash aliases though; it seems a bit too messy, but perhaps plausible.
These things get complicated. I don’t believe conda stacked environments are the answer because two stacked environments are still fully independent envs, they are just both on $PATH. That means if you conda install some-python-package into the inner env, it’s still going to install its dependencies, which include Python itself, etc.
Given that conda doesn’t allow the packages of one environment to be spread across multiple prefixes, I think the best way to do this is to say that only packages installed by the user with pip are persisted by default:
conda installs either need to be redone on restart, or users create conda envs in their home directory so that they persist
recommend pip install when packages are available from pip
You can make pip install --user the default with:
[install]
user = true
in /etc/pip.conf (system-wide) or ~/.config/pip/pip.conf (per user)
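If you would rather generate that file from a startup script than write it by hand, something like the following might work. It is only a sketch: the helper name write_pip_user_default is made up for illustration, and it assumes the target path is writable by whoever runs it.

```python
import configparser
from pathlib import Path


def write_pip_user_default(conf_path):
    """Write or update a pip config file so that `pip install` implies --user.

    Hypothetical helper for illustration; not part of pip or JupyterHub.
    """
    conf_path = Path(conf_path)
    # Disable interpolation so existing values containing '%' are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    if conf_path.exists():
        parser.read(conf_path)  # preserve settings that are already there
    if not parser.has_section("install"):
        parser.add_section("install")
    parser.set("install", "user", "true")
    conf_path.parent.mkdir(parents=True, exist_ok=True)
    with conf_path.open("w") as fh:
        parser.write(fh)
    return conf_path


# Example call (commented out so that running this file changes nothing):
# write_pip_user_default(Path.home() / ".config" / "pip" / "pip.conf")
```

Writing to the per-user path keeps the setting inside the persisted home directory; the system-wide path would have to be baked into the image instead.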
Bonus:
Python has a super handy shortcut for showing site/path info:
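Running python -m site prints this kind of information, and the standard-library site module exposes the same details programmatically. A minimal sketch for checking where pip install --user packages would land (the helper name user_site_info is made up for illustration):

```python
import site
import sys


def user_site_info():
    """Collect the per-user install locations that Python reports."""
    user_site = site.getusersitepackages()
    return {
        "user_base": site.getuserbase(),     # base directory for --user installs
        "user_site": user_site,              # site-packages directory under that base
        "enabled": site.ENABLE_USER_SITE,    # True, False, or None
        "on_sys_path": user_site in sys.path,
    }


for key, value in user_site_info().items():
    print(f"{key}: {value}")
```

In the setup described above, you would hope to see user_site pointing somewhere under /home/jovyan, so that it survives container restarts.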