Conda / Pip, User storage and Docker environments

consideRatio · January 3, 2019, 12:32pm

A request for help

I hope to get some insights from you regarding a situation common for JupyterHubs utilizing Docker images.

Usage situation

A JupyterHub starts Docker containers from an image that has conda packages preinstalled in /opt/conda, owned by the user. The JupyterHub mounts user storage at /home/jovyan in the container. Everything outside this user storage is reset on container restart, so packages installed to /opt/conda need to be installed again. The same scenario arises for pip, but it can be resolved quite neatly with pip install --user <package>, which stores only the additional package in the user's personal storage.

Goals

End user installation experience

(Awesome) Make the end user not worry about this, making installed packages persist without any additional flags to conda install and pip install.
(Nice) The end user only needs to know that, to make installed packages persist between container restarts, they must add a single flag or similar to the install command.
(Unacceptable) Require the user to create new environments etc., or to duplicate already installed packages.

Other

(Unacceptable) User storage is filled up with duplicated installations of what is already available in /opt/conda, perhaps because the installer ignored what was already there.

Solution ideas

If I can create a new environment that augments /opt/conda then I’ll be fine, but how do I do that? I’ve heard about nested, aka stacked, environments, but I never got them to work as expected. Perhaps that is the way to go… Hmm…

Perhaps relevant stuff

The paths inspected by python currently

$ python -c "import sys; print('\n'.join(sys.path))"

/opt/conda/lib/python36.zip
/opt/conda/lib/python3.6
/opt/conda/lib/python3.6/lib-dynload
/home/jovyan/.local/lib/python3.6/site-packages
/opt/conda/lib/python3.6/site-packages

manics · January 4, 2019, 11:16am

Looks like there’s a hidden config setting for stacking conda environments: https://github.com/conda/conda/pull/5159

betatim · January 4, 2019, 1:08pm

Maybe more of a workaround than a real solution: can you configure default options/flags for pip so that pip install --user is the default even though users type pip install?

consideRatio · January 4, 2019, 6:04pm

Thanks for your input @manics and @betatim!

I have failed to utilize stacked environments, though I attempted for some hours ^^. I’ll look into whether you can change the default behavior of pip. I don’t think I want to set up bash aliases though; that seems a bit too messy, but perhaps plausible.

minrk · January 10, 2019, 9:54am

These things get complicated. I don’t believe conda stacked environments are the answer because two stacked environments are still fully independent envs, they are just both on $PATH. That means if you conda install some-python-package into the inner env, it’s still going to install its dependencies, which include Python itself, etc.

Given that conda doesn’t allow conda packages in one environment to be installed in multiple prefixes, I think the best way to do this is to say that only packages installed by the user with pip are persisted by default:

conda installs either need to be re-done on restart, or conda envs need to be created in home to be persisted
recommend pip install when packages are available from pip

You can make pip install --user the default with:

[install]
user = true

in /etc/pip.conf or ~/.config/pip/pip.conf
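For anyone who wants to script this (for example while building the image), here is a minimal Python sketch that writes the setting into a pip config file. The function name enable_user_installs is made up for illustration, and the sketch assumes the file is plain INI that configparser can round-trip; it is not something pip itself provides.

```python
import configparser
from pathlib import Path


def enable_user_installs(conf_path: Path) -> None:
    """Set `user = true` in the [install] section of a pip config file.

    Hypothetical helper for illustration; existing sections in the file
    are kept, but comments in the file are not preserved by configparser.
    """
    parser = configparser.ConfigParser()
    parser.read(conf_path)  # a missing file is silently skipped
    if not parser.has_section("install"):
        parser.add_section("install")
    parser.set("install", "user", "true")
    # Create the parent directory (e.g. ~/.config/pip) if it is missing.
    conf_path.parent.mkdir(parents=True, exist_ok=True)
    with conf_path.open("w") as fh:
        parser.write(fh)
```

Calling enable_user_installs(Path.home() / ".config" / "pip" / "pip.conf") would target the per-user file mentioned above; writing /etc/pip.conf instead needs root.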

Bonus:

Python has a super handy shortcut for showing site/path info:

$ python -m site
sys.path = [
    '/',
    '/opt/conda/lib/python37.zip',
    '/opt/conda/lib/python3.7',
    '/opt/conda/lib/python3.7/lib-dynload',
    '/opt/conda/lib/python3.7/site-packages',
]
USER_BASE: '/root/.local' (doesn't exist)
USER_SITE: '/root/.local/lib/python3.7/site-packages' (doesn't exist)
ENABLE_USER_SITE: True
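The same information is available from inside a notebook through the standard library site module. A small sketch, assuming you only want to know where pip install --user would put packages and whether that directory is currently importable; user_site_report is a made-up name:

```python
import site
import sys


def user_site_report() -> dict:
    """Collect the per-user install locations that `python -m site` prints."""
    user_site = site.getusersitepackages()
    return {
        "user_base": site.getuserbase(),
        "user_site": user_site,
        # ENABLE_USER_SITE can be None when disabled for security reasons.
        "enabled": bool(site.ENABLE_USER_SITE),
        # The directory is only added to sys.path if it existed at startup.
        "on_sys_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_site_report().items():
        print(f"{key}: {value}")
```

If on_sys_path is False even though enabled is True, the directory most likely did not exist when the interpreter started, which is what the "(doesn't exist)" note in the output above indicates.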

consideRatio · January 14, 2019, 8:27am

Wow @minrk, thank you so much for these summarized insights, they are very useful to me!

jhermann · March 19, 2019, 9:22pm

Here’s my experiment with requiring additional (to a pre-installed base) packages directly from a notebook, thus documenting its dependencies.
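The linked experiment is not reproduced in this thread, so the following is only a rough sketch of the general idea and not necessarily how it was done there: a notebook cell checks whether a package is importable and, if not, installs it into the user site so it lands in persisted storage. ensure_package is a hypothetical helper name.

```python
import importlib.util
import subprocess
import sys


def ensure_package(import_name, pip_name=None):
    """Install a package with `pip install --user` if it cannot be imported.

    Returns True if an install was run, False if the package was already
    available (for example because it is preinstalled in the base image).
    Only top-level module names are supported in this sketch.
    """
    if importlib.util.find_spec(import_name) is not None:
        return False
    # Use the running interpreter's pip so the package matches this kernel.
    subprocess.check_call(
        [sys.executable, "-m", "pip", "install", "--user", pip_name or import_name]
    )
    # Let the import system notice newly created files. If the user site
    # directory did not exist at startup, a kernel restart may still be needed.
    importlib.invalidate_caches()
    return True
```

A first notebook cell could then call, say, ensure_package("yaml", "pyyaml"), which documents the dependency and skips the install when the base image already provides it.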
