Conda / Pip, User storage and Docker environments

A request for help

I hope to get some insights from you regarding a situation that is common for JupyterHubs using Docker images.

Usage situation

A JupyterHub starts Docker containers from an image that has conda packages preinstalled in /opt/conda, owned by the user. The JupyterHub mounts user storage at /home/jovyan in the container. Everything outside this user storage is reset on container restart, so packages installed to /opt/conda need to be installed again. The same scenario arises for pip, but there it can be resolved quite neatly with pip install --user <package>, which stores only the additional package in the user’s personal storage.

Goals

End user installation experience

  • (Awesome) Make the end user not worry about this, making installed packages persist without any additional flags to conda install and pip install.
  • (Nice) The end user only needs to know that, to install packages that persist between container restarts, they must add a single flag or similar to the install command.
  • (Unacceptable) Require the user to create new environments etc., or to duplicate already installed packages.

Other

  • (Unacceptable) User storage fills up with duplicate installations of what is already available in /opt/conda, perhaps because the installer ignored what was already there.

Solution ideas

  • If I can create a new environment that augments /opt/conda then I’ll be fine, but how do I do that? I’ve heard about nested (aka stacked) environments, but I never got them to work as expected. Perhaps that is the way to go… Hmm…

Perhaps relevant stuff

The paths currently inspected by Python:

$ python -c "import sys; print('\n'.join(sys.path))"

/opt/conda/lib/python36.zip
/opt/conda/lib/python3.6
/opt/conda/lib/python3.6/lib-dynload
/home/jovyan/.local/lib/python3.6/site-packages
/opt/conda/lib/python3.6/site-packages

Looks like there’s a hidden config setting for stacking conda environments: https://github.com/conda/conda/pull/5159

Maybe more of a workaround than a real solution: can you configure default options/flags for pip so that pip install --user is the default even though users type pip install?

Thanks for your input @manics and @betatim!

I failed to get stacked environments working, though I tried for some hours ^^. I’ll look into whether the default behavior of pip can be changed. I don’t think I want to set up bash aliases though; that seems a bit too messy, but perhaps plausible.

These things get complicated. I don’t believe conda stacked environments are the answer because two stacked environments are still fully independent envs, they are just both on $PATH. That means if you conda install some-python-package into the inner env, it’s still going to install its dependencies, which include Python itself, etc.

Given that conda doesn’t allow the packages of one environment to be spread across multiple prefixes, I think the best way to do this is to say that only packages installed by the user with pip are persisted by default:

  1. conda installs either need to be redone on restart, or go into conda envs created in the home directory so that they persist
  2. recommend pip install when packages are available from pip

You can make pip install --user the default with:

[install]
user = true

in /etc/pip.conf or ~/.config/pip/pip.conf
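
If you would rather script this (say, in an image build step or a start-up hook), here is a minimal sketch using Python’s standard configparser. The function name enable_user_installs is made up for illustration, and which path to pass in is up to you; only the [install] / user = true setting comes from the above.

```python
import configparser
from pathlib import Path


def enable_user_installs(config_path):
    """Set `user = true` under [install] in a pip config file.

    Other sections and options already in the file are kept,
    but comments are dropped because configparser rewrites the file.
    """
    config_path = Path(config_path).expanduser()
    # interpolation=None so that '%' in existing values is left alone
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path)  # a file that does not exist yet is skipped

    if not parser.has_section("install"):
        parser.add_section("install")
    parser.set("install", "user", "true")

    config_path.parent.mkdir(parents=True, exist_ok=True)
    with config_path.open("w") as handle:
        parser.write(handle)
    return config_path
```

Calling enable_user_installs("~/.config/pip/pip.conf") would cover a single user, while "/etc/pip.conf" (written as root when building the image) would cover everyone in the container.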

Bonus:

Python has a super handy shortcut for showing site/path info:

$ python -m site
sys.path = [
    '/',
    '/opt/conda/lib/python37.zip',
    '/opt/conda/lib/python3.7',
    '/opt/conda/lib/python3.7/lib-dynload',
    '/opt/conda/lib/python3.7/site-packages',
]
USER_BASE: '/root/.local' (doesn't exist)
USER_SITE: '/root/.local/lib/python3.7/site-packages' (doesn't exist)
ENABLE_USER_SITE: True
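
To see at a glance which of those entries would survive a container restart, something like the following could help. It is only a sketch: split_by_persistence is a hypothetical helper, and it assumes that /home/jovyan (the mount point from the original post) is the only persistent location. It compares path strings only and does not resolve symlinks.

```python
import site
import sys
from pathlib import Path


def split_by_persistence(paths, persistent_root):
    """Split paths into those under persistent_root and those outside it."""
    root = Path(persistent_root)
    persistent, ephemeral = [], []
    for entry in paths:
        if not entry:
            continue  # '' stands for the current directory; skip it
        path = Path(entry)
        if path == root or root in path.parents:
            persistent.append(entry)
        else:
            ephemeral.append(entry)
    return persistent, ephemeral


if __name__ == "__main__":
    kept, lost = split_by_persistence(sys.path, "/home/jovyan")
    print("survives a restart:", kept)
    print("reset on restart:  ", lost)
    print("pip --user target: ", site.getusersitepackages())
```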

Wow @minrk, thank you so much for these summarized insights, they are very useful to me!

Here’s my experiment with requiring additional packages (on top of a pre-installed base) directly from a notebook, thus documenting the notebook’s dependencies.
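
The general idea can be sketched roughly as below. This is an illustrative sketch only, not the experiment itself: the helper names missing_modules and ensure_installed are hypothetical. It installs with pip install --user only what cannot already be imported, so packages preinstalled in /opt/conda are not duplicated into user storage.

```python
import importlib.util
import subprocess
import sys


def missing_modules(module_names):
    """Return the top-level module names that cannot be imported right now."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def ensure_installed(requirements, dry_run=False):
    """Install, with `pip install --user`, only what is not importable yet.

    `requirements` maps an import name to its pip package name,
    since the two can differ (for example `yaml` and `PyYAML`).
    Returns the pip command that was (or would be) run, or [] if
    nothing is missing.
    """
    missing = missing_modules(requirements)
    if not missing:
        return []
    command = [sys.executable, "-m", "pip", "install", "--user"]
    command += [requirements[name] for name in missing]
    if not dry_run:
        subprocess.check_call(command)
    return command
```

In a notebook's first cell this could be called as ensure_installed({"yaml": "PyYAML"}). If the user site directory did not exist when the kernel started, a kernel restart may be needed before the new package can be imported.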