I’m working on a project that uses the repo2docker GitHub Actions workflow, jupyterhub/repo2docker-action (a GitHub action to build data science environment images with repo2docker and push them to registries).
However, the list of required packages has grown, and the runner now crashes during mamba env update, due to what I believe is an out-of-memory error. Image shown below:
Does anyone have suggestions for workarounds, shortcuts, etc? Thanks!
I’ve looked into the action itself, and registered for a free enterprise trial to get larger runners as a temporary workaround, but I’m looking for a long-term fix.
Doing the solves off-line with conda-lock will usually solve OoM during environment creation, but won’t cache very well with r2d.
What do you mean by “won’t cache very well”? I will work on generating a conda-lock file for the environment and see if that helps.
Repo2docker has special treatment for (.binder/)environment.yml, but doesn’t understand either of the conda-lock output files, partly because conda-lock hasn’t declared a “well-known” file.
When it finds one of the “well-known” files, it does a “smart” install:
- copies just that file into the building container
- runs the package manager with some flags
- cleans up the cache after the install
This layer then gets cached, and the rest of the process continues. If you don’t change your environment, and land on a repo2docker host that already has a previous image, you don’t have to rebuild.
Anyhow, to use a conda-lock file with stock repo2docker, you have to:
- create a file called something other than environment.yml (otherwise r2d will find it)
- create the lock, maybe with a file called .binder/create-conda-lock.sh
#!/usr/bin/env bash
# .binder/create-conda-lock.sh
set -eux
cd .binder
conda-lock \
--mamba \
--kind explicit \
--platform linux-64 \
--file not-environment.yml
- make a script that can install from the lock (won’t be useful locally)
#!/usr/bin/env bash
# .binder/update-env-from-lock.sh
set -eux
mamba create \
--prefix "${NB_PYTHON_PREFIX}" \
--file .binder/conda-linux-64.lock
# ...and if needed
pip install --no-deps -r .binder/not-requirements.txt
# or other things conda-lock can't do
- use it in .binder/postBuild
#!/usr/bin/env bash
# .binder/postBuild
set -eux
bash .binder/update-env-from-lock.sh
- check in all of these files!
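Since postBuild will fail late if the lock file is missing or is not in the explicit format, a small sanity check before the install can save a build cycle. The following is only a sketch: the script name and the function `count_lock_packages` are made up for illustration and are not part of repo2docker or conda-lock. It assumes the lock was produced with `--kind explicit`, which writes an `@EXPLICIT` marker followed by one package URL per line.

```bash
#!/usr/bin/env bash
# .binder/check-conda-lock.sh (hypothetical helper, not shipped by any tool)
set -eu

# Print the number of package URLs in an explicit lock file.
# Returns non-zero if the file is missing or lacks the @EXPLICIT marker.
count_lock_packages() {
  local lock_file="$1"
  if [ ! -f "$lock_file" ]; then
    echo "missing lock file: $lock_file" >&2
    return 1
  fi
  if ! grep -q '^@EXPLICIT$' "$lock_file"; then
    echo "not an explicit lock file: $lock_file" >&2
    return 1
  fi
  # Each package is listed as one URL on its own line.
  grep -cE '^https?://' "$lock_file"
}

# When called with a path, report the package count for that file.
if [ "$#" -gt 0 ]; then
  count_lock_packages "$1"
fi
```

It could be run as `bash .binder/check-conda-lock.sh .binder/conda-linux-64.lock` at the top of postBuild, before the install script.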
Another avenue to explore is conda-pack: this is slightly different, as it archives an entire conda environment (even packages installed with pip or npm install -g).
This would be a more complex CI approach, where the conda-pack archive would be built out of band (on a full Linux VM), then fetched during postBuild inside repo2docker. But this has some of the same shortcomings, as there’s no way for (repo2)docker to cache layers built from arbitrary shell inputs.
On the up-side: conda-pack archives are portable to any linux-64 machine, so they have value outside of docker.
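As a rough sketch of the unpacking half of that approach: the helper name `unpack_conda_pack`, the archive name, and the target path are all placeholders, and how the archive gets fetched depends on where CI publishes it. The sketch assumes a gzipped tarball as produced by conda-pack, which includes a `bin/conda-unpack` script for fixing up prefixes after extraction.

```bash
#!/usr/bin/env bash
# .binder/unpack-conda-pack.sh (hypothetical helper for use from postBuild)
set -eu

# Extract a conda-pack archive into a target prefix and fix up its paths.
unpack_conda_pack() {
  local archive="$1"
  local prefix="$2"
  mkdir -p "$prefix"
  tar -xzf "$archive" -C "$prefix"
  # conda-pack archives carry a conda-unpack script; running it rewrites
  # the prefixes that were hard-coded on the build machine.
  if [ -x "$prefix/bin/conda-unpack" ]; then
    "$prefix/bin/conda-unpack"
  fi
}

# Out of band, on a machine with enough memory (paths are placeholders):
#   conda-pack --prefix /path/to/env --output env.tar.gz
# Then in postBuild, after fetching env.tar.gz:
#   unpack_conda_pack env.tar.gz /path/to/target-prefix
```

Whether to unpack over the environment repo2docker already created or into a separate prefix is a design choice this sketch leaves open.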