How can I prevent an out of memory error in Github Actions?

SamKrasnoff · December 20, 2022, 10:16pm

I'm working on a project that uses the repo2docker GitHub Actions workflow, jupyterhub/repo2docker-action (a GitHub action to build data science environment images with repo2docker and push them to registries).

However, the list of required packages has grown, and the runner now crashes during mamba env update, due to what I believe is an out-of-memory error. (Screenshot not included.)

Does anyone have suggestions for workarounds, shortcuts, etc? Thanks!

I’ve looked into the action itself, and registered for a free enterprise trial to get larger runners as a temporary workaround, but I’m looking for a long term fix.

bollwyvl · December 21, 2022, 2:33am

Doing the solves off-line with conda-lock will usually solve OoM during environment creation, but won’t cache very well with r2d.

SamKrasnoff · December 22, 2022, 5:45pm

What do you mean by “won’t cache very well”? I will work on generating a conda-lock file for the environment and see if that helps.

bollwyvl · December 22, 2022, 6:15pm

Repo2docker has special treatment for (.binder/)environment.yml, but doesn’t understand either of the conda-lock output files, partly because conda-lock hasn’t declared a “well-known” file name.

When it finds one of the “well-known” files, it does a “smart” install:

copies just that file into the building container
runs the package manager with some flags
cleans up the cache after the install

This layer then gets cached, and the rest of the process continues. If you don’t change your environment, and land on a repo2docker host that already has a previous image, you don’t have to rebuild.
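
As a rough illustration of why copying just one file matters (an analogy, not repo2docker's actual implementation): Docker reuses a layer only when the files copied into it are unchanged, so a layer built from one small file stays valid until that file's content changes. The hypothetical helper `layer_key` below mimics this with a content hash, assuming GNU `sha256sum` is available.

```shell
#!/usr/bin/env bash
# Illustrative only: layer_key is a hypothetical helper, not part of repo2docker.
set -eu

# Print a content hash of one file, standing in for a layer cache key.
layer_key() {
  sha256sum "$1" | cut -d ' ' -f 1
}

# Usage: the key changes only when the file's bytes change.
#   layer_key .binder/environment.yml
```

By contrast, postBuild runs once the rest of the repository is in the image, so (assuming Docker's usual layer rules apply) a change anywhere in the repository can invalidate that step, not just a change to the environment file.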

Anyhow, to use a conda-lock file with stock repo2docker, you have to:

create a file called something other than environment.yml (otherwise r2d will find it)
create the lock, maybe with a file called .binder/create-conda-lock.sh

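#!/usr/bin/env bash
# .binder/create-conda-lock.sh
# Solve the environment once, off the CI runner, and write an explicit lock file.
set -eux
cd .binder
# Writes conda-linux-64.lock into the current directory (.binder).
conda-lock \
  --mamba \
  --kind explicit \
  --platform linux-64 \
  --file not-environment.yml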

make a script that can install from the lock (won’t be useful locally)

#!/usr/bin/env bash
# .binder/update-env-from-lock.sh
set -eux
# NB_PYTHON_PREFIX is set by repo2docker inside the build container
mamba create \
  --prefix "${NB_PYTHON_PREFIX}" \
  --file .binder/conda-linux-64.lock
# ...and if needed
pip install --no-deps -r .binder/not-requirements.txt
# or other things conda-lock can't do

use it in .binder/postBuild

#!/usr/bin/env bash
# .binder/postBuild
set -eux
bash .binder/update-env-from-lock.sh

check in all of these files!
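
Before pushing, a small pre-flight check can catch a missing file. This is only a sketch: `missing_binder_files` is a hypothetical helper, and its file list simply mirrors the names used in the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; file names follow the steps in this thread.
set -eu

# Print each expected file that is missing under the given repo root.
# Returns 0 when everything is present, 1 otherwise.
missing_binder_files() {
  local root="$1" status=0 f
  for f in \
    .binder/not-environment.yml \
    .binder/create-conda-lock.sh \
    .binder/conda-linux-64.lock \
    .binder/update-env-from-lock.sh \
    .binder/postBuild
  do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f"
      status=1
    fi
  done
  return "$status"
}

# Usage, from the repository root:
#   missing_binder_files . && echo "all lock files present"
```

It only checks that the files exist on disk; git status still has to confirm that they are committed.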

bollwyvl · December 22, 2022, 6:25pm

Another avenue to explore is conda-pack. It works slightly differently, as it archives an entire conda environment (including packages installed with pip or npm install -g).

This would be a more complex CI approach: the conda-pack archive would be built out of band (on a full Linux VM), then fetched during postBuild inside repo2docker. It has some of the same shortcomings, though, as there’s no way for (repo2)docker to cache layers built from arbitrary shell inputs.

On the up-side: conda-pack archives are portable to any linux-64 machine, so they have value outside of docker.
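
A minimal sketch of that flow, under some assumptions: the function names and paths are hypothetical, and how the archive gets published and fetched (release asset, object store, and so on) is left out. `conda-pack` writes the archive, and the `conda-unpack` script it bundles into the environment's `bin/` rewrites path prefixes after extraction.

```shell
#!/usr/bin/env bash
# Sketch only: pack_env and unpack_env are hypothetical helper names.
set -eu

# Out of band (on a full Linux VM): archive an existing environment.
pack_env() {
  local prefix="$1" archive="$2"
  conda-pack --prefix "$prefix" --output "$archive"
}

# In postBuild: extract a previously fetched archive into a target prefix,
# then let the bundled conda-unpack script rewrite the embedded paths.
unpack_env() {
  local archive="$1" prefix="$2"
  mkdir -p "$prefix"
  tar -xzf "$archive" -C "$prefix"
  if [ -x "$prefix/bin/conda-unpack" ]; then
    "$prefix/bin/conda-unpack"
  fi
}

# Usage:
#   pack_env /path/to/env env.tar.gz
#   unpack_env env.tar.gz "$HOME/env"
```

Only the extraction step runs inside the repo2docker build here, so the memory-hungry solve stays on the machine that built the archive.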
