What do people use for adding packages to containers: conda or just plain pip? I’m planning a migration from a VM-based cluster to k8s. I use conda extensively to add packages on user request. The current setup has 65 Python modules and approximately 110 R packages in my Ansible playbooks, all installed with conda (e.g. via mamba). Any advice on how to minimize build times for containers?
For build time in particular, especially if you are using conda, lock files can help. Conda installs are pretty quick if you use explicit environment files because there is no solve: the file is just a list of URLs to download and extract. These files can be passed to micromamba to bootstrap an environment for a docker image pretty quickly, and it’s my go-to for maintaining docker images.
I have a relatively complex real-world example here, with key points:
Another build-time tip is to mount caches instead of disabling them. Disabling caches is the common practice for keeping images small, but a mounted cache lives outside the image layers, so it costs no image size. Caches are great, and reducing rebuild time is the point of them.
ENV MAMBA_ROOT_PREFIX=/tmp/conda
RUN --mount=type=cache,target=$MAMBA_ROOT_PREFIX micromamba create -p /opt/conda -f /tmp/conda.lock
or for pip:
ENV PIP_CACHE_DIR=/tmp/pip-cache
RUN --mount=type=cache,target=$PIP_CACHE_DIR pip install -r /tmp/requirements.txt
With lock files in place and caches mounted, every build after the first amounts to only extracting packages, even if the docker layer cache is invalidated, because the solve and download steps (the two most expensive steps) are both skipped.
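Putting the pieces above together, a complete Dockerfile might look roughly like the sketch below. It is not taken from the real-world example mentioned earlier. The base image, the apt step, the file names `conda.lock` and `requirements.txt`, and the presence of pip in the conda environment are all assumptions for illustration. Cache mounts require BuildKit.

```dockerfile
# syntax=docker/dockerfile:1
# Sketch only: base image and file names are assumptions, adjust to your setup.
FROM debian:bookworm-slim

# CA certificates so HTTPS downloads can be verified (assumed missing from the slim base)
RUN apt-get update \
 && apt-get install -y --no-install-recommends ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# Copy the static micromamba binary out of the official image
COPY --from=mambaorg/micromamba:latest /bin/micromamba /usr/local/bin/micromamba

# Root prefix (which holds the package cache) and pip cache both live under /tmp,
# separate from the environment itself in /opt/conda
ENV MAMBA_ROOT_PREFIX=/tmp/conda
ENV PIP_CACHE_DIR=/tmp/pip-cache

# conda.lock is assumed to be an explicit lock file (a list of package URLs)
COPY conda.lock requirements.txt /tmp/

# No solve: micromamba downloads (or reuses from the cache mount) and extracts
RUN --mount=type=cache,target=/tmp/conda \
    micromamba create -y -p /opt/conda -f /tmp/conda.lock

# Assumes the lock file includes pip; wheels are reused from the cache mount
RUN --mount=type=cache,target=/tmp/pip-cache \
    /opt/conda/bin/pip install -r /tmp/requirements.txt

ENV PATH=/opt/conda/bin:$PATH
```

The mount targets are written out as literal paths here to keep the sketch simple. Because both caches are mounted only for the duration of their `RUN` step, nothing under `/tmp/conda` or `/tmp/pip-cache` ends up in the final image.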