Multi-stage Dockerfile build and copying a Conda environment

Currently, I am building these images myself with some slight modifications. I build the following:
base-notebook
minimal-notebook
scipy-notebook
tensorflow-notebook

I have been experimenting with a multi-stage Docker build process, copying folders from one Docker image into the next. Let’s say I have the tensorflow-notebook image built as desired. Is it possible to create a new Docker image from a base Ubuntu image and copy the entire /opt/conda directory from the tensorflow-notebook image into it? Will it work? I ask because, in my experiments, this saves over 1 GB of image size. I understand this doesn’t address some things that I would have to resolve separately, namely whatever the base-notebook and minimal-notebook builds set up outside /opt/conda (e.g. apt install).

Appreciate any thoughts.


Take a look at github:joequant/bitquant bitstation to see what I’m doing.

This can work, but there can be a lot of gotchas. The big one is that things often depend on file permissions, owners, and timestamps. What has worked for me is to build the image, start a container from it, and use docker exec to go into the running container and look at the log files when something goes wrong. Also pay attention to error messages.
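
For reference, here is a minimal sketch of the kind of multi-stage build being asked about. Treat it as a starting point under stated assumptions rather than a tested recipe: the image name my-tensorflow-notebook is hypothetical (substitute your own build), ubuntu:22.04 is an assumed base that should match the distribution the source image was built on, and 1000:100 is an assumed UID:GID for the notebook user.

```dockerfile
# Stage 1: the already-built image that holds the Conda environment.
# "my-tensorflow-notebook" is a hypothetical name for your own build.
FROM my-tensorflow-notebook:latest AS conda-src

# Stage 2: a clean base image (assumed; match the source image's distro).
FROM ubuntu:22.04

# Copy the environment to the SAME path it was created at, because
# Conda environments embed their install prefix in scripts and metadata.
# COPY makes files owned by root unless told otherwise, so set the owner
# explicitly (1000:100 is an assumed UID:GID, adjust to your user).
COPY --from=conda-src --chown=1000:100 /opt/conda /opt/conda

# Put the copied environment first on PATH.
ENV PATH=/opt/conda/bin:$PATH

# Build-time smoke test: the build fails here if the copied interpreter
# cannot start in the new image.
RUN python -c "import sys; print(sys.prefix)"
```

Anything the original image installed outside /opt/conda (apt packages, users, startup scripts) is not carried over and has to be recreated in the second stage.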

Once you realize that the breakage is due to permissions, owners, and timestamps, it becomes possible to track these issues down. Also, I use pip instead of conda to install things at the system level, and bypass virtual environments. Conda is useful when you are a user setting up a system on a multiuser platform, but when you are creating a Docker image, you are the God-Emperor of the container, so you can just put everything at the system level.

Also, you might want to look at the buildah toolchain. Buildah runs the container build from outside the container, which means that you can pull in system tools from the host.

One final thing is that if you are running a long container build, you might take a look at some “tricks” that I have. For example, I make heavy use of ipyparallel to parallelize the container build and also use a lot of network/compile caching.
