Dockerfile (with virtual environments) Best Practices

So I’ve been making my own customizations and building on top of the docker-stacks images to maintain some isolated, ready-to-use workspaces that I can rely on.

However, I’ve noticed that as I amass more and more packages I want to keep around, dependency resolution is getting less and less predictable, and my CPU-based images have recently started failing during the build. The obvious fix is to break them into separate Dockerfiles (which also keeps file sizes and redundancy down), and that’s easy to do when the tasks are clearly mutually exclusive.

Beyond that, though, there are workflows where the line gets blurry and I can see situations where it would be extremely inconvenient to split the work across containers or keep switching between them. I thought a good solution might be to have some of the multipurpose Docker images maintain separate Jupyter kernels, which could help mitigate dependency hell while reducing the friction of switching between certain libraries.

It seems conda environments may not be the best choice for this, since common packages would have to be duplicated and the redundancy would bloat the images. Is it possible to use virtual environments for this instead? If so, do virtualenv’s and conda’s dependency handling play nicely with each other? My understanding is that virtual envs will inherit Python packages installed outside of them (at least when created with --system-site-packages). Will that relationship be maintained when the virtual env is installed as a Jupyter kernel? I’m hoping there’s a way to keep the images as light and as clean as possible. Has anyone else explored this?
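Concretely, the kind of setup I have in mind would look something like the sketch below (the paths, env name, and "conflicting-lib" are just placeholders): the venv only holds the extra packages, and the kernelspec points at the venv’s interpreter, so the system-site-packages inheritance should carry over at runtime.

```bash
# Create a venv that can also see the system-level site-packages
python3 -m venv --system-site-packages /opt/venvs/extra

# Install only the extra/conflicting libraries (plus ipykernel) inside the venv;
# "conflicting-lib" is a placeholder, not a real package
/opt/venvs/extra/bin/pip install --no-cache-dir ipykernel conflicting-lib

# Register the venv as its own Jupyter kernel; the kernelspec records the
# venv's python, so packages installed at the system level stay visible
/opt/venvs/extra/bin/python -m ipykernel install \
    --name extra --display-name "Python (extra)"
```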

What I’ve been doing is to skip conda altogether and install internal packages with pip, so that everything is installed at the system level. I have scripts at joequant/bitquant on GitHub, under the bitstation directory.

Also, I’ve migrated off using docker to build images. I now use buildah to build the images and then push them to Docker. Buildah allows you to use external tools during the build, which means you don’t have to put everything into the Docker image itself.
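The rough shape of it is something like this stripped-down sketch (image names and steps are illustrative only, not the actual bitquant build):

```bash
# Start a working container from a base image
ctr=$(buildah from docker.io/library/python:3.11-slim)

# Build steps can mix in-container commands with host-side tooling,
# so build-only dependencies never need to land in the image
buildah run "$ctr" -- pip install --no-cache-dir jupyterlab
buildah copy "$ctr" ./scripts /srv/scripts

# Commit the result and push it into the local Docker daemon
buildah commit "$ctr" my-workspace:latest
buildah push my-workspace:latest docker-daemon:my-workspace:latest
buildah rm "$ctr"
```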


I’ll have to do a deep dive sometime and figure out how you got that automation going. I’ve already been using buildah, but not beyond basic docker-style capabilities.

I thought I was onto something after finding out that conda environments try to use hard links instead of creating redundant files. I had pictured a nice workflow where the Jupyter stuff lives in the base (root) environment and dependencies are directed into new, cloneable environments. Then I learned that pip installs would not be cloned over :(
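For reference, the workflow I had pictured was roughly this (environment and package names are made up):

```bash
# JupyterLab and the shared scientific stack live in the base environment
conda install -n base -y jupyterlab numpy pandas

# Each conflicting dependency set gets its own clone; conda builds clones
# mostly out of hard links into the package cache, so conda-installed
# packages aren't duplicated on disk
conda create -n geo --clone base -y
conda install -n geo -y gdal ipykernel

# Register the clone as its own Jupyter kernel
conda run -n geo python -m ipykernel install --name geo --display-name "Python (geo)"

# The snag: packages that were pip-installed into base don't carry over
# into the clone this way
```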