Should one be able to specify which base image tags are used to build pyspark-notebook containers?

I am hoping to build a stable data science platform to test tools, leveraging the Jupyter docker-stacks. I am currently tweaking jupyter/pyspark-notebook:58169ec3cfd3 to suit my needs. Unfortunately, the underlying scipy-notebook, which is the base image for that Dockerfile, seems to have been updated, and recent builds now have conflicts between some of my packages (pandas 0.25 seems to be the culprit).

I find myself wishing that the pyspark-notebook specified which scipy-notebook to pull from so that I could get a stable resulting container. Has anyone else daydreamed about this?

One possible con: it would require all images to be revisioned in lockstep to prevent inaccessible branches on earlier base images. I think the stability benefits are worth the trade-off though.

BASE_CONTAINER was recently added as a build-time Docker argument. You can pass a fully qualified Docker image name, tag included, as the base for any of your builds.

The docker-stacks images are already built in lockstep, so when jupyter/pyspark-notebook:58169ec3cfd3 builds on Docker Hub, its default base of jupyter/scipy-notebook:latest is just another name for jupyter/scipy-notebook:58169ec3cfd3.
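
For a local build, pinning the base with BASE_CONTAINER might look roughly like the sketch below. It assumes you are in a checkout of docker-stacks at a revision whose pyspark-notebook Dockerfile declares the BASE_CONTAINER argument; the output image name `my-pyspark-notebook`, the `./pyspark-notebook` build context, and the `RUN_BUILD` switch are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Tag taken from the thread; using the same tag for the base keeps both layers in step.
TAG=58169ec3cfd3
BASE_CONTAINER="jupyter/scipy-notebook:${TAG}"

# Hypothetical names: adjust the output image name and build context to your setup.
IMAGE="my-pyspark-notebook:${TAG}"
CONTEXT=./pyspark-notebook

# Assemble the command as positional parameters so it can be reviewed before running.
set -- docker build --build-arg "BASE_CONTAINER=${BASE_CONTAINER}" -t "${IMAGE}" "${CONTEXT}"
echo "$*"

# Set RUN_BUILD=1 (an illustrative switch, not a Docker feature) to run the build.
if [ "${RUN_BUILD:-0}" = "1" ]; then
  "$@"
fi
```

Because the base is given as a full name with an explicit tag, a later update to jupyter/scipy-notebook:latest should not change what a rebuild pulls in.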
