I am hoping to build a stable data science platform for testing tools, based on the Jupyter docker-stacks. I am currently tweaking jupyter/pyspark-notebook:58169ec3cfd3 to suit my needs. Unfortunately, it seems that the underlying scipy-notebook image, which is the base image for that Dockerfile, has since been updated, and recent builds now hit conflicts between some of my packages (pandas 0.25 seems to be the culprit).
I find myself wishing that the pyspark-notebook Dockerfile pinned the specific scipy-notebook tag it builds from, so that a rebuild would give me the same stable container. Has anyone else daydreamed about this?
One possible con: all images would need to be revisioned in lockstep, so that no downstream image ends up as an inaccessible branch on an earlier base image. I think the stability benefits are worth the trade-off, though.
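To make the request concrete, here is a rough sketch of what a pinned base could look like at the top of the pyspark-notebook Dockerfile. This is only an illustration, not how the project currently does it: the BASE_CONTAINER argument name is my own choice, and <scipy-tag> is a placeholder for whichever scipy-notebook tag matches the pyspark-notebook revision (I have not looked up a real one). It assumes a Docker version that allows ARG before FROM (17.05 or later).

```dockerfile
# Hypothetical sketch: BASE_CONTAINER is an assumed argument name, and
# <scipy-tag> is a placeholder that must be replaced with a real tag.
ARG BASE_CONTAINER=jupyter/scipy-notebook:<scipy-tag>
FROM $BASE_CONTAINER

# ... the rest of the pyspark-notebook Dockerfile would stay unchanged ...
```

With something like this, the default would give a reproducible build, and anyone who wants a different base could still override it at build time with docker build --build-arg BASE_CONTAINER=jupyter/scipy-notebook:<other-tag> .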