It seems to be conflating two separate things: notebooks / analytics scripts, and the packaging of the environments required to run them.
conda is the best tool we have for creating reproducible analytics environments and, as you point out, containers solve the “system software” issue.
> [conda] is generally not very good at reproducing software environments at different points in time or space
In your linked tweet it sounds like the environment wasn’t saved with explicit specs. Whilst it’s appropriate to (optimistically) loosely pin your dependency versions in your meta.yaml, to ensure reproducibility of an environment you should export an explicit env-spec.txt, which pins down the exact dependency versions, build numbers and even channels.
Creating a Docker container with this environment ensures replicability, and publishing the explicit env-spec.txt allows the environment to be reproduced locally.
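To make concrete what an explicit spec pins, here is a minimal sketch that reads one. It assumes the file was produced with `conda list --explicit > env-spec.txt` (and would be restored with `conda create --name <env> --file env-spec.txt`), so each non-comment line is a full package URL. The helper names `PinnedPackage` and `parse_explicit_spec` are hypothetical, and the sample spec contents, including the build strings, are made up for illustration.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class PinnedPackage(NamedTuple):
    channel: str  # channel URL without the platform subdirectory
    subdir: str   # platform subdirectory, e.g. "linux-64" or "noarch"
    name: str
    version: str
    build: str


def parse_explicit_spec(text: str) -> list[PinnedPackage]:
    """Parse the package URLs found in an explicit conda spec file."""
    packages = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and the "@EXPLICIT" marker line.
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        url = line.split("#", 1)[0]  # drop an optional "#<md5>" suffix
        parsed = urlparse(url)
        *channel_parts, subdir, filename = parsed.path.strip("/").split("/")
        for ext in (".tar.bz2", ".conda"):
            if filename.endswith(ext):
                filename = filename[: -len(ext)]
                break
        # Package names may contain hyphens; version and build do not.
        name, version, build = filename.rsplit("-", 2)
        channel = f"{parsed.scheme}://{parsed.netloc}/" + "/".join(channel_parts)
        packages.append(PinnedPackage(channel, subdir, name, version, build))
    return packages


# Illustrative spec contents only; the build strings are made up.
EXAMPLE_SPEC = """\
# platform: linux-64
@EXPLICIT
https://conda.anaconda.org/conda-forge/linux-64/python-3.11.0-h0000000_0.conda
https://conda.anaconda.org/conda-forge/noarch/scikit-learn-extra-1.0.0-pyh0000000_0.tar.bz2
"""

for pkg in parse_explicit_spec(EXAMPLE_SPEC):
    print(pkg.name, pkg.version, pkg.build, pkg.channel)
```

Every entry carries an exact version, build string and channel, which is what distinguishes this file from the loose pins in a recipe.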
Our internal CI/CD automatically builds Docker images and, as part of that, bakes in the env-spec.txt for the environment so that it’s always available. In the case of web-app containers the env-spec.txt is made available on a /api/env-spec endpoint.
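As a rough idea of what such an endpoint could look like, here is a sketch using only the Python standard library. It is not our actual implementation: the file location `/opt/app/env-spec.txt`, the function `env_spec_response` and the class `EnvSpecHandler` are all assumptions made up for this example.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

# Hypothetical location where the image build copied the explicit spec.
ENV_SPEC_PATH = Path("/opt/app/env-spec.txt")


def env_spec_response(request_path: str, spec_path: Path) -> tuple[int, bytes]:
    """Return (status code, body) for a GET request to request_path."""
    if request_path != "/api/env-spec":
        return 404, b"not found\n"
    try:
        return 200, spec_path.read_bytes()
    except OSError:
        # The spec was not baked into this image, or is unreadable.
        return 500, b"env-spec.txt is missing from this image\n"


class EnvSpecHandler(BaseHTTPRequestHandler):
    spec_path = ENV_SPEC_PATH

    def do_GET(self):
        status, body = env_spec_response(self.path, self.spec_path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


# To serve it from inside the container (blocks until interrupted):
# ThreadingHTTPServer(("0.0.0.0", 8000), EnvSpecHandler).serve_forever()
```

Keeping the routing in a plain function means it can be checked without starting a server, and anyone can then fetch the spec from a running container to rebuild the same environment locally.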
Packaging is complicated, but that can be alleviated by automation, exactly as is done with Binder. Other than not listing the explicit specs for the environment, I’m not sure what reproducibility issues Binder doesn’t solve.
> Last but not least, we still haven’t solved the core issue, which is that notebooks are not self-contained: they do not describe the dependencies they need.
I think this is where we disagree: I don’t think they should. IMHO that’s the job of a proper package manager (conda) and a package specification DSL (meta.yaml).