Repo2docker as part of data/code publishing guidelines


Should Binder make a push to get itself included in pages that give authors/researchers recommendations on how to archive their code?

Someone pointed me at one such page, and I thought we should start a conversation about if/how we could get repo2docker included in that list.

Who could we reach out to in order to start talking about this?


I think this is a great idea! I know Iain Hrynaszkiewicz, and I think he would be a great person to chat with.

I also think Elizabeth DuPre is thinking about projects related to computational environment sharing. I’ll ping this to her via DM.

And also, one of the big messages of The Turing Way is to promote Binder! So I’ll keep an eye here too and make sure we’re cross-pollinating in a sensible way :partying_face:


I’ve also started a Twitter thread tagging a bunch of funders/journals:

(Pinning here so I don’t lose it in the future!)


I think we should, and this needs to be a specific push from our angle. @KirstieJane, if y’all (or anybody else you know of) are interested in thinking specifically about Binder in the context of open, reproducible, and shareable workflows, perhaps it’d be worth a group brainstorm?


One thing I realised already:

For data, we went through a phase of “publish your data openly”, where everyone just sent their spreadsheets or weirdly formatted data to some kind of archive. Since then we have gained FAIR and research data management plans, and I think it is fair to say that the data being published with papers is now vastly more useful.

Maybe this conversation should also focus on “how to usefully publish your code and the environment it runs in” (like the FAIR guides do for data)?

You can already publish your code as an archive to Zenodo (or other places like it). However, there doesn’t seem to be anything as clear and well known as the FAIR guides for data on how to do that, and there is even less guidance on how to share the environment in a way that others can reproduce it.


As well as drawing on best practice from data-sharing principles, it might also be worth looking at how pre-existing tooling in this space sells itself, to what extent it would provide a practical way of implementing any mooted guidelines, and where it may fall short.

e.g. things like:

  • tools for rebuilding computational environments in general, in Docker or VMs: repo2docker, dockter, source2image, and devops tooling (Ansible, Puppet, Vagrant);
  • tools for reproducing environments within a particular programming language: Python’s watermark package, or R’s packrat (and probably others…).
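To make the second category concrete, here is a minimal stdlib-only sketch of the kind of information language-level tools such as watermark record: the interpreter version, the platform, and the versions of the packages in use. Note that `environment_report` is a hypothetical helper written for illustration, not part of watermark’s actual API.

```python
# Sketch (assumption: this mirrors what tools like watermark capture,
# using only the Python standard library).
import sys
import platform
from importlib import metadata


def environment_report(packages):
    """Return a dict describing the Python runtime and the versions
    of the given installed packages (hypothetical helper)."""
    report = {
        "python": platform.python_version(),  # e.g. "3.11.4"
        "platform": sys.platform,             # e.g. "linux"
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


print(environment_report(["pip", "some-package-that-is-missing"]))
```

Publishing a report like this alongside code is the low-tech end of the spectrum; repo2docker-style tooling goes further by turning the same dependency information into a rebuildable container.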