We have had “a ZIP file provider” on the roadmap for a while for repo2docker: https://github.com/jupyter/repo2docker/issues/812
This would be a good contribution to get started learning about how the content provider part of repo2docker works. I think we already have some ZIP file (or archive) handling in the Zenodo/Figshare providers that you can look at for inspiration.
I’d implement the caching based on the value of the ETag header that a server sends. This needs the server to cooperate a bit (i.e. send an ETag header), but I think almost all web servers do that today. My idea would be to use the value of the ETag the same way we use the resolved commit hash of a Git repository. This means a ZIP file content provider would make a HEAD request to get the ETag value and, based on that, decide whether it needs to build or not.
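To make the ETag idea a bit more concrete, here is a rough sketch using only the Python standard library. It is not how repo2docker does anything today: the function names (`fetch_etag`, `etag_to_ref`, `needs_build`) and the `built_refs` set are made up for illustration, and hashing the URL together with the ETag is just one possible way to get something that looks like a commit hash. It also treats weak (`W/"..."`) and strong ETags the same, which is an assumption you may not want.

```python
import hashlib
import urllib.request


def fetch_etag(url, timeout=10):
    """Send a HEAD request and return the raw ETag header, or None if absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("ETag")


def etag_to_ref(url, etag):
    """Turn a URL and its ETag into a stable, commit-hash-like identifier.

    Hypothetical helper: returns None when there is no usable ETag, so the
    caller can fall back to always building.
    """
    if etag is None:
        return None
    value = etag.strip()
    # Assumption: ignore the weak-validator prefix and compare the opaque value.
    if value.startswith("W/"):
        value = value[2:]
    value = value.strip('"')
    if not value:
        return None
    # Include the URL so identical ETags from different servers do not collide.
    return hashlib.sha1(f"{url}\n{value}".encode("utf-8")).hexdigest()


def needs_build(url, etag, built_refs):
    """Decide whether to build, given the refs that already have an image."""
    ref = etag_to_ref(url, etag)
    # No ETag means nothing to cache on, so always build.
    return ref is None or ref not in built_refs
```

A provider would then call something like `needs_build(url, fetch_etag(url), built_refs)` before downloading the archive, in the same spirit as checking whether an image for a resolved commit already exists.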
I think a ZIP file fits very well with the Binder philosophy. While it all started with Git repositories on GitHub we now support lots of other content providers. In hindsight maybe repo2docker is doubly misnamed:
- it should be “directory-like-thing” instead of “repo”
- it should be “container” not “docker”
Though I guess repo2docker is a bit more catchy than directory-like-thing2container. For sure it is less to type.