Apologies if this question has been asked before. I searched but didn’t come up with anything…
Consider a single notebook ipynb file, not necessarily part of any repo. Can this notebook tell me how to run itself in binder via notebook metadata?
Background
@choldgraf’s famous post:
opened my eyes to the possibilities of decoupling the binder image (which provides the execution environment) from the content, which can be pulled in via nbgitpuller. This concept now underlies Pangeo Gallery and binderbot, tools that launch notebooks in binders from the command line in an automated way. We generally need three pieces of data to specify the binder image:
| parameter | description |
|---|---|
| `binder_url` | URL for the Binder service in which to run the notebooks. |
| `binder_repo` | GitHub repository which contains the repo2docker environment configuration. |
| `binder_ref` | Branch, tag, or commit within `binder_repo` which contains the binder environment configuration. |
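To make these parameters concrete, here is a minimal sketch of how they compose into a BinderHub launch URL via the GitHub provider endpoint (`/v2/gh/<repo>/<ref>`); the values are taken from the example metadata proposed below:

```python
# Hypothetical values, matching the example metadata in this proposal
binder_url = "https://binder.pangeo.io"
binder_repo = "pangeo-gallery/default-binder"
binder_ref = "master"

# BinderHub's GitHub provider launch endpoint: /v2/gh/<repo>/<ref>
launch_url = f"{binder_url}/v2/gh/{binder_repo}/{binder_ref}"
```

Visiting `launch_url` would launch the environment alone; pulling in content on top of it is where nbgitpuller comes in.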
We could eliminate the latter two if we could point directly to an appropriate Docker image tag, e.g. via
Currently we embed these three parameters in an ad-hoc config file next to the notebooks. But what if we could embed them directly in the notebook metadata? Then we could have tools that could launch notebooks directly into the specified binder.
Proposal
The notebook JSON specification allows for adding arbitrary JSON metadata to the notebook. All we need to do is standardize a convention for encoding this information. Here is one possibility:
```json
{
  "metadata": {
    "binder": {
      "binder_url": "https://binder.pangeo.io",
      "binder_repo": "pangeo-gallery/default-binder",
      "binder_ref": "master"
    }
  }
}
```
There are many details to consider here, and such a convention would need iteration and community input. But at its core, it’s simple enough that it should be doable without too much fuss.
Benefits
Implementing something like this would help the notebook sharing ecosystem. Specifically, a JupyterHub deployment (particularly a cloud-based one that uses repo2docker to build the environments) could be made aware of the repo / ref that were used to generate its environment and use these to automatically populate `binder_repo` and `binder_ref` in all notebooks saved by its users. These notebooks could then be stored anywhere: a repo, a gist, Dropbox, or any bespoke notebook storing solution accessible over HTTP (e.g. @yuvipanda's https://notebooksharing.space/). A simple tool could examine the notebook, get the parameters, and generate an appropriate nbgitpuller link to open the notebook in Binder. This would allow us to move much more freely between hubs and binders, perhaps helping JupyterHub and BinderHub eventually converge, an idea already under discussion:
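The "simple tool" could look something like the sketch below. It combines the three binder parameters with a content repo and notebook path to build an nbgitpuller launch link. The URL shape follows nbgitpuller's link generator (a `git-pull` query nested inside BinderHub's `urlpath` parameter), but the exact encoding conventions should be checked against nbgitpuller's own generator; the content repo and notebook path here are placeholder values:

```python
from urllib.parse import quote

def nbgitpuller_binder_link(binder_url, binder_repo, binder_ref,
                            content_repo, notebook_path):
    """Build a Binder launch URL that pulls `content_repo` via nbgitpuller
    and opens `notebook_path` inside the environment from `binder_repo`.
    """
    # nbgitpuller clones the content repo into a directory named after it
    content_dir = content_repo.rstrip("/").split("/")[-1]
    inner = f"tree/{content_dir}/{notebook_path}"
    # The git-pull query string must be percent-encoded so it survives
    # intact as the value of the outer `urlpath` parameter
    git_pull = (f"git-pull?repo={quote(content_repo, safe='')}"
                f"&urlpath={quote(inner, safe='')}")
    return (f"{binder_url.rstrip('/')}/v2/gh/{binder_repo}/{binder_ref}"
            f"?urlpath={quote(git_pull, safe='')}")
```

With the metadata convention in place, every argument except the notebook's own location could come straight out of the notebook file.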
Downsides
- Most users would probably just ignore this metadata, with no downsides.
- Malformed or incorrect binder metadata would lead to non-functional binders.
- By hiding the environment details, users would potentially become more ignorant about the details of their environments.

There are probably many others I haven't thought of.