Hey @aakashns and @siddhant - thanks for reaching out, and I hope we can find an approach that works for both of our communities
A few responses below:
<3
aakashns:
in India, South-East Asia, Africa, South America, the Middle East (and of course Europe and North America)
This is great - one of the goals of the JupyterHub/Binder projects is to make computation more accessible to people outside North America and Europe!
Us too! The challenge is that there’s no “scaling model” for mybinder.org, since it’s “just” a technical demo. So this isn’t a matter of differing visions or hopes; Binder simply has no dedicated resources right now, so we have to be careful about heavy usage.
Thanks for coming up with some tangible ideas - we appreciate you putting in the effort to think through this and reach out.
aakashns:
Decouple the environment Docker image from the source code files, so that a new build does not have to be triggered if there’s already a built image for an environment.yml file.
Check out some of the ideas in this post: “Tip: speed up Binder launches by pulling github content in a Binder link with nbgitpuller”. Perhaps it would help you reduce the number of builds for your repositories.
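If it’s useful, the link format from that post can be generated programmatically. A minimal sketch (the function name is mine, and the exact encoding rules are best checked against the nbgitpuller link generator):

```python
from urllib.parse import quote

def nbgitpuller_binder_url(env_repo, env_branch, content_repo, content_branch="main"):
    """Build a mybinder.org link that launches the *environment* repo's image
    and pulls the *content* repo at start-up via nbgitpuller (sketch only)."""
    base = f"https://mybinder.org/v2/gh/{env_repo}/{env_branch}"
    # The git-pull query string must itself be percent-encoded, because it is
    # nested inside the outer ?urlpath= parameter.
    pull = f"git-pull?repo={quote(content_repo, safe='')}&branch={content_branch}"
    return f"{base}?urlpath={quote(pull, safe='')}"
```

With a link like this, pushing to the content repo never triggers a rebuild; only changes to the environment repo do.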
aakashns:
Support for running a single Jupyter notebook file, apart from full Git repositories. Most people who are new to the domain of data science aren’t familiar with Git and are just looking to run a notebook.
We’ve discussed this before - feel free to add your thoughts etc!
(Issue opened 04 Dec 2019, labeled “needs: discussion”)
This would be a major extension to how repo2docker works, so I don't think it needs to happen anytime soon, but it is worth discussing.
I've had a number of conversations now where people suggest it'd be good to have the *entire environment* encapsulated in a single Jupyter Notebook. E.g., rather than sharing a repository of files, they'd just share a single file with all of the information needed in it.
This could be done if we implemented a `JupyterNotebookBuildPack`. I imagine that it could do something like:
1. `detect()` if the input path were a single file that ends in `.ipynb` and that has a notebook-level metadata field (e.g., `binder/` or `environment/`).
2. Within that metadata field would be another dictionary, where the keys are the full filenames of repo2docker configuration files (e.g., `requirements.txt`, `REQUIRE`) and the value of each key is a list of lines. The BuildPack then runs a second round of `detect()` using the other BuildPacks, and assembles an environment following whatever it finds.
Something like:
```yaml
env:
  requirements.txt:
    - numpy
    - matplotlib
  runtime.txt:
    - r-YYYY-MM-DD
Could this be implemented without much added complexity? *Should* this be implemented at all?
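For what it’s worth, the first step of such a BuildPack could be quite small. A rough sketch (the `env` metadata key, function name, and file layout here are all illustrative, not an actual repo2docker API):

```python
import json
import pathlib

def extract_env_files(notebook_path, out_dir):
    """Read an assumed `env` field from a notebook's top-level metadata and
    write out the repo2docker configuration files it describes, so the
    existing BuildPacks' detect() logic can take over from there."""
    nb = json.loads(pathlib.Path(notebook_path).read_text())
    env = nb.get("metadata", {}).get("env", {})  # hypothetical metadata key
    out = pathlib.Path(out_dir)
    for filename, lines in env.items():
        # e.g. filename == "requirements.txt", lines == ["numpy", "matplotlib"]
        (out / filename).write_text("\n".join(lines) + "\n")
    return sorted(env)
```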
aakashns:
Decouple the environment Docker image from the source code files, so that a new build does not have to be triggered if there’s already a built image for an environment.yml file.
We’ve discussed this as well and came up with (for now) a recommended way to do this with pre-existing tools rather than changing BinderHub, see:
(Issue opened 24 Jun 2020, closed 26 Jun 2020, labeled “needs: discussion”)
Over the years, we've felt a tension between *flexibility* and *speed* in Binder launches. This is most obvious in repositories whose *content* is updated often but whose *environment* is not. We've recommended various workarounds for this (e.g., [using nbgitpuller to separate content from environment](https://discourse.jupyter.org/t/tip-speed-up-binder-launches-by-pulling-github-content-in-a-binder-link-with-nbgitpuller/922)), but many folks spend a lot of extra time waiting for a Binder session to launch just because they've fixed a typo in a notebook somewhere.
I think one way that we could get around this could be to allow for users to specify an **environment repository** in their code. This could behave like this:
in `runtime.txt`:
```
environment-<URL to git repository>
```
which would trigger the following behavior:
1. All other configuration files in the current repository are ignored
2. repo2docker is called on the repo specified in the `runtime.txt` file
3. When the session begins, all of the files in the *environment repo* are removed, and replaced by the ones in the current repo
In this way, people could explicitly tag a different repository as an *environment repository* and save a lot of time on re-builds. They could pin the target repository to a specific hash/branch/etc. just like a normal Binder repo, so reproducibility best practices would still apply.
This could:
* Reduce our cloud costs, because fewer unique images would be built
* Reduce launch times, because fewer unique images == fewer Docker pulls and repo2docker builds == shorter launches
* Be a way to support a "default community image" that many people could use, which would result in *much* faster launch times (e.g., just tell people "put `environment-https://github.com/jupyterhub/community-environment` in your `runtime.txt` file")
What do people think about this?
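To make the proposal concrete, detection of the marker could look something like this (illustrative only; `environment-<URL>` is a proposed convention, not an implemented repo2docker feature):

```python
import re

# Proposed opt-in marker in runtime.txt: "environment-<URL to git repository>"
ENV_RE = re.compile(r"^environment-(?P<url>https?://\S+)$")

def environment_repo(runtime_txt):
    """Return the environment repository URL if runtime.txt opts in, else None."""
    for line in runtime_txt.splitlines():
        match = ENV_RE.match(line.strip())
        if match:
            return match.group("url")
    return None
```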
aakashns:
Support some way of user authentication - not only for privacy reasons but also for rate-limiting. Right now there doesn’t seem to be an easy way of preventing one user from launching 100 instances on Binder.
I believe you can add authentication to a BinderHub if you roll your own; the mybinder.org service is meant as a public demo and service, and for this reason we don’t do user auth.
aakashns:
Provide an API for programmatically launching, monitoring and shutting down instances on Binder. We are currently creating Git repositories and redirecting users to MyBinder URLs constructed on our backend to support the “Run on Binder” functionality.
There is some ability to do this already. For example, I believe the library that the spaCy docs use can cache a Binder session for use on another page: GitHub - ines/juniper: 🍇 Edit and execute code snippets in the browser using Jupyter kernels
I’d love to see this support added to Thebe as well
http://thebe.readthedocs.org/
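For programmatic launches specifically: as I understand it, BinderHub already exposes a public build endpoint as a server-sent event stream that you can script against today. A stdlib-only sketch of the endpoint shape and event parsing (no error handling; check the BinderHub API docs for the authoritative details):

```python
import json

def build_endpoint(owner_repo, ref="HEAD", host="https://mybinder.org"):
    """URL that starts (or attaches to) a build for a GitHub repo."""
    return f"{host}/build/gh/{owner_repo}/{ref}"

def parse_event(line):
    """Decode one 'data: {...}' line from the event stream, or return None.
    Events report a "phase" (building, pushing, ready, ...); the final
    "ready" event also carries the notebook server "url" and "token"."""
    line = line.strip()
    if line.startswith("data:"):
        return json.loads(line[len("data:"):])
    return None
```

Streaming the build endpoint with an EventSource (or a chunked HTTP GET) and watching for the `ready` phase is essentially what the Binder web UI does.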
aakashns:
Add some documentation regarding the capacity MyBinder supports and best practices for using the service for online courses etc. so companies/institutions can make a more informed choice while picking an execution platform and avoid causing disruptions to MyBinder.
This is a good idea! Perhaps you’d be willing to open an issue in the documentation repository describing the information you’d like to see, so we can track it?
Also, a final note: I appreciate all of these suggestions for new development, but please keep in mind that nobody is paid to work on Binder development; we are a community of volunteers. I’d welcome your contributions toward discussing and tackling some of these issues as well!