Hey @aakashns and @siddhant - thanks for reaching out, and I hope we can find an approach that works for both of our communities
A few responses below:
<3
aakashns:
in India, South-East Asia, Africa, South America, the Middle East (and of course Europe and North America)
This is great - one of the goals of the JupyterHub/Binder projects is to make computation more accessible to people outside North America and Europe!
Us too! The challenge is that there’s no “scaling model” for mybinder.org, since it’s “just” a technical demo. So this isn’t a matter of differing visions or hopes; Binder simply has no dedicated resources right now, so we have to be careful about heavy usage.
Thanks for coming up with some tangible ideas - we appreciate you putting in the effort to think through this and reach out.
aakashns:
Decouple the environment Docker image from the source code files, so that a new build does not have to be triggered if there’s already a built image for an environment.yml file.
Check out some of the ideas in this post: “Tip: speed up Binder launches by pulling github content in a Binder link with nbgitpuller”. Perhaps it would help you reduce the number of builds for your repositories.
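If it’s useful, the link format from that post can be generated programmatically. A minimal sketch (the function name is mine, and the exact encoding rules are best checked against the nbgitpuller link generator):

```python
from urllib.parse import quote

def nbgitpuller_binder_url(env_repo, env_branch, content_repo, content_branch="main"):
    """Build a mybinder.org link that launches the *environment* repo's image
    and pulls the *content* repo at start-up via nbgitpuller (sketch only)."""
    base = f"https://mybinder.org/v2/gh/{env_repo}/{env_branch}"
    # The git-pull query string must itself be percent-encoded, because it is
    # nested inside the outer ?urlpath= parameter.
    pull = f"git-pull?repo={quote(content_repo, safe='')}&branch={content_branch}"
    return f"{base}?urlpath={quote(pull, safe='')}"
```

With a link like this, pushing to the content repo never triggers a rebuild; only changes to the environment repo do.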
aakashns:
Support for running a single Jupyter notebook file, apart from full Git repositories. Most people who are new to the domain of data science aren’t familiar with Git and are just looking to run a notebook.
We’ve discussed this before - feel free to add your thoughts etc!
(Issue opened 04 Dec 2019, labeled “needs: discussion”)
This would be a major extension to how repo2docker works, so I don't think it needs to happen anytime soon, but it is worth discussing.
I've had a number of conversations now where people suggest it'd be good to have the *entire environment* encapsulated in a single Jupyter Notebook. E.g., rather than sharing a repository of files, they'd just share a single file with all of the information needed in it.
This could be done if we implemented a `JupyterNotebookBuildPack`. I imagine that it could do something like:
1. `detect()` if the input path were a single file that ends in `.ipynb` and that has a notebook-level metadata field (e.g., `binder/` or `environment/`).
2. Within that metadata field would be another dictionary, where the keys are the full filenames of repo2docker configuration files (e.g., `requirements.txt`, `REQUIRE`) and the value of each key is a list of lines. The BuildPack then runs a second round of `detect()` using the other BuildPacks, and assembles an environment following whatever it finds.
Something like:
```yaml
env:
  requirements.txt:
    - numpy
    - matplotlib
  runtime.txt:
    - r-YYYY-MM-DD
Could this be implemented without much added complexity? *Should* this be implemented at all?
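For what it’s worth, the first step of such a BuildPack could be quite small. A rough sketch (the `env` metadata key, function name, and file layout here are all illustrative, not an actual repo2docker API):

```python
import json
import pathlib

def extract_env_files(notebook_path, out_dir):
    """Read an assumed `env` field from a notebook's top-level metadata and
    write out the repo2docker configuration files it describes, so the
    existing BuildPacks' detect() logic can take over from there."""
    nb = json.loads(pathlib.Path(notebook_path).read_text())
    env = nb.get("metadata", {}).get("env", {})  # hypothetical metadata key
    out = pathlib.Path(out_dir)
    for filename, lines in env.items():
        # e.g. filename == "requirements.txt", lines == ["numpy", "matplotlib"]
        (out / filename).write_text("\n".join(lines) + "\n")
    return sorted(env)
```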
aakashns:
Decouple the environment Docker image from the source code files, so that a new build does not have to be triggered if there’s already a built image for an environment.yml file.
We’ve discussed this as well and came up with (for now) a recommended way to do this with pre-existing tools rather than changing BinderHub, see:
(Issue opened 24 Jun 2020, closed 26 Jun 2020, labeled “needs: discussion”)
Over the years, we've felt a tension between *flexibility* and *speed* in Binder launches. This is most obvious in repositories whose *content* is updated often but whose *environment* is not. We've recommended various workarounds for this (e.g., [using nbgitpuller to separate content from environment](https://discourse.jupyter.org/t/tip-speed-up-binder-launches-by-pulling-github-content-in-a-binder-link-with-nbgitpuller/922)), but many folks spend a lot of extra time waiting for a Binder session to launch just because they've fixed a typo in a notebook somewhere.
I think one way that we could get around this could be to allow for users to specify an **environment repository** in their code. This could behave like this:
in `runtime.txt`:
```
environment-<URL to git repository>
```
which would trigger the following behavior:
1. All other configuration files in the current repository are ignored
2. repo2docker is called on the repo specified in the `runtime.txt` file
3. When the session begins, all of the files in the *environment repo* are removed, and replaced by the ones in the current repo
In this way, people could explicitly tag a different repository as an *environment repository* and save a lot of time on re-builds. They could pin the target repository to a specific hash/branch/etc. just like a normal Binder repo, so reproducibility best practices would still apply.
This could:
* Reduce our cloud costs, because fewer unique images would be built
* Reduce launch times, because fewer unique images == fewer Docker pulls and repo2docker builds == shorter launches
* Be a way to support a "default community image" that many people could use, which would result in *much* faster launch times (e.g., just tell people "put `environment-https://github.com/jupyterhub/community-environment` in your `runtime.txt` file")
What do people think about this?
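To make the proposal concrete, detection of the marker could look something like this (illustrative only; `environment-<URL>` is a proposed convention, not an implemented repo2docker feature):

```python
import re

# Proposed opt-in marker in runtime.txt: "environment-<URL to git repository>"
ENV_RE = re.compile(r"^environment-(?P<url>https?://\S+)$")

def environment_repo(runtime_txt):
    """Return the environment repository URL if runtime.txt opts in, else None."""
    for line in runtime_txt.splitlines():
        match = ENV_RE.match(line.strip())
        if match:
            return match.group("url")
    return None
```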
aakashns:
Support some way of user authentication - not only for privacy reasons but also for rate-limiting. Right now there doesn’t seem to be an easy way of preventing one user from launching 100 instances on Binder.
I believe you can add authentication to a BinderHub if you roll your own; the mybinder.org service is meant as a public demo and service, and for this reason we don’t do user auth.
aakashns:
Provide an API for programmatically launching, monitoring and shutting down instances on Binder. We are currently creating Git repositories and redirecting users to MyBinder URLs constructed on our backend to support the “Run on Binder” functionality.
There is some ability to do this already. For example, I believe the library that the spaCy docs use can cache a Binder session for use on another page: GitHub - ines/juniper: 🍇 Edit and execute code snippets in the browser using Jupyter kernels
I’d love to see this support added to Thebe as well
http://thebe.readthedocs.org/
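For programmatic launches specifically: as I understand it, BinderHub already exposes a public build endpoint as a server-sent event stream that you can script against today. A stdlib-only sketch of the endpoint shape and event parsing (no error handling; check the BinderHub API docs for the authoritative details):

```python
import json

def build_endpoint(owner_repo, ref="HEAD", host="https://mybinder.org"):
    """URL that starts (or attaches to) a build for a GitHub repo."""
    return f"{host}/build/gh/{owner_repo}/{ref}"

def parse_event(line):
    """Decode one 'data: {...}' line from the event stream, or return None.
    Events report a "phase" (building, pushing, ready, ...); the final
    "ready" event also carries the notebook server "url" and "token"."""
    line = line.strip()
    if line.startswith("data:"):
        return json.loads(line[len("data:"):])
    return None
```

Streaming the build endpoint with an EventSource (or a chunked HTTP GET) and watching for the `ready` phase is essentially what the Binder web UI does.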
aakashns:
Add some documentation regarding the capacity MyBinder supports and best practices for using the service for online courses etc. so companies/institutions can make a more informed choice while picking an execution platform and avoid causing disruptions to MyBinder.
This is a good idea! Perhaps you’d be willing to open an issue in the documentation repository describing the information you’d like to see, so we can track it?
Also, a final note: I appreciate all of these suggestions for new development, but please keep in mind that nobody is paid to work on Binder development; we are a community of volunteers. I’d welcome your contributions toward discussing and tackling some of these issues as well!