I like the idea of having a “binderise it” sprint. Should we create a new topic to discuss that? I am thinking we could copy some of the ideas from https://foundation.mozilla.org/en/opportunity/global-sprint/ and run it as a virtual sprint (without local sites to keep admin overhead low).
I think there are several useful things that sprints could achieve, particularly for repos built around a particular package:
- binderise_it: just get the packages / environment built to run the notebooks in a repo;
- getting_it_started: produce or improve a demo notebook showing how to get started with the package (or notebook extension, notebook magic, etc), demoing in particular anything that exploits the notebook context (eg previewing images, tables, state);
- _repr_it: suggest or contribute rich output / repr views to a package to exploit the notebook machinery more in the context of the package;
- magic_it: suggest or create magics around the package that make it easier for a novice to use in a notebook context;
- plugin_it: suggest / explore / contribute JupyterLab plugins / extensions that let the package make use of JupyterLab affordances.
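As a sketch of what the _repr_it step might produce, here's a hypothetical class (the name and fields are invented for illustration) that adds a `_repr_html_` method, which Jupyter's display machinery calls in preference to `__repr__` when rendering a cell result:

```python
# Sketch of the "_repr_it" idea: a class opts in to rich notebook output
# by defining _repr_html_. Outside a notebook, __repr__ is still used,
# so the class degrades gracefully in a plain terminal.

class Sample:
    """Hypothetical example class holding a named list of values."""

    def __init__(self, name, values):
        self.name = name
        self.values = values

    def __repr__(self):
        # Plain-text fallback for terminals and logs
        return f"Sample({self.name!r}, n={len(self.values)})"

    def _repr_html_(self):
        # Rich view rendered automatically inside a notebook
        rows = "".join(f"<tr><td>{v}</td></tr>" for v in self.values)
        return f"<b>{self.name}</b><table>{rows}</table>"
```

The nice property of this pattern is that it costs a package almost nothing and requires no notebook-specific dependency: the method is simply ignored outside Jupyter.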
How much lead time do people think this would need? My guess is several months?
We should probably approach some repository maintainers to ask if they’d be happy to take part in the sprint and/or have others stop by their repo and binderise it.
ps. with my new-found admin rights I used the “split messages into new topic” function to create this thread. Let’s see how it goes.
I really like your idea of breaking down different kinds of things one could do at a sprint @psychemedia … maybe a first step would actually be to build a “Binder buildathon template” that we can use as guidance for how to structure these events? Then we’d have a starting point once we actually start planning the event
@choldgraf Seems like a sensible plan… where should such a template go? Would it make sense to use a wiki page somewhere to spin some ideas round? Or better to flesh out some ideas here first, where it may be easier for folk to contribute?
I wonder if it would be easier to just do it and then extract a template from a few instances.
Either works for me, I was just thinking that we may want an opportunity to brainstorm / write down thoughts without bottlenecking on hosting an actual event, since I’d guess that won’t happen for another couple of months at the least
This is the template I’ve used for three workshops now: https://github.com/Build-a-binder/build-a-binder.github.io/blob/03e16b82dbfdcc4d0321c25d06d6df113c7ea9b2/workshop/10-zero-to-binder.md
Getting people started with a nearly empty repo and clicking launch means they get the hang of it quickly, and there is only a small amount of time where people have to listen instead of doing things themselves. By the time you’ve shown them how to use requirements.txt and the repo2docker documentation about all the other formats, people are comfortable with all the moving parts and tend to have questions specific to their use-case, or about limits/funding/reliability of mybinder.org. As a result the agenda looks a bit like:
- short talk about Binder with a demo
- from zero to binder!
- questions, questions, questions
- work on your own binder with roaming helpers
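For concreteness, the whole “Zero to Binder” starting point can be as small as one file in the repo root (package names here are just examples):

```text
# requirements.txt — repo2docker spots this file in the repo root and
# pip-installs each line when mybinder.org builds the repo
numpy
matplotlib
```

repo2docker’s other supported config files (environment.yml, apt.txt, postBuild, etc) follow naturally once people are comfortable with this one.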
For a global sprint style event I’d run the “Zero to Binder” session every few hours as a video call and have a chat channel for the Q&A part where people are always hanging out. As well as posting a link to intro material.
I really like that idea: walk through all the pieces on a near-empty dry run, which everyone can do together, then set to work on your own example :-)
Looking through various repos, notebooks appear in them for different reasons. Eg there are tutorial or teaching style repos where notebooks are the content, and package repos where a notebook may have been used to support development, or provide examples. (A sub-class of these package repos are things like repos for Jupyter magics.)
Binderising tutorial repos in the first instance just needs apt/requirements/environment files etc (would practical / pragmatic advice on whether to use pip requirements or a conda environment be useful? I tend to prefer apt+requirements unless I find a build is tricky, or I don’t necessarily have the permissions to install apt packages, in which case conda can help).
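As a concrete point of comparison: a pip requirements.txt covers pure-Python dependencies, while a conda environment.yml can also pull in non-Python libraries without needing apt permissions. A minimal sketch (package names are illustrative):

```yaml
# environment.yml — used by repo2docker in place of requirements.txt
name: example
channels:
  - conda-forge
dependencies:
  - python=3.11
  - gdal            # example of a package with tricky native libraries
  - pip
  - pip:
      - requests    # pure-Python packages can still come from pip
```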
Something I keep discussing with colleagues in an educational context, and which is likely to apply in a business context too, is the extent to which the notebook environment is provided as a customised environment:
- preinstalled / pre-enabled notebook extensions with notebooks written to exploit those extensions (although they should degrade gracefully if extensions are not available);
- code that is automatically loaded in and executed when a notebook is started.
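One existing mechanism for the “code executed when a notebook is started” point is IPython’s per-profile startup directory: every `*.py` file in `~/.ipython/profile_default/startup/` is run, in lexical order, when a kernel starts (a Binder repo could place a file there from a postBuild script, for example). A stdlib-only sketch; the settings file and its name are hypothetical:

```python
# 00-preload.py — dropped into ~/.ipython/profile_default/startup/
# IPython executes this automatically at kernel start, so students get
# these names in every notebook without any boilerplate cell.
import json
from pathlib import Path

# Hypothetical course-wide helper: load shared settings if present,
# fall back to an empty dict otherwise
SETTINGS = {}
if Path("course_settings.json").exists():
    SETTINGS = json.loads(Path("course_settings.json").read_text())
```

The trade-off, of course, is exactly the one raised below: anything preloaded this way is invisible to a student running the notebooks in their own, uncustomised environment.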
One of the issues we have is how we communicate these requirements to a student who may want to run the notebooks in their own environment: how do they set up the environment to match the one we provide (or is our environment a value add)? Do we use lots of boilerplate in a notebook to make it explicit? Would a standardised (or at least convention-based) nbenvt.sh script make sense? (I tend to think anything ending in .sh can put off a lot of novice users…)
**Code Repos**
For code repos, I think Binderising provides different benefits: it offers the opportunity to encourage people to make packages that exploit notebooks / JupyterLab. Setting up the package environment, providing getting-started notes, then encouraging notebook value-add features (magics, _repr_html_, etc) are clear steps, and zero-to examples could be provided for each of them that:
- describe benefits / rationale for the step
- suggest conventional filenames / directory structures (helps habit forming, makes navigation easier)
- simple steps / recommendations for how to go about each step (eg when Binderising Python notebooks, look for every import to identify packages). (This makes me think: a) would tooling to help suggest requirements files be useful? And b) if such tooling were available, wouldn’t it make sense to run it in the background anyway? I’m sure I’ve seen discussion about this previously but can’t find it offhand. It may be hard to get it to work 100% (any?!) of the time, but it could provide helpful nudges.)
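A rough sketch of the “look for every import” nudge, using the stdlib `ast` module. It only handles top-level `import` / `from` forms, and mapping import names to PyPI package names (eg `cv2` → `opencv-python`) is deliberately left out:

```python
import ast

def imported_modules(source):
    """Return the set of top-level module names imported by Python source."""
    mods = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import numpy as np" -> "numpy"
            mods.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from pandas.io import sql" -> "pandas"; skip relative imports
            mods.add(node.module.split(".")[0])
    return mods
```

For a notebook, the same function could be run over the joined source of each code cell in the `.ipynb` JSON and the results unioned; the output is a candidate list to diff against an existing requirements.txt, not something to trust blindly.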
Adopting a conventional directory structure for where to put getting-started notebooks would be a small, but important, step (and maybe even suggestions about naming them; this supports ease of navigation but also sets / frames expectations about the content). In a sense, we’d be pushing towards a convention for Binderised documentation, but rather than mandate it, we’d encourage it through an easy-to-follow Binderise-it recipe.
One thing that is an issue in some notebooks is the requirement to set API keys.
Solutions like https://github.com/dylburger/reading-api-key-from-file are a bit of a hack, and https://github.com/lean-data-science/jupyterlab-credentialstore looks interesting, but I wonder if it’d be useful if notebooks natively supported a simple API keypair store that would allow notebooks to reference values set (secretly) in the store. (Novices can find it hard to set environment vars; if they could be set secretly in a notebook environment, that would be really useful.) A default behaviour might be to treat conventionally named items in a particular way: eg NBENV_myvar forces a lookup of a notebook-managed keypair; if that fails, check an environment var (myvar); if that fails, prompt the user.