Brainstorm Binder for research


#1

Over the months it seems like one way to split the potential use-cases for Binder is:

  1. For scholarship. E.g., as a way to highlight and share a final research / data science product.
  2. For research and collaboration. E.g., as a part of the research process as opposed to something that happens only at the end.

I’d love to hear the community’s thoughts on number 2. (To be clear, I think 1 is super important, but it is perhaps more obvious than number 2.)

How do folks envision Binder being useful within, for example:

  • Cycles of data analytics and research for an individual
  • Providing access to large-scale research tooling (e.g. HPC, cloud systems) above and beyond a JupyterHub
  • Assisting with collaboration and sharing within research teams during the research process

Would love to hear ideas and thoughts. Particularly from some Pangeo folks like @rabernat or @jhamman.

Doing a bit of thinking around this will help the Binder community find opportunities for new development, and will also help us make a pitch to those who are interested in helping the project.


#2

I also see it as a way to share demos and quickly evaluate new packages, as well as providing support for ‘executable tutorials’ around packages.


#3

It can also provide serverless support, e.g. when powering thebelab/voila/juniper, that might be used in a variety of ways, e.g. tutorials, documentation (at least one of Google’s APIs has playground activities provided in the form of Colab notebooks), and maybe down the line, outreach around research projects?


#4

@lheagy and I chatted a bit about this today, and she had some great ideas I wanted to share here too!

(this is a slight paraphrase / re-org)

The general idea is that Binder can be useful in facilitating collaboration by letting you share quick, interactive insights into some data. Being able to interact immediately means you can focus on whatever the person wanted to show you, rather than worrying about code. In addition, working in a web-based environment means a person can build custom interactions (like web apps) in order to make a richer experience. Finally, depending on the environment available to the BinderHub, you can build UIs on top of complex machinery under the hood to let people do things quickly.

Facilitating interaction and technical pieces in a collaboration

  • Interdisciplinary work
    • In a lot of collaborations you have one or two “tech savvy” people and a bunch of other people who are interested but can’t do a ton of complex stuff themselves
    • Collaborators may not understand the code at all but could interact with widgets
    • Set up notebooks where you don’t expect people to run the code, but you let them define a few parameters
    • “This removes the need for us to have a four hour meeting where I’m sitting writing code and you’re telling me whether this looks reasonable.”
  • Advisor / advisee collaboration
    • Advisors are often not interested in digging into a bunch of code
    • Being able to send a link to an app-like thing could help facilitate them looking through your work
    • A common operation in research is “take a bunch of images, put them in a powerpoint, send to supervisor”
      • Often this is just tweaking one parameter and taking a snapshot
      • With interactivity you can send them, e.g., a Binder link with something more interesting.
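
As a rough sketch of the “send a link” idea above: a Binder launch link for a GitHub repository can be assembled from the repository name and a git ref. The organisation, repository, and notebook names below are hypothetical placeholders, and the `voila/render/...` path assumes voila is installed in the repository’s environment.

```python
from urllib.parse import quote, urlencode


def binder_url(org, repo, ref, urlpath=None, host="https://mybinder.org"):
    """Build a launch link for a GitHub repository on a BinderHub.

    `urlpath` is optional and selects what opens after launch,
    e.g. a notebook rendered as an app instead of the file browser.
    """
    parts = [quote(p, safe="") for p in (org, repo, ref)]
    url = f"{host}/v2/gh/{'/'.join(parts)}"
    if urlpath:
        # urlencode percent-encodes the slashes inside the path
        url += "?" + urlencode({"urlpath": urlpath})
    return url


# Hypothetical repository and notebook, for illustration only
link = binder_url(
    "example-org", "example-analysis", "main",
    urlpath="voila/render/analysis.ipynb",
)
print(link)
```

The resulting URL is the thing that would go in the email to a supervisor or collaborator in place of a slide deck. Pinning `ref` to a tag or commit rather than a branch should keep the link pointing at the exact version of the work that was discussed.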

Examples

  • Example: forward simulations

    • You need a cluster to run the simulation
    • What you get back is a mesh of electric field values
    • You can compute a ton of aspects of the physics from these electric field values
    • Can define ways to quickly view the different physics. So the widgets help you look through data products very quickly
    • “lets you switch your programmer brain off, and turn on your exploratory researcher brain”
  • Another example: geophysics apps for geologists

    • http://toolkit.geosci.xyz/content/apps.html
    • GeoToolkit has a collection of notebooks that display widgets that are hand-crafted for an interactive / binder-like experience
    • They actually take you through a more complex workflow (let you specify download parameters, control analysis workflows in the middle, etc)
    • Kind of like running their own data analytics service on top of Binder
    • Use it to help non-expert scientists in a collaboration to continue working together
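
To make the forward-simulation example above a bit more concrete, here is a minimal sketch of the “define a few parameters” pattern. The field values, the view names, and the `summarize` function are all made up for illustration; a real workflow would load the mesh from the cluster’s output instead.

```python
import math

# Hypothetical stand-in for a simulation result: electric field vectors
# (Ex, Ey, Ez) at a few mesh points. Real data would be loaded from disk.
FIELD = [(3.0, 4.0, 0.0), (0.0, 0.0, 2.0), (1.0, 2.0, 2.0)]

# Each view maps one field vector to a scalar a collaborator may want to see.
VIEWS = {
    "magnitude": lambda e: math.sqrt(e[0] ** 2 + e[1] ** 2 + e[2] ** 2),
    "horizontal": lambda e: math.hypot(e[0], e[1]),
    "vertical": lambda e: abs(e[2]),
}


def summarize(view="magnitude", threshold=0.0):
    """Return the chosen quantity at every mesh point where it exceeds threshold."""
    values = [VIEWS[view](e) for e in FIELD]
    return [v for v in values if v > threshold]


print(summarize("magnitude"))
print(summarize("vertical", threshold=1.0))
```

In a notebook with ipywidgets installed, something like `interact(summarize, view=list(VIEWS), threshold=(0.0, 5.0, 0.5))` should turn the two arguments into a dropdown and a slider, so a collaborator only touches the parameters and never the code.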

#5

Building on @choldgraf’s comments, wrt interdisciplinary work: the opportunity to include widgets and Markdown means that the “tech savvy” type would be able to write the code as a workflow that a less-techy person could use in a later analysis.

E.g., I perform some analysis of an experiment using pymc3, publish the experiment, and share my analysis workflow. Someone a few years down the line might be performing a similar experiment, and with a well-documented notebook (and a shared BinderHub/JupyterHub infrastructure) they could modify my analysis workflow and apply it to their system. This would enable them to utilise pymc3 without having to learn how to use pymc3.

Essentially with the relevant infrastructure and community support, it would be possible to move toward the commoditisation of advanced modelling techniques (MCMC/ML/New Amazing Method That Funders Love Next Year).


#6

I love this idea - the second one is a key deliverable for The Turing Way project. Our idea is to share responsibility for reproducibility across the researcher (PhD student, postdoc, software engineer, the person actually writing & running the code) and their PI who almost certainly doesn’t install the packages etc. Our goal with building a private instance is to make it easy to send those links to works in progress :smiley_cat:

I’m super happy to chat more about this!


#7

I just want to +1million that this is usually what happens.

My motivation for using Binder is bringing the supervisor into checking that the work is reproducible by nudging them to want to see the real demonstration not a screengrab. If they get used to seeing a URL instead of a PPTX then the version control and the environment management is done for free :rocket::smiling_face_with_three_hearts:


#8

It also means the supervisor can make some basic plots themselves instead of having to email someone and describe in words what they should plot etc :art: