Repo2docker roadmap review

In https://repo2docker.readthedocs.io/en/latest/contributing/roadmap.html we promised to review the roadmap by the start of December. That is now. If you want to be part of that please get involved in this thread.

We have never done this before, so this is an experiment. :slight_smile: Given that this is new, let’s aim for “smooth, non-radical changes” for this iteration.

As you can see, the roadmap is in the style of “yes and” (things not on it are still welcome!). It tries to list only things that someone actually plans to work on in the near term. In that spirit, if there are things you plan to work on, post here. If you have a need and want to lobby for it, post here too.

Time horizon for this iteration: 31 January 2019

Thanks for getting this started - I agree it’s important to keep churning through the roadmaps to make sure they stay informative / relevant.

A quick question: I believe somebody else (maybe @minrk?) suggested reviewing the roadmaps in the team meetings. How does that relate to this async approach?

I’m happy with either approach. I’m not sure we need both. My original concern was that we did roadmaps before in Jupyter and they were written once and mostly ignored after that, possibly because we didn’t integrate them into any workflow/meetings/etc… I proposed adding roadmap review to monthly meetings as one mechanism to make that less likely. Trying other mechanism(s) is fine, too!

My thinking was that we (the “team”) could use Discourse to have this discussion async, and only the few things that do need real-time voice communication get picked up in the team meeting. (If we ramp up the number of roadmaps and check in on them every few months, we will spend most of our team meetings doing that :-/)

The other motivation is that this would make it easier for others to get involved in the process or at least see what is being discussed and how.

I can think of two areas that could be interesting to work on in the next few months:

  • build pack support for more languages
  • more content providers (zip files, zenodo, etc)

Adding the .zip file and then Zenodo content providers should be straightforward. This would expand where we can pull things from and would increase the usefulness in the field of publishing research work.
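
To make the .zip idea a bit more concrete, here is a minimal sketch of what such a content provider could look like. The class name and the `detect`/`fetch` methods are assumptions made for illustration, not repo2docker's actual interface, and it only handles a local archive rather than a download.

```python
import os
import zipfile


class ZipContentProvider:
    """Hypothetical provider: turns a local .zip archive into a directory."""

    def detect(self, spec):
        # Claim the spec only if it points at an existing zip archive.
        if os.path.isfile(spec) and zipfile.is_zipfile(spec):
            return {"archive": spec}
        return None

    def fetch(self, spec, output_dir):
        # Extract member by member and yield a progress message for each.
        with zipfile.ZipFile(spec["archive"]) as archive:
            for name in archive.namelist():
                archive.extract(name, output_dir)
                yield f"extracted {name}"


def make_demo_archive(directory):
    """Create a tiny archive so the sketch can be tried without a download."""
    path = os.path.join(directory, "repo.zip")
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("requirements.txt", "numpy\n")
    return path
```

A Zenodo provider would presumably look similar, with `detect` recognising a DOI or record URL and `fetch` downloading before unpacking.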

Adding support for more languages (JS, C++, Rust) would come in two parts: new build packs, and some testing infrastructure so that we have one test that checks that all build packs correctly implement start files, postBuild and such. Currently we have to write these tests over and over for each build pack, which means we make mistakes.
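
The “one test for all build packs” idea could be sketched roughly like this. The build pack classes and the `render` method below are simplified stand-ins invented for the example, not the real repo2docker classes; the point is only that the check is written once and looped over every build pack.

```python
class BaseBuildPack:
    """Hypothetical stand-in: lists the steps a build pack would run."""

    language_steps = []

    def render(self, repo_files):
        steps = list(self.language_steps)
        # Behaviour every build pack is expected to share.
        if "postBuild" in repo_files:
            steps.append("RUN ./postBuild")
        if "start" in repo_files:
            steps.append('ENTRYPOINT ["./start"]')
        return steps


class PythonBuildPack(BaseBuildPack):
    language_steps = ["RUN pip install -r requirements.txt"]


class JavaScriptBuildPack(BaseBuildPack):
    language_steps = ["RUN npm install"]


ALL_BUILD_PACKS = [PythonBuildPack, JavaScriptBuildPack]


def check_common_files(build_pack_cls):
    """One check, reused for every build pack instead of rewritten per pack."""
    steps = build_pack_cls().render({"postBuild", "start"})
    assert "RUN ./postBuild" in steps
    assert 'ENTRYPOINT ["./start"]' in steps
    # Without the files, neither step should appear.
    plain = build_pack_cls().render(set())
    assert "RUN ./postBuild" not in plain
    assert 'ENTRYPOINT ["./start"]' not in plain


def check_all_build_packs():
    for build_pack_cls in ALL_BUILD_PACKS:
        check_common_files(build_pack_cls)
    return len(ALL_BUILD_PACKS)
```

With pytest the loop could become a `@pytest.mark.parametrize` over `ALL_BUILD_PACKS`, so a new build pack only needs to be added to the list.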

Personally I will work on being able to “mount” the content of the repository to alternative locations inside the image. Right now we always place things at /home/$USER. For BinderHubs with auth and persistent storage per user you’d want to be able to mount the repo contents to some other location.
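
As a rough illustration of what a configurable location might mean for the generated Dockerfile, here is a small sketch. The function and its `target_dir` option are hypothetical names, and the user name is just a placeholder default; leaving `target_dir` unset keeps today's behaviour of using the home directory.

```python
import posixpath


def repo_copy_directives(user="jovyan", target_dir=None):
    """Return Dockerfile lines that place the repo contents at target_dir.

    target_dir is a hypothetical option; None falls back to /home/<user>.
    """
    if target_dir is None:
        target_dir = posixpath.join("/home", user)
    if not posixpath.isabs(target_dir):
        raise ValueError("target_dir must be an absolute path inside the image")
    # Normalise so that "/srv/repo/" and "/srv/repo" give the same output.
    target_dir = posixpath.normpath(target_dir)
    return [
        f"COPY --chown={user}:{user} . {target_dir}",
        f"WORKDIR {target_dir}",
    ]
```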

I’m +1 on @betatim’s idea of async via discourse, and then we have more focused discussion in the team meetings. We should make sure to invite lots of comment on this from the community!

On the last call, we (Whole Tale project) mentioned a couple of items that we’re interested in working on, one that will likely require some discussion.

  1. Parameterizing the template, which would allow us to control the base image among other things (e.g., default packages)
  2. Pinning the repo2docker version per jupyter/repo2docker#170 (comment)
  3. Supporting an /environment directory or similar à la jupyter/repo2docker#386
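
For item 1, a parameterized template might look something like the sketch below. It uses Python's `string.Template` purely to stay self-contained; the template text, the function name and the default values are all placeholders for illustration and not what repo2docker ships.

```python
from string import Template

# Hypothetical template with the base image and default packages pulled out.
DOCKERFILE_TEMPLATE = Template(
    "FROM $base_image\n"
    "RUN apt-get update && apt-get install --yes $packages\n"
)


def render_dockerfile(base_image="ubuntu:18.04", packages=("git", "wget")):
    """Fill in the template; both arguments are placeholder defaults."""
    if not packages:
        raise ValueError("at least one package is required")
    return DOCKERFILE_TEMPLATE.substitute(
        base_image=base_image,
        packages=" ".join(packages),
    )
```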

I’ve written up some of our rationale in https://github.com/whole-tale/whole-tale/issues/52, but would be happy to have that discussion here.

What is the motivation for transitioning away from /binder? Changing the name is easy; the hard part is building the testing and code to handle /binder and whatever new name we invent now in parallel. I think we need to (for quite a while) support the users who are already using /binder, offer them an (automatic) way to transition to the new name (how do you do this when they interact with the repo through a service like a BinderHub?) and support users who are using the new name. Plus the edge cases like repos that have both the old and new name. Overall it isn’t hard but quite fiddly to get right :-/ Given all that, is it worth doing?

I’m +1 on supporting a tool-agnostic folder name (I think that’s the main drawback of it). Why not make this a CLI parameter? That way you can invoke repo2docker telling it “look for this folder to find the config files if not in root”. Then mybinder.org could keep using binder/ if it wishes, and other use-cases could do whatever else they like.
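
A sketch of how the CLI parameter could work, to make the suggestion concrete. The `--config-dir` flag is hypothetical and not an existing repo2docker option, and the fallback rule (use the folder when it exists, otherwise the repository root) is an assumption about the intended behaviour.

```python
import argparse
import os


def build_parser():
    parser = argparse.ArgumentParser(prog="repo2docker-sketch")
    parser.add_argument("repo", help="path to a local checkout")
    # Hypothetical flag: not an existing repo2docker option.
    parser.add_argument(
        "--config-dir",
        default="binder",
        help="folder to search for configuration files",
    )
    return parser


def resolve_config_dir(repo, config_dir):
    """Use repo/config_dir when it exists, otherwise fall back to the root."""
    candidate = os.path.join(repo, config_dir)
    return candidate if os.path.isdir(candidate) else repo
```

mybinder.org would simply never pass the flag and keep getting `binder`, while another deployment could pass `--config-dir environment`.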

Re: base image, IIRC we already have an issue open to discuss this, right? There’s some conversation about it here (https://github.com/jupyter/repo2docker/issues/471) and I just opened up (https://github.com/jupyter/repo2docker/issues/487) to discuss this specifically. Would love to see your input @craig-willis!

Won’t we end up with repos that only work on the particular deployment they were targeting for no good reason other than the environment is defined in /the-other-environment-directory? This makes me less of a fan.

If I had to pick a different name I’d propose repo2docker/ as that is the name of the tool that understands the contents and so far there is no other tool or a proposal of another tool that would understand the contents.

There has been a good proposal to support this by having repo2docker start an older version of itself after inspecting the repository. How about /binder/repo2docker.version (or differently named directory) as the file that specifies the version (git commit or version) that the user wants to use?
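
Reading such a pin file could be as small as the sketch below. The file location follows the suggestion above; the accepted formats (a dotted release number or a 7 to 40 character git commit hash) and the function name are assumptions for illustration.

```python
import os
import re

# Hypothetical location, following the /binder/repo2docker.version suggestion.
VERSION_FILE = os.path.join("binder", "repo2docker.version")

# Accept a release such as 0.7.0 or a 7 to 40 character git commit hash.
PIN_PATTERN = re.compile(r"(\d+(\.\d+)*|[0-9a-f]{7,40})")


def read_pinned_version(repo):
    """Return the pinned version or commit, or None if the repo has no pin."""
    path = os.path.join(repo, VERSION_FILE)
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        pin = f.read().strip()
    if not PIN_PATTERN.fullmatch(pin):
        raise ValueError(f"unrecognised repo2docker pin: {pin!r}")
    return pin
```

The caller would then decide whether to hand the build over to that older version; how that hand-over happens is left open here.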

Yeah, that’s possible… there’s definitely a downside to letting people choose the folder name. To me, it boils down to:

Make them use a pre-defined folder:

  • PRO: there’s now a standard across repositories that we know repo2docker will work for
  • CON: we’re making an opinionated choice for people and maybe they want their own folder name (e.g., we did begin calling it “binder/” after all)

Let them choose a folder:

  • PRO: there’s more flexibility in the ecosystem so r2d would potentially be appealing to broader use-cases
  • CON: we add fragmentation in the sense that there is no longer a repository-level standard when it comes to folder naming

In the end, it’s just a naming and understandability thing. We will almost certainly put the configuration files in a directory and I can envision many user questions about what is /binder much like we already get for “jovyan” and “kitematic”. /binder also suggests that we are running BinderHub when we are at this point working on leveraging repo2docker – which I see as a deterministic way to generate a Dockerfile and image that could be run in BinderHub or not. That said, I think it’s perfectly reasonable for us to commit to move closer to the Binder ecosystem and accept the current folder name as a convention.

I get why. I’d note that both WT and Code Ocean refer to the computational “environment” which is generally understood by our users. The current Capsule export includes an /environment folder that contains the generated Dockerfile. So, if done right maybe a researcher could run their Capsule via BinderHub or WT?

On the topic of pinning the version:

This sounds like a good approach.

(I think just quoting your reply will get us a link back to where you quoted me, which is a reply to the original point. Still figuring out this discourse thing :slight_smile: )

Do you guys want to start tackling this? Should we put it on the near term list? Either way I think we should turn this into an issue on the GitHub repo as at least two people already agree this is a good starting point and maybe someone wants to pick it up.

How could we establish if/when/how far away we are from executing the code ocean generated Dockerfiles and getting a useful/usable docker image for something like repo2docker or other tools that aren’t code ocean tools? Are there public repos/examples to experiment with?

Using /environment feels a bit presumptuous (as in “this is where the environment is defined”, or like the “Center for Open Science”, whose name suggests it is the (only) place where open science happens, which gets them a lot of ill will from the community). It also feels like we’d end up with conflicts because other tools use the directory but their contents aren’t compatible :-/

My position is that we should evaluate what the positives are that we get from this as I think it is a lot of work and pain for both users (transition your repos) and developers (build tools to help transition). This means we should have some good positives to make it worth it. I do think if we can get version support in and established we reduce some of the downsides of making such a transition as we could implicitly limit repositories with a /binder directory to older versions of repo2docker!

And/or with some pondering we can leverage the idea of entrypoints and extension mechanism as well?

After thinking about it more, I see @betatim’s point that we may be introducing more trouble than it’s worth to let people configure this on their own. I can see the value in just having one name that people can use across projects.

What do folks think about using a community poll for this one? Options would be:

  • binder/
  • environment/
  • repo2docker/
  • Other ideas?

WDYT?

I like the idea of a community poll, and also looking at repositories that currently have a binder/ folder to see what they are doing.

@betatim – sorry for the delay. Yes, we will be happy to work on the repo2docker.version implementation. repo2docker integration is one of our next milestones.

My comment about CO was a bit off the cuff. They have hundreds of publicly downloadable Capsules today. One immediate snag is that the Dockerfiles are intended for CLI execution only – there is no interactive environment. So while they are buildable, they’re not very useful in the Binder context.

On the topic of /environment, in WT we can be more prescriptive – I clearly didn’t consider the full implications for Binder and users. Happy to participate in a poll or write up pros/cons.

We’ll start looking at the entrypoints and provide feedback on the “entrypoints as extension mechanism” issue (jupyterhub/repo2docker#488).

No worries about the delay. That is the beauty of async communication, it works even with long pauses :slight_smile:

I updated the roadmap document a bit so that we don’t block that just because there are things we need to discuss more. With the holidays coming up I’d expect people to be busy with other things as well, so having a shorter “right now” section is probably good.

I downloaded a few capsules and looked at the Dockerfiles they have. It seems like there isn’t much left to do: you can just run them as is. Like you said, they aren’t really made for interactive stuff, which means using a CO capsule in a place like a BinderHub will probably not work :-/ Or we’d have to significantly augment the Dockerfile, but repo2docker says: if you make a Dockerfile it is all up to you, ultimate freedom (at a price)! So I think we shouldn’t try to augment.


On a more philosophical note, I don’t feel particularly excited about people putting in a bunch of effort to make it easier for users of a commercial tool to also use an open-source tool. It feels like that should be the task of the commercial tool people (who are welcome to contribute!). From what I have observed/noticed there is a lot of open-source flowing into CO but not a lot coming back out.

Just a POI: I don’t believe that CO has contributed anything back to the jupyterhub ecosystem (or lab ecosystem, for that matter).

@craig-willis there was discussion of building a lightweight GUI around repo2docker as a separate project…basically it’d just expose the command line options for repo2docker as a simple web app that’d spit out the appropriate configuration files. Would that be something of value?

I fully understand your points about CO and will refrain from using them in future examples. From where I sit, I have to live with them and I see interoperability as inevitable.

Good question. After repo2docker integration, we will inevitably add something like that into the WT UI, probably starting in the Feb/March timeframe. I doubt there would be a convenient way for that work to be reused, since we’re currently bound to Ember.
