Having built on top of repo2docker for a few months with treebeard, I’m really impressed by the generality of its config (apt.txt, postBuild, start, etc.).
What strikes me, though, is that the config is very Docker-friendly (different files can be inserted into different layers) but could be more user-friendly.
I’ll keep this short, because there are no doubt 100 reasons this hasn’t been done already (I became curious after reading this), but why not compile the various config elements into a single YAML file?
Benefits:
- short files, e.g. apt.txt, read more nicely inline
- postBuild, which could be very short or very long, can be broken into one or more scripts referenced by the config (and those scripts can be given more meaningful names)
- start could be replaced by runtime scripts that don’t have to remember to end with exec "$@"
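To make the last point concrete, here is a minimal sketch in Python of a generator that turns a list of runtime commands (as a hypothetical single config file might hold them) into a start script, appending exec "$@" itself so the user never has to. The function name render_start and the example commands are illustrative assumptions, not anything repo2docker provides.

```python
import shlex


def render_start(commands):
    """Render the text of a `start` script from a list of runtime commands.

    Each command is a list of argv strings. The generator, not the user,
    appends exec "$@" so control is always handed on to the command that
    repo2docker launches (e.g. the notebook server).
    """
    lines = ["#!/bin/bash", "set -e"]
    for argv in commands:
        # shlex.join (Python 3.8+) quotes each argument for a POSIX shell.
        lines.append(shlex.join(argv))
    lines.append('exec "$@"')
    return "\n".join(lines) + "\n"


# Hypothetical runtime steps a single config file might list.
script = render_start([["echo", "hello world"], ["python", "scripts/warm_cache.py"]])
print(script)
```

The design choice is simply that the boilerplate line lives in the generator, so forgetting it becomes impossible rather than a common mistake.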
Having got about 30 projects running on Binder, I’d love an interface like this (inspired a bit by Dockerfile and GitHub Actions):
Edit: this is probably a good time to note that we have already been experimenting with using notebooks as build scripts (after chatting with some of the Netflix folks, who love that notebooks provide an immutable record of inputs and outputs, a bit like bash -x).
Having a single config file is interesting and, as you guessed, has been discussed a few times already. I think we need some context to discuss this topic. In what kinds of situations would you want a single config file? What led up to the moment where you thought “damn, I wish I had a single yaml file here!”?
The main trigger for me was trying to make a complex postBuild script readable.
It had to install both labextensions and straggler pip packages (you know, the ones that never install nicely with the rest).
Basically, I wanted two scripts: setup_labextensions and install_deps.
At that point postBuild would just become an entrypoint to these setup scripts, and in my mind it would be better to cut it out of the equation altogether.
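To illustrate how thin postBuild becomes in that setup, here is a small Python sketch that renders a postBuild whose only job is to run the two named scripts in order. It assumes the scripts live under binder/ with a .sh suffix and that postBuild runs from the repository root; both the paths and the render_postbuild helper are hypothetical.

```python
import shlex


def render_postbuild(script_paths):
    """Render a postBuild whose only job is to run named setup scripts in order.

    `set -euo pipefail` makes the build stop at the first script that fails.
    Invoking each script via `bash` means it doesn't need its executable bit set.
    """
    lines = ["#!/bin/bash", "set -euo pipefail"]
    for path in script_paths:
        lines.append("bash " + shlex.quote(path))
    return "\n".join(lines) + "\n"


# Hypothetical locations for the two scripts named above.
postbuild = render_postbuild(["binder/setup_labextensions.sh", "binder/install_deps.sh"])
print(postbuild)
```

Everything in the output apart from the script names is boilerplate, which is the part a single config file could absorb.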
When I give talks about Binder, I usually say something to the effect of “launching a binder is the reward you receive for following best software practices”. So my two cents: yes, I agree that a single config file would be easier in some regards, but I do think it detracts from the goal of the project meeting an established software standard, and doing so in a way that is familiar to the community the software is embedded in.
That is to say, even if I don’t know about Binder, I know what a requirements.txt file does; I probably don’t know what a binder.yaml file does.
I guess this is where we disagree. For sure, requirements.txt etc. are well known at this point, but I find the Binder config format unfamiliar anyway.
Do you find most repos require a postBuild script?
In many cases we end up adding 1-3 config files that would otherwise not exist.
If you’ve already got a format in mind, it could be interesting to prototype a script that converts your single file into a .binder directory of files, and perhaps vice versa?
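A rough Python prototype of that converter might look like the following. The key names (apt, pip, postBuild, start) are an assumed schema, not a format repo2docker understands. In practice the dict would come from parsing the YAML file, e.g. with PyYAML’s yaml.safe_load; the sketch takes the dict directly to stay dependency-free.

```python
import stat
import tempfile
from pathlib import Path

# Hypothetical single-file schema: these key names are assumptions.
# List-valued keys become one-item-per-line files; string-valued keys
# are copied verbatim as scripts.
LIST_FILES = {"apt": "apt.txt", "pip": "requirements.txt"}
SCRIPT_FILES = {"postBuild": "postBuild", "start": "start"}


def to_binder_dir(config, out_dir):
    """Write a config dict (e.g. as loaded from YAML) out as a .binder directory."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for key, name in LIST_FILES.items():
        if key in config:
            (out / name).write_text("\n".join(config[key]) + "\n")
    for key, name in SCRIPT_FILES.items():
        if key in config:
            path = out / name
            path.write_text(config[key])
            # Mark scripts executable so they can also be run directly.
            path.chmod(path.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def from_binder_dir(in_dir):
    """Read a .binder directory back into a config dict (the 'vice versa').

    Blank lines and '#' comments in list files are dropped, so the
    round trip is lossy for commented files.
    """
    src = Path(in_dir)
    config = {}
    for key, name in LIST_FILES.items():
        path = src / name
        if path.is_file():
            entries = [line.strip() for line in path.read_text().splitlines()]
            config[key] = [e for e in entries if e and not e.startswith("#")]
    for key, name in SCRIPT_FILES.items():
        path = src / name
        if path.is_file():
            config[key] = path.read_text()
    return config


# Round-trip demo in a throwaway directory.
example = {
    "apt": ["graphviz"],
    "pip": ["numpy", "pandas"],
    "start": '#!/bin/bash\nexec "$@"\n',
}
with tempfile.TemporaryDirectory() as tmp:
    binder_dir = Path(tmp) / ".binder"
    to_binder_dir(example, binder_dir)
    written = sorted(p.name for p in binder_dir.iterdir())
    roundtrip = from_binder_dir(binder_dir)

print(written)
print(roundtrip == example)
```

Because the output is a plain .binder directory, repo2docker itself would need no changes to try the idea out.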
Binder already prioritises which config files it looks for when deciding how to build. At the risk of adding yet another package type, Binder could support a single config file, perhaps checked just after the Dockerfile; there’d be no need to stop supporting any of the other files. It would just complicate things on the Binder side and make the repo a little less accessible to folk reading it and seeing an unfamiliar binder.yaml.
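A toy Python model of that fallback order shows where a binder.yaml could slot in without displacing anything: every existing file is still checked, just below it. The priority list and the kind names are simplified assumptions for illustration, not repo2docker’s real detection order or code.

```python
import tempfile
from pathlib import Path

# Illustrative priority list, highest first. A simplified stand-in, not
# repo2docker's real detection logic; "binder.yaml" is the hypothetical
# single config file, checked right after the Dockerfile.
PRIORITY = [
    ("Dockerfile", "dockerfile"),
    ("binder.yaml", "single-file"),
    ("environment.yml", "conda"),
    ("requirements.txt", "pip"),
]


def detect(config_dir):
    """Return the kind of build selected by the highest-priority file present."""
    root = Path(config_dir)
    for filename, kind in PRIORITY:
        if (root / filename).is_file():
            return kind
    return "default"


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "requirements.txt").write_text("numpy\n")
    before = detect(repo)  # only requirements.txt present
    (repo / "binder.yaml").write_text("apt: [graphviz]\n")
    after = detect(repo)  # binder.yaml now outranks requirements.txt

print(before, after)
```

Repos without a binder.yaml behave exactly as before, which matches the point that none of the other files would need to lose support.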
If someone really wanted a binder.yaml file in the meantime, I guess they could write a parser and run it from postBuild? (Or does postBuild block you from using apt-get? I forget.)
Yes, they do in repo2docker: apt.txt is consumed first, and postBuild later. I’m just not sure whether the permissions used for the apt.txt install have been de-escalated by the time the build gets to postBuild.