An unfolding story of my first contribution to repo2docker

Potential offshoot PRs to repo2docker

While working towards my goal I end up with a lot of insight into my own developer experience (DX) and realize potential improvements to the repo that would make it easier for others in the future. But instead of straying from my goal to fix those one at a time, I try to focus on the goal and write them down instead.

Mention CONTRIBUTING.md in README.md

Perhaps we should mention how to get going with a development environment.

Dependency of semver not installed with pipenv install --dev

After following the installation instructions for pipenv I got the following error when trying to run repo2docker from my virtual environment with pipenv run repo2docker.

erik@xps:~/dev/contrib/repo2docker$ pipenv run repo2docker
Traceback (most recent call last):
  File "/home/erik/.local/share/virtualenvs/repo2docker-MBJmfNIh/bin/repo2docker", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/home/erik/.local/share/virtualenvs/repo2docker-MBJmfNIh/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3241, in <module>
    @_call_aside
  File "/home/erik/.local/share/virtualenvs/repo2docker-MBJmfNIh/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3225, in _call_aside
    f(*args, **kwargs)
  File "/home/erik/.local/share/virtualenvs/repo2docker-MBJmfNIh/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3254, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/home/erik/.local/share/virtualenvs/repo2docker-MBJmfNIh/lib/python3.6/site-packages/pkg_resources/__init__.py", line 583, in _build_master
    ws.require(__requires__)
  File "/home/erik/.local/share/virtualenvs/repo2docker-MBJmfNIh/lib/python3.6/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/erik/.local/share/virtualenvs/repo2docker-MBJmfNIh/lib/python3.6/site-packages/pkg_resources/__init__.py", line 786, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'semver' distribution was not found and is required by jupyter-repo2docker

Running pipenv install semver made this error go away.
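A quick way to spot this kind of gap before running the entry point is to ask the environment which distributions are missing. The sketch below is only an illustration: the helper name missing_distributions is made up, and it relies on importlib.metadata, which needs Python 3.8 or later (the traceback above comes from Python 3.6, where the importlib_metadata backport would be needed instead).

```python
from importlib.metadata import PackageNotFoundError, version


def missing_distributions(names):
    """Return the names from `names` that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # 'semver' is the distribution named in the traceback above.
    print(missing_distributions(["semver"]))
```

Saved to a file and run with pipenv run python, it prints an empty list when everything in the list is installed in the virtual environment.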

Minor base.py Dockerfile optimization

In this Docker documentation we find the following statement:

Official Debian and Ubuntu images automatically run apt-get clean, so explicit invocation is not required.

So, we could remove apt-get -qq clean && \ from four places in the base.py file.

Clarification of buildpacks ordering

I could really use an example to grasp the ordering here.

I initially thought that LegacyBinderDockerBuildPack was very specific and should override whatever was found later, but then I realized that PythonBuildPack inherits from CondaBuildPack and I got a bit confused. It overrides the detect functionality of CondaBuildPack… Hmmm… Will only one buildpack be selected from this list for use? I think so, and the idea of the composability of buildpacks comes from inheritance.
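If that guess is right, that is, if the list is scanned in order and exactly one buildpack is picked, the selection could look roughly like the sketch below. Everything in it is a simplified stand-in and not repo2docker's real code: the class names are borrowed from the repo, but the detect() checks, the files argument, the pick_buildpack helper and the order of the ORDER list are assumptions made for illustration.

```python
class BuildPack:
    """Hypothetical minimal base class, not repo2docker's real BuildPack."""

    def __init__(self, files):
        self.files = set(files)  # file names found in the repository

    def detect(self):
        return False


class LegacyBinderDockerBuildPack(BuildPack):
    def detect(self):
        return "Dockerfile" in self.files  # simplified stand-in check


class CondaBuildPack(BuildPack):
    def detect(self):
        return "environment.yml" in self.files  # simplified stand-in check


class PythonBuildPack(CondaBuildPack):
    # Reuses everything from CondaBuildPack but replaces detect().
    def detect(self):
        return "requirements.txt" in self.files  # simplified stand-in check


def pick_buildpack(buildpack_classes, files):
    """Return the first buildpack, in list order, whose detect() is true."""
    for cls in buildpack_classes:
        candidate = cls(files)
        if candidate.detect():
            return candidate
    return None


# Assumed order, for illustration only.
ORDER = [LegacyBinderDockerBuildPack, CondaBuildPack, PythonBuildPack]

picked = pick_buildpack(ORDER, ["environment.yml", "requirements.txt"])
print(type(picked).__name__)  # CondaBuildPack
```

In this model the list order decides between buildpacks that both match, while inheritance only decides what the chosen buildpack does once it has been picked.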

Make a visual overview of configuration logic

Inspired by @leportella’s visual overviews, I think it would be useful to have some kind of flow chart or visualization to show which buildpack does what. It took a while to figure out and I’m still not 100% sure. I still need to read more docs and code.

Optimize tests - test building minimalistic packages with no dependencies

Our tests install various packages, but some are bigger than others. I’ve seen numpy being installed, for example. Perhaps we can go with some dummy packages. I looked for such packages but ended up using requests along with numpy in my added tests for now. I want to avoid numpy if possible, though, as I think it can be quite big and slow to resolve relative to other packages.

Optimize CI - ordering of tests

As I understand it, various tests are run in parallel, but the order in which they are executed could be optimized given the limited number of parallel runners, four I think.

We could optimize it so that the last test to start isn’t also the one that takes the most time. Otherwise we end up using four runners for a long time and then, at the end, only a single runner for a long time. It would be better to keep all four runners busy continuously, run the most relevant tests first, and put the quickest and least commonly failing tests last.
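To make the effect of ordering concrete, here is a small simulation of a test queue feeding a fixed number of runners. It is a sketch under assumptions: the makespan helper is made up, the durations are invented rather than measured from the real CI, and it only models duration, not which tests are most relevant or most likely to fail.

```python
import heapq


def makespan(durations, runners=4):
    """Simulate a queue where each test starts on whichever runner frees up first.

    Returns the wall-clock time at which the last test finishes.
    """
    free_at = [0.0] * runners  # min-heap of the times at which each runner is free
    heapq.heapify(free_at)
    finish = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)  # earliest available runner
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


# Hypothetical durations in minutes, with the slowest test queued last.
tests = [2, 2, 2, 2, 3, 3, 3, 3, 20]

slow_last = makespan(tests)
slow_first = makespan(sorted(tests, reverse=True))
print(slow_last, slow_first)  # 25.0 20.0
```

With the slow test last, three runners sit idle while it finishes alone. Starting it first lets the short tests fill the other three runners in the meantime.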

Clarify the repo2docker repo’s relationship between its (dev|doc)-requirements.txt and Pipfile

There is no description of how these are to be used, together or individually. I ended up confused and spent a while figuring things out.

Perhaps there are about three different scenarios for developers.

  • The actual developer, who wants all relevant packages installed for development (a Pipfile that includes the contents of both requirements.txt files as well as the package itself).
  • The CI test pipeline, which only needs what it needs (dev-requirements.txt).
  • The docs builder, which only needs what it needs (docs/doc-requirements.txt).

Document docker-compose.test.yml

I don’t know what this file is used for or should be used for. It is one of several things that leave a question that could have been answered by at least an inline comment in the file.

(Became part of PR) Add some content generated during pytest to .gitignore

I ran all the tests on my computer and ended up with the following remnant files that I don’t want to include in a commit by mistake.

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	tests/dockerfile/legacy/._binder.Dockerfile
	tests/dockerfile/legacy/apt-sources.list
	tests/dockerfile/legacy/python3.frozen.yml
	tests/dockerfile/legacy/root.frozen.yml