Repo2Docker Image Caching

alex-treebeard · February 10, 2020, 2:27pm

Hey Community,

I’m currently using repo2docker for building images based on my local source directory during development.

I have noticed that it copies the whole repo into the image and only then runs pipenv install, which can take several minutes:

Step 44/54 : COPY src/ ${REPO_DIR}

As most changes don’t touch the dependencies (Pipfile or Pipfile.lock), has anybody floated the idea of copying those two files over first and running the install before copying the rest of the source? That way the install step could benefit from Docker’s layer caching.
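To see why the ordering matters, here is a toy Python model of Docker's layer cache. It is a simplification, not Docker's real algorithm: each step's cache key is chained from its parent's key, its instruction text and, for COPY steps, the contents of the copied files. A step is reused only if its key already existed in the previous build, so one miss forces every later step to rebuild. The file names, the rebuilt_steps helper and the Pipfile-first ordering are illustrative assumptions.

```python
import hashlib
from typing import Dict, List, Tuple

# A step is (instruction text, repo files it copies). RUN steps copy nothing.
Step = Tuple[str, List[str]]


def layer_keys(steps: List[Step], repo: Dict[str, str]) -> List[str]:
    """Chain a cache key through the steps (simplified model of Docker's cache)."""
    keys: List[str] = []
    parent = ""
    for instruction, files in steps:
        h = hashlib.sha256(parent.encode())
        h.update(instruction.encode())
        # COPY steps also depend on the names and contents of the copied files.
        for name in sorted(files):
            h.update(f"{name}\0{repo[name]}\0".encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(steps: List[Step], old_repo: Dict[str, str], new_repo: Dict[str, str]) -> List[str]:
    """Return the instructions whose layers miss the cache left by old_repo."""
    cached = set(layer_keys(steps, old_repo))
    return [instr for (instr, _), key in zip(steps, layer_keys(steps, new_repo)) if key not in cached]


repo_v1 = {"Pipfile": "[packages]\nnumpy = '*'\n", "Pipfile.lock": "{}", "app.py": "print('v1')\n"}
repo_v2 = {**repo_v1, "app.py": "print('v2')\n"}  # only source code changed

copy_all_first: List[Step] = [
    ("COPY src/ ${REPO_DIR}", sorted(repo_v1)),
    ("RUN pipenv install", []),
]
deps_first: List[Step] = [
    ("COPY src/Pipfile src/Pipfile.lock ${REPO_DIR}/", ["Pipfile", "Pipfile.lock"]),
    ("RUN pipenv install", []),
    ("COPY src/ ${REPO_DIR}", sorted(repo_v1)),
]

print(rebuilt_steps(copy_all_first, repo_v1, repo_v2))  # ['COPY src/ ${REPO_DIR}', 'RUN pipenv install']
print(rebuilt_steps(deps_first, repo_v1, repo_v2))      # ['COPY src/ ${REPO_DIR}']
```

With the whole repo copied first, editing app.py invalidates the install step too; with the dependency files copied first, the install layer is reused and only the final COPY reruns.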

I may raise a PR for this myself at some point if there is no good reason why this has not already been done.

Many thanks

betatim · February 10, 2020, 9:50pm

We have a mechanism for this in repo2docker:

https://github.com/jupyter/repo2docker/blob/8d490cf9d80f963f3746ab2bd04b9fb183b9bab9/repo2docker/buildpacks/base.py#L133-L142


# Run pre-assemble scripts! These are instructions that depend on the content
# of the repository but don't access any files in the repository. By executing
# them before copying the repository itself we can cache these steps. For
# example installing APT packages.
{% if preassemble_script_files -%}
# If scripts required during build are present, copy them
{% for src, dst in preassemble_script_files|dictsort %}
COPY src/{{ src }} ${REPO_DIR}/{{ dst }}
{% endfor -%}
{% endif -%}
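The Jinja loop above emits one COPY line per entry in preassemble_script_files, ordered by the dictsort filter (which sorts by key, case-insensitively by default). A plain-Python equivalent looks roughly like this; render_preassemble_copies is a hypothetical helper, the Pipenv mapping is only a guess at what a Pipenv buildpack might hand the template, and the template's whitespace-control markers are ignored (lines are simply joined with newlines).

```python
from typing import Dict


def render_preassemble_copies(preassemble_script_files: Dict[str, str]) -> str:
    """Hypothetical plain-Python version of the template loop above."""
    # dictsort sorts by key, ignoring case by default; mirror that here.
    items = sorted(preassemble_script_files.items(), key=lambda kv: kv[0].lower())
    # "${{REPO_DIR}}" in an f-string renders as the literal "${REPO_DIR}".
    return "\n".join(f"COPY src/{src} ${{REPO_DIR}}/{dst}" for src, dst in items)


# Assumed mapping a Pipenv buildpack might return (not taken from repo2docker).
print(render_preassemble_copies({"Pipfile.lock": "Pipfile.lock", "Pipfile": "Pipfile"}))
```

Because these COPY lines only pull in the dependency files, a change elsewhere in the repository leaves their layers (and the install step that follows them) cached.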

It has been implemented for the conda, pip and R buildpacks. Adding it to the Pipenv buildpack would be a nice contribution.

The trickiest part is detecting whether a file used to install dependencies (possibly) depends on the contents of the repository. For a requirements.txt we can scan for things like -r or -e . in the file. For install.R we implemented a mechanism that attempts to install things and backs out if they fail. I am not sure what the right mechanism is for a Pipfile, so your thoughts would be great.
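As a rough illustration of the scanning approach, here are two hypothetical heuristics in Python: one flags a requirements.txt that includes another file (-r/-c) or installs a local path (-e ., ./pkg), and one flags Pipfile entries with a path = key, which is how a Pipfile declares local packages (e.g. {path = ".", editable = true}). These are sketches under those assumptions, not repo2docker's code, and they will miss unusual cases such as relative paths written without a leading ./.

```python
import re

# Prefixes treated as "points into the repository" (an assumption of this sketch).
LOCAL_PREFIXES = (".", "/", "file:")


def _strip_comment(line: str) -> str:
    # pip treats "#" at line start or after whitespace as a comment;
    # "#egg=..." inside a URL is kept.
    return re.sub(r"(^|\s)#.*$", "", line).strip()


def requirements_touch_repo(text: str) -> bool:
    """Hypothetical heuristic: True if a requirements.txt may read repo files."""
    for raw in text.splitlines():
        line = _strip_comment(raw)
        if not line:
            continue
        # Includes of other requirement or constraint files.
        if line.startswith(("-r", "--requirement", "-c", "--constraint")):
            return True
        # Editable installs such as "-e ." or "--editable ./pkg".
        m = re.match(r"(-e|--editable)[\s=]*(.*)", line)
        if m and m.group(2).startswith(LOCAL_PREFIXES):
            return True
        # Plain local paths such as "." or "./subpkg".
        if line.startswith(LOCAL_PREFIXES):
            return True
    return False


def pipfile_touches_repo(text: str) -> bool:
    """Hypothetical heuristic: True if any Pipfile entry uses a path = key."""
    return re.search(r"\bpath\s*=\s*[\"']", text) is not None


print(requirements_touch_repo("numpy\n-e .\n"))
print(pipfile_touches_repo('[packages]\nrequests = "*"\n'))
```

When either check returns True, a buildpack would fall back to installing after the full repository is copied, giving up the caching benefit but staying correct.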

Please do open a PR, even if it is just a sketch of an idea. It means others can keep building on it later.

alex-treebeard · March 1, 2020, 2:29pm

Thanks! https://github.com/jupyter/repo2docker/pull/857
