How to reduce mybinder.org repository startup time

OK, I tried both configuration files, but both fail:

  • https://github.com/sylvaticus/BetaML.jl/actions/runs/167436322

  • https://github.com/sylvaticus/BetaML.jl/actions/runs/167422408

    cache binder build on mybinder.org
    Run machine-learning-apps/repo2docker-action@0.2
    with:
    NO_PUSH: true
    MYBINDERORG_TAG: refs/heads/master
    /usr/bin/docker run --name d3182da6922129476ea924e1b26bc2d7e1_fd7017 --label 3888d3 --workdir /github/workspace --rm -e INPUT_NO_PUSH -e INPUT_MYBINDERORG_TAG -e INPUT_DOCKER_USERNAME -e INPUT_DOCKER_PASSWORD -e INPUT_DOCKER_REGISTRY -e INPUT_IMAGE_NAME -e INPUT_NOTEBOOK_USER -e INPUT_LATEST_TAG_OFF -e INPUT_ADDITIONAL_TAG -e INPUT_BINDER_CACHE -e INPUT_PUBLIC_REGISTRY_CHECK -e INPUT_NO_GIT_PUSH -e HOME -e GITHUB_JOB -e GITHUB_REF -e GITHUB_SHA -e GITHUB_REPOSITORY -e GITHUB_REPOSITORY_OWNER -e GITHUB_RUN_ID -e GITHUB_RUN_NUMBER -e GITHUB_ACTOR -e GITHUB_WORKFLOW -e GITHUB_HEAD_REF -e GITHUB_BASE_REF -e GITHUB_EVENT_NAME -e GITHUB_SERVER_URL -e GITHUB_API_URL -e GITHUB_GRAPHQL_URL -e GITHUB_WORKSPACE -e GITHUB_ACTION -e GITHUB_EVENT_PATH -e RUNNER_OS -e RUNNER_TOOL_CACHE -e RUNNER_TEMP -e RUNNER_WORKSPACE -e ACTIONS_RUNTIME_URL -e ACTIONS_RUNTIME_TOKEN -e ACTIONS_CACHE_URL -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/BetaML.jl/BetaML.jl":"/github/workspace" 3888d3:182da6922129476ea924e1b26bc2d7e1
    Validate Information
    Build Image Without Pushing
    usage: jupyter-repo2docker [-h] [--config CONFIG] [--json-logs]
    [--image-name IMAGE_NAME] [--ref REF] [--debug]
    [--no-build]
    [--build-memory-limit BUILD_MEMORY_LIMIT]
    [--no-run] [--publish PORTS] [--publish-all]
    [--no-clean] [--push] [--volume VOLUMES]
    [--user-id USER_ID] [--user-name USER_NAME]
    [--env ENVIRONMENT] [--editable]
    [--target-repo-dir TARGET_REPO_DIR]
    [--appendix APPENDIX] [--subdir SUBDIR] [--version]
    [--cache-from CACHE_FROM]
    repo …
    jupyter-repo2docker: error: argument --image-name: 'sylvaticus/BetaML.jl:3935d5e0eadb' is not a valid docker image name. Image namemust start with an alphanumeric character andcan then use _ . or - in addition to alphanumeric.
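
The name is probably rejected because of the uppercase letters in `BetaML.jl`: Docker repository names must be lowercase. Below is a minimal Python sketch of one possible workaround, lowercasing the GitHub repository slug before using it as an image name. The helper `image_name_from_repo` is hypothetical and not part of repo2docker or the Action, and the regular expression is a simplified approximation of Docker's naming rules.

```python
import re

# Simplified check for one path component of a Docker repository name:
# lowercase letters and digits, optionally separated by ".", "_" or "-".
# This approximates Docker's rules; it is not repo2docker's own validator.
_COMPONENT = re.compile(r"[a-z0-9]+(?:[._-][a-z0-9]+)*")


def image_name_from_repo(repo_slug: str, tag: str) -> str:
    """Build a lowercase 'owner/name:tag' image name from a GitHub slug (hypothetical helper)."""
    name = repo_slug.lower()
    for component in name.split("/"):
        if not _COMPONENT.fullmatch(component):
            raise ValueError(f"cannot derive a valid image name from {repo_slug!r}")
    return f"{name}:{tag}"


print(image_name_from_repo("sylvaticus/BetaML.jl", "3935d5e0eadb"))
```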

I’m not sure that they are in the Dockerfile used by the repo sent to Binder. If you remove the Dockerfile, does it work?

As a general rule, repo2docker assumes that if you have a Dockerfile in your repository, you are an advanced user, so it does a lot less decision-making for you. So no, I don’t believe it changes anything in the Dockerfile if you provide your own (which is why we documented all the steps you must take manually in that case).

I think there is a misunderstanding: when I run repo2docker there is NO Dockerfile. repo2docker is used to build a Docker image. Only after that do I add a file, /binder/Dockerfile, to the repo, referring to the Docker image just pushed. Do you think this also poses a problem for me somehow? I am using /binder/Dockerfile only to “cache” things for BinderHub, but things are built without referring to a Dockerfile.

Indeed I am now confused :slight_smile:

mmm - so you’re taking the Dockerfile that is generated by repo2docker (e.g. with --no-build) and then just putting it in the repository for reference?

@choldgraf The first time the image is built, there is no Dockerfile in the repo, so:

  1. repo2docker builds an image and pushes it to a public registry
  2. A new file, /binder/Dockerfile, containing a FROM line that names that image, is committed to the repo automatically
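
The second step can be sketched in Python roughly as follows. The helper name `write_binder_dockerfile` is hypothetical and the Action itself may generate the file differently; the image name is only an example value.

```python
import tempfile
from pathlib import Path


def write_binder_dockerfile(repo_dir: Path, image: str) -> Path:
    """Write binder/Dockerfile with a single FROM line pointing at a pre-built image."""
    binder_dir = repo_dir / "binder"
    binder_dir.mkdir(parents=True, exist_ok=True)
    dockerfile = binder_dir / "Dockerfile"
    dockerfile.write_text(f"FROM {image}\n")
    return dockerfile


# Demonstrate in a throwaway directory instead of a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    path = write_binder_dockerfile(Path(tmp), "hamelsmu/hcrystalball:1c2ca65c7ef1")
    print(path.read_text(), end="")
```

Because the file contains only a FROM line, everything Binder needs at launch has to be present in the referenced image already.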

My hope is that this Docker image has all the things needed in it. Although now that I’m writing this, I suppose I should just try adding those arguments in again and see what happens (I wish I could see the logs!!).

I’ll try a couple of things and come back to this thread with more information

Sounds good :+1:

I’d also be curious to dig a bit more into why this approach is faster than auto-triggering a build on a BinderHub each time a change is made to the repository. It feels to me like the latter should be faster, since then you’re pulling images from the same registry on which the BinderHub is running. But Docker works in mysterious ways so :man_shrugging:

You must be doing something different when you build the image because https://mybinder.org/v2/gh/betatim/hcrystalball/master (forked the repo just now and deleted the binder/ directory) launches fine.

When I launch your pre-built image I see the following in the debug log:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/traitlets/traitlets.py", line 528, in get
    value = obj._trait_values[self.name]
KeyError: 'runtime_dir'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/conda/envs/notebook/bin/jupyter-notebook", line 11, in <module>
    sys.exit(main())
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/jupyter_core/application.py", line 270, in launch_instance
    return super(JupyterApp, cls).launch_instance(argv=argv, **kwargs)
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/traitlets/config/application.py", line 663, in launch_instance
    app.initialize(argv)
  File "<decorator-gen-7>", line 2, in initialize
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/traitlets/config/application.py", line 87, in catch_config_error
    return method(app, *args, **kwargs)
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/notebook/notebookapp.py", line 1766, in initialize
    self.init_configurables()
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/notebook/notebookapp.py", line 1380, in init_configurables
    connection_dir=self.runtime_dir,
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/traitlets/traitlets.py", line 556, in __get__
    return self.get(obj, cls)
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/traitlets/traitlets.py", line 535, in get
    value = self._validate(obj, dynamic_default())
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/jupyter_core/application.py", line 100, in _runtime_dir_default
    ensure_dir_exists(rd, mode=0o700)
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/jupyter_core/utils/__init__.py", line 13, in ensure_dir_exists
    os.makedirs(path, mode=mode)
  File "/srv/conda/envs/notebook/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/srv/conda/envs/notebook/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/srv/conda/envs/notebook/lib/python3.7/os.py", line 211, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/srv/conda/envs/notebook/lib/python3.7/os.py", line 221, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/home/hamelsmu/.local'

What user does repo2docker run as in your build process?

Oh wow.

@betatim I’m running repo2docker inside Actions

However, what you discovered has given me some other clues. Not sure how that happened!

@betatim just to confirm, are you running this Docker image? (I am having trouble reproducing, so I want to make sure we are trying the same image.)

docker run hamelsmu/hcrystalball:1c2ca65c7ef1

EDIT: Never mind, it is really odd that my username is appearing in the container. This is definitely something I must debug (and I will try to figure out how this happened).

I can confirm that all files are mounted to /home/hamelsmu which is really odd. :man_facepalming: I will debug this and report back to this thread. Thanks

repo2docker defaults to using your local username and UID when building an image. You can override it with some command line args.
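
As a rough sketch, the override might look like this in Python. The `--user-id`, `--user-name`, `--image-name` and `--no-run` flags all appear in the `jupyter-repo2docker` usage text earlier in this thread; the values `1000` and `jovyan` are assumptions based on the user commonly seen in Jupyter images, and `build_command` is a hypothetical helper.

```python
import shlex
from typing import List


def build_command(repo: str, image: str, user_id: int = 1000, user_name: str = "jovyan") -> List[str]:
    """Assemble a jupyter-repo2docker invocation with an explicit user (hypothetical helper)."""
    return [
        "jupyter-repo2docker",
        "--no-run",                  # build the image only, do not start a container
        "--user-id", str(user_id),   # assumed UID, replaces the local user's UID
        "--user-name", user_name,    # assumed name, replaces the local username
        "--image-name", image,
        repo,
    ]


cmd = build_command(".", "hamelsmu/hcrystalball:1c2ca65c7ef1")
print(shlex.join(cmd))
# To actually run the build: subprocess.run(cmd, check=True)
```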

Could this particular image have been built locally and pushed?

Yes, I think that is the problem! I’m in the process of refactoring everything to set the proper user ID, as well as adding more tests.

@choldgraf @betatim @willingc I am pleased to report that this thread has helped me solve the problem, thank you so much for your guidance.

I have forked some repositories that previously were not working, and applied my Action and it works on mybinder.org!

@MichalChromcak I have finally fixed this bug (thanks for reporting it). I even forked your repo to verify that this can successfully launch on mybinder.org after the Action is applied. Can you give things a go one more time and report back if it works for you?

wahooo good job everybody :tada:

Awesome @hamel. Glad that you worked through it. If there is anything that you feel should be added to the docs, just let us know. There were some folks at SciPy who were intrigued by the possibility of using the Action.

Would be happy to help the SciPy folks. Do you think you could introduce me to the folks that are particularly interested?

@hamel Thanks a lot for the ongoing effort you put into this.
Current status - after merging your PR and clicking on the binder link I get the following


When refreshing the page, I already see a familiar “Found image etc…”
and I am presented with the environment, also when quitting from it and launching again. Thanks for the PR!

I am wondering whether to run a curl command after the Action is done to automate the first part (spinning up the environment once), so that one does not need to remember to check it after every rebuild. Any thoughts on that? Or is there some parameter within the GitHub Action itself that would allow for it? (For me it should be easy to add; I am not sure about generic use/need.)

UPDATE:
When trying some time later, I am again getting the following

Found built image, launching...
Launching server...
Failed to connect to event stream

But after some time, it’s good again…hm…

@rohitsanj may have specific folks. I was generally hearing about good things with GitHub Actions and Binder at the conference. There was someone talking about binder badges but I can’t remember who it was. Rohit, do you recall hearing anything related to this?

Cool stuff @hamel! I think I might know a few people who could use Actions to run papermill jobs. Although, I don’t think I came across anyone talking about binder badges at SciPy.

Good idea. You can cache builds directly on mybinder.org like this:


name: Test
on: push

jobs:
  Create-MyBinderOrg-Cache:
    runs-on: ubuntu-latest
    steps:
    - name: cache binder build on mybinder.org
      uses: machine-learning-apps/repo2docker-action@master
      with:
        # Build the image without pushing it to a registry
        NO_PUSH: true
        # Git ref that mybinder.org is asked to build
        MYBINDERORG_TAG: ${{ github.event.ref }}

The only issue is that the Action script doesn’t block waiting for the mybinder.org build to complete, as that could take a long time. I’m not sure of the best way to handle this, but this option uses curl to trigger the build, which is better than nothing.

I would appreciate any help from anyone who is interested in helping me optimize this action or has any ideas. @betatim / @choldgraf perhaps transferring the repo to Jupyter might help with visibility and I might be able to get additional help there?

P.S. I’m using this script within the Action to trigger mybinder.org when the parameter MYBINDERORG_TAG is specified above: https://github.com/machine-learning-apps/repo2docker-action/blob/2aa73524ac41f92bd2a4cdf90a7a33f461ea59e5/trigger_binder.sh
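
One possible way to make the trigger block until the build finishes is to read the event stream returned by the build endpoint. The sketch below is in Python rather than shell and rests on assumptions: that the endpoint has the form `https://mybinder.org/build/gh/<owner>/<repo>/<ref>`, and that it emits `data:` lines containing JSON with a `phase` field that ends in `ready` or `failed`. All function names are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Optional


def build_url(repo: str, ref: str, host: str = "https://mybinder.org") -> str:
    """Return the assumed build endpoint for a GitHub repo and ref."""
    return f"{host}/build/gh/{repo}/{ref}"


def final_phase(lines: Iterable[str]) -> Optional[str]:
    """Scan event-stream lines and return 'ready' or 'failed' once seen, else None."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments and blank separator lines
        try:
            event = json.loads(line[len("data:"):])
        except json.JSONDecodeError:
            continue  # ignore payloads that are not JSON
        if isinstance(event, dict) and event.get("phase") in ("ready", "failed"):
            return event["phase"]
    return None


def wait_for_build(repo: str, ref: str, socket_timeout: float = 600.0) -> Optional[str]:
    """Open the build stream and block until a final phase arrives (network request).

    socket_timeout limits each blocking read, not the total build time.
    """
    with urllib.request.urlopen(build_url(repo, ref), timeout=socket_timeout) as response:
        return final_phase(raw.decode("utf-8", errors="replace") for raw in response)


# Offline demonstration with a hand-written sample stream.
sample = [
    'data: {"phase": "built", "message": "Found built image, launching..."}',
    "",
    'data: {"phase": "ready", "message": "server running"}',
]
print(final_phase(sample))
```

Returning the phase would let a CI step fail when the build fails, at the cost of keeping the job running for as long as the build takes.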