How to reduce mybinder.org repository startup time

MichalChromcak · July 7, 2020, 9:53am

Hi @hamel, thanks for the fix. When re-running the flow we now get "can't open file binder_cache.py". Any idea how to resolve this?

hamel · July 7, 2020, 2:13pm

@MichalChromcak can you point me to a repo where you are experiencing this error so I can debug it better? I’m not able to reproduce it on my end.

EDIT: I re-released the Action. It's possible I didn’t catch this because binder_cache.py exists in my repository. I have now hardcoded the path to its location inside the Docker container, tested it, and it works.

I was able to successfully run a test from a different repo here: https://github.com/machine-learning-apps/great-expectations-render/runs/845959790?check_suite_focus=true

Can you try re-running?

MichalChromcak · July 8, 2020, 7:34am

Many thanks @hamel, checks pass now (https://github.com/heidelbergcement/hcrystalball/runs/848749119?check_suite_focus=true)

Unfortunately, when trying to launch Binder, I cannot get past the following (tried several times). I will try removing binder_cache for now.

alex-treebeard · July 8, 2020, 8:48am

Whilst we are talking about binder and GHA:

We (treebeard) shamelessly re-architected our notebook CI framework onto GitHub Actions after seeing the great pattern laid out by Hamel.

As huge fans of both Jupyter and GitHub actions we’d like to help the community work more smoothly with infrastructure, so our project is focussed more on catching bugs during integration.

I also had some ideas for tightening up the config schema which may come in handy if considering a formulation for binder which uses pre-cached images (definitely for another thread though).

sgibson91 · July 8, 2020, 3:22pm

@choldgraf Can we add something to the top post about mybinder building arbitrary environments for reproducibility in contrast to Colab’s kitchen sink environments for quick development? I think the way the first paragraph currently reads might leave users asking why we do that and distinguishing mybinder from other cloud services would help answer that. Sorry I don’t have the exact words to edit it myself!

choldgraf · July 8, 2020, 3:44pm

done! what do you think?

sgibson91 · July 8, 2020, 4:34pm

Looks fab, thank you!

hamel · July 10, 2020, 8:26pm

Looking into this now

hamel · July 10, 2020, 8:45pm

@choldgraf Do you know how I can see the logs for the launch part of https://mybinder.org/v2/gh/machine-learning-apps/repo2docker-action/master

The Docker container builds successfully but fails to launch. I would like to see the logs so I can debug. Is there anywhere I can find this information?

choldgraf · July 10, 2020, 9:34pm

Argg, sadly BinderHub doesn’t currently store the logs anywhere. I think it’s something everybody agrees is a good idea, but nobody has implemented it yet. You could use repo2docker locally; I believe that is what folks suggested the last time we discussed this: https://github.com/jupyterhub/binderhub/issues/155#issuecomment-592479410

betatim · July 11, 2020, 6:04am

Often the problem when something doesn’t launch after building is because the repo uses a Dockerfile and doesn’t provide the right command for BinderHub to run or sets options to the “wrong” value (for example the port on which it listens).
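Since a custom start command is a common culprit, a quick local check before launching can help. The function below is a hypothetical helper (not part of BinderHub or repo2docker) that simply lists any `ENTRYPOINT` or `CMD` lines in a Dockerfile so they can be reviewed by hand; it cannot tell whether a given command or port setting is actually wrong.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of BinderHub or repo2docker:
# print ENTRYPOINT/CMD instructions (with line numbers), since these are the
# lines most likely to change the start command or the port the server uses.
check_binder_dockerfile() {
  local dockerfile="${1:-binder/Dockerfile}"
  # Dockerfile instructions are case-insensitive, hence -i.
  if grep -Ein '^[[:space:]]*(ENTRYPOINT|CMD)([[:space:]]|$)' "$dockerfile"; then
    echo "review the lines above before launching on Binder" >&2
    return 1
  fi
  return 0
}
```

A clean result only means nothing overrides the start command; it does not guarantee the image will launch.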

The “get access to logs” feature that many people would like, but no one has built, has three parts (I think): first, making the build logs available in the container; second, access to the “console” output of the pod; and third, the log lines from BinderHub itself that are relevant to that pod.

I think number two would be the “best bang for the buck” place to start. Again, this could be tackled in two steps: one that gives people access to the log while they are connected to the pod, to help debug notebook extensions and the like (maybe all this requires is a clever Linux trick that redirects stdout to both the console and a file?), and one that gives access to the logs when the launch fails (maybe we should stream these to the build log part of the website).
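The “redirect stdout to console and file” idea can be sketched with `tee`. The wrapper name `run_logged` and the log path are hypothetical; this only illustrates the redirection trick and is not how BinderHub currently starts servers.

```shell
#!/usr/bin/env bash
# Sketch of the "stdout to console and file" idea (hypothetical helper):
# run a command, show its combined stdout/stderr on the console, and
# append the same output to a log file.
run_logged() {
  local logfile="$1"
  shift
  "$@" 2>&1 | tee -a "$logfile"
  # Return the exit status of the command itself, not of tee (bash-only).
  return "${PIPESTATUS[0]}"
}
```

For example, a start script could call something like `run_logged "$HOME/server.log" jupyter notebook --ip 0.0.0.0`, so the console output stays visible while a copy remains inside the pod for later inspection.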

hamel · July 11, 2020, 4:02pm

What is the command? Is this in the docs somewhere? If so, I cannot find it; please let me know, as this could solve all my problems.

choldgraf · July 11, 2020, 6:57pm

Check out here: Use a Dockerfile for your Binder repository — Binder 0.1b documentation

does that work?

hamel · July 12, 2020, 8:24pm

Yes it does! Sorry about that, I was looking in the repo2docker docs and not the Binder docs. Thanks for pointing me to it, this is very helpful.

hamel · July 12, 2020, 8:29pm

@choldgraf @betatim These docs are indeed helpful. However, if I build my container with repo2docker, it appears that repo2docker sets everything up the right way to run on Binder for me, so as long as I am not overriding the entrypoint etc. in binder/Dockerfile, it should just work. That is not what I see. For example:

I’m trying to debug @MichalChromcak’s use of the repo2docker Action, and this works fine:

jupyter-repo2docker https://github.com/hamelsmu/hcrystalball

However, if I go to mybinder.org and try to launch it, it gets stuck at the “launching server...” step.

Any ideas or tips on what I can try from here?

EDIT: It eventually failed with Internal Server Error

EDIT: I suppose I could spin up my own private BinderHub cluster to debug this, but that seems like a last resort, so I wanted to check whether anyone had other ideas before I go down this path…

betatim · July 13, 2020, 6:36am

How was the image (hamelsmu/hcrystalball:1c2ca65c7ef1) that is referenced in the Dockerfile built? Can we see the Dockerfile for it or the “source” repository that you gave to repo2docker?

Otherwise it is super difficult to know what the image is doing and hence what could be going wrong.

willingc · July 13, 2020, 6:43am

I think the Dockerfile in the repo needs to have ARGs for NB_USER and NB_UID

ref

binder ref item 3
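If the `ARG`s are what is missing, a generated Dockerfile could declare them explicitly. The helper below is a hypothetical sketch: `jovyan` and `1000` are assumed placeholder defaults, the real values are assumed to be supplied as build arguments when Binder builds the image, and whether this alone fixes the launch is not established in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: declare the NB_USER / NB_UID build arguments in a
# Dockerfile. jovyan/1000 are assumed placeholder defaults; the actual values
# are assumed to be passed in as build arguments at build time.
add_binder_user_args() {
  local dockerfile="${1:-binder/Dockerfile}"
  # Skip if NB_USER is already declared, so re-running is harmless.
  if grep -q '^ARG NB_USER' "$dockerfile"; then
    return 0
  fi
  cat >> "$dockerfile" <<'EOF'
ARG NB_USER=jovyan
ARG NB_UID=1000
EOF
}
```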

hamel · July 13, 2020, 1:40pm

I built the image with repo2docker from the same repository it lives in. Before that, there was no binder folder. The GitHub Action runs repo2docker, pushes the resulting image to a public registry tagged with the SHA, and then adds the binder/Dockerfile that you see there.
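The last step of that flow, writing a binder/Dockerfile that only points at the pushed image, might look roughly like this. The function name is hypothetical, and truncating the SHA to 12 characters is an assumption chosen to match the tag seen in this thread (`1c2ca65c7ef1`); the Action's actual file contents may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: write a binder/Dockerfile that only references an image
# already pushed to a registry. The image name and 12-character tag are
# assumptions for illustration.
write_binder_dockerfile() {
  local image="$1"          # e.g. hamelsmu/hcrystalball
  local sha="$2"            # full commit SHA of the build
  local tag="${sha:0:12}"   # short tag, same length as the one in this thread
  mkdir -p binder
  printf 'FROM %s:%s\n' "$image" "$tag" > binder/Dockerfile
}
```

Inside a GitHub Actions job, the commit SHA is available as `$GITHUB_SHA`, so a call could look like `write_binder_dockerfile hamelsmu/hcrystalball "$GITHUB_SHA"`.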

To reproduce the image, delete the binder folder and run repo2docker on the repository; that is exactly how the image was built.

hamel · July 13, 2020, 1:41pm

If I built the image with repo2docker, do I need to add these? It looks like repo2docker adds them for me.

sylvaticus · July 13, 2020, 1:46pm

Hello, I have to admit I do not understand much about GitHub Actions, Docker images and so on.

How can I pre-build the image for mybinder so that no unlucky user has to wait 30 minutes right after I push to my GitHub repository?

I felt the original (first) post of this thread offered a solution where the image with everything declared in my Project.toml (my repository is based on Julia) is built only when Project.toml changes (this is what takes most of the time), and is then somehow updated (not rebuilt from scratch) with the content of my specific repository on the first visit after each pull. This seemed efficient, so I created a new GitHub repository where I copied my Project.toml, and then I used the form on the given page (https://jupyterhub.github.io/nbgitpuller/link), but with no success: it builds the BetaMLRequirements environment but then gives me a “page not found” error.

So I am trying this second pathway where, if I understood correctly, “something” (a full build?) happens every time I update the repository.
However, there are several “versions” of this YAML file (the one in this post, the one in the linked README,…) and I don’t know which one is correct. I also have no idea which Docker username/password is required. I don’t think it is my GitHub password?
