How to reduce mybinder.org repository startup time

Hi @hamel, thanks for the fix. When re-running the flow we are now getting "can't open file binder_cache.py". Any idea how to resolve it?

@MichalChromcak can you point to a repo where you are experiencing this error so I can better debug? I'm not able to reproduce this error on my end.

EDIT: I re-released the Action. It's possible that I didn't catch this because binder_cache.py exists in my repository. I have now hardcoded the path to the one inside the Docker container, tested it, and it works.

I was able to successfully run a test from a different repo here: https://github.com/machine-learning-apps/great-expectations-render/runs/845959790?check_suite_focus=true

Can you try re-running?

Many thanks @hamel, checks pass now (https://github.com/heidelbergcement/hcrystalball/runs/848749119?check_suite_focus=true)

Unfortunately, when trying to launch Binder, I cannot get past the following (tried several times). Will try to remove binder_cache for now.
image


Whilst we are talking about binder and GHA:

We (treebeard) shamelessly re-architected our notebook CI framework onto GitHub Actions after seeing the great pattern laid out by Hamel.

As huge fans of both Jupyter and GitHub Actions we'd like to help the community work more smoothly with infrastructure, so our project is focussed more on catching bugs during integration.

I also had some ideas for tightening up the config schema which may come in handy if considering a formulation for binder which uses pre-cached images (definitely for another thread though).

@choldgraf Can we add something to the top post about mybinder building arbitrary environments for reproducibility, in contrast to Colab's kitchen sink environments for quick development? I think the way the first paragraph currently reads might leave users asking why we do that, and distinguishing mybinder from other cloud services would help answer that. Sorry I don't have the exact words to edit it myself!


done! what do you think?


Looks fab, thank you!

Looking into this now

@choldgraf Do you know how I can see the logs for the launch part of https://mybinder.org/v2/gh/machine-learning-apps/repo2docker-action/master

The Docker container builds successfully but it fails to launch. I would like to see the logs so I can debug. Is there any place to find this information?

Argg, sadly BinderHub doesn't currently store the logs anywhere. I think it's something that everybody thinks is a good idea, just nobody has implemented it yet. You could use repo2docker locally; I believe that is what folks suggested the last time we discussed this: https://github.com/jupyterhub/binderhub/issues/155#issuecomment-592479410


Often the problem when something doesn't launch after building is that the repo uses a Dockerfile and either doesn't provide the right command for BinderHub to run or sets options to the "wrong" value (for example the port on which it listens).

The "get access to logs" story that many people would like and no one has built has three parts (I think). The first is making the build logs available in the container, the second is access to the "console" output of the pod, and the third would be the relevant-to-that-pod log lines from BinderHub itself.

I think number two would be the "best bang for buck" point to start. Again this could be tackled in two steps: one where people get access to the log while they are connected to the pod, to help debugging NB extensions and the like (maybe all this requires is a clever Linux trick that redirects stdout to both console and file?), and one where you can gain access to the logs when the launch fails (maybe we should stream these to the build log part of the website).
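To make the "stdout to console and file" idea concrete, here is a minimal sketch in Python. It assumes the server is started as a child process; `run_and_tee` and the log file name are hypothetical and are not part of BinderHub or repo2docker.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_tee(cmd, log_path):
    """Run cmd, echoing each output line to the console and appending it to log_path."""
    with open(log_path, "a", encoding="utf-8") as log:
        with subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so errors land in the same log
            text=True,
        ) as proc:
            for line in proc.stdout:
                sys.stdout.write(line)  # console copy
                log.write(line)         # file copy
        # Leaving the inner "with" block waits for the process to exit.
        return proc.returncode


if __name__ == "__main__":
    # Hypothetical stand-in for the real notebook server command.
    demo_cmd = [sys.executable, "-c", "print('server starting')"]
    log_file = Path(tempfile.gettempdir()) / "pod-console.log"
    print("exit code:", run_and_tee(demo_cmd, log_file))
```

Something like this would only cover the first step (a log readable from inside a running pod); getting at the log after a failed launch would still need support on the BinderHub side.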


What is the command? Is this in the docs somewhere? If so I cannot find it, please let me know as this could solve all my problems :slight_smile:

Check out here: Use a Dockerfile for your Binder repository (Binder 0.1b documentation)

does that work?


Yes it does! Sorry about that, I was looking in the repo2docker docs and not in Binder :man_facepalming: Thanks for pointing me to it, this is very helpful.


@choldgraf @betatim These docs are indeed helpful, however if I build my container with repo2docker it appears that repo2docker sets up everything the right way to run on Binder for me, so as long as I am not overriding the entrypoint etc of the /binder/Dockerfile it should just work. However, this is not the case. For example:

I'm trying to debug @MichalChromcak's use of the repo2docker Action, and this works fine:

jupyter-repo2docker https://github.com/hamelsmu/hcrystalball

However, if I go to mybinder.org and try to launch, it gets stuck at the "launching server..." step.

Any ideas or tips on what I can try from here?

EDIT: It eventually failed with Internal Server Error

image

EDIT: I suppose I could spin up my own private BinderHub cluster to debug this, but that seems like a last resort. I just wanted to check whether anyone had any additional ideas before I go down this path…

How was the image (hamelsmu/hcrystalball:1c2ca65c7ef1) that is referenced in the Dockerfile built? Can we see the Dockerfile for it or the "source" repository that you gave to repo2docker?

Otherwise it is super difficult to see/know what the image is doing and hence what could be going wrong.

I think the Dockerfile in the repo needs to have ARGs for NB_USER and NB_UID

ref

binder ref item 3
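A quick way to check this suggestion locally is to scan the Dockerfile for the two ARG declarations. This is only a minimal sketch: `missing_binder_args` is a hypothetical helper, and the sample Dockerfile text is illustrative rather than copied from the repository.

```python
import re

REQUIRED_ARGS = ("NB_USER", "NB_UID")


def missing_binder_args(dockerfile_text):
    """Return the required ARG names that the Dockerfile text does not declare."""
    declared = set()
    for line in dockerfile_text.splitlines():
        # Dockerfile instructions are case-insensitive; "ARG NAME" and "ARG NAME=default" both count.
        match = re.match(r"\s*ARG\s+([A-Za-z_][A-Za-z0-9_]*)", line, flags=re.IGNORECASE)
        if match:
            declared.add(match.group(1))
    return [name for name in REQUIRED_ARGS if name not in declared]


# Illustrative input: a Dockerfile that only reuses a prebuilt image.
sample = "FROM hamelsmu/hcrystalball:1c2ca65c7ef1\n"
print(missing_binder_args(sample))
```

An empty list means both names are declared; it says nothing about whether the rest of the Dockerfile is set up the way Binder expects.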

I built the image with repo2docker from the same repository that it was in. Before there was no binder folder. The GitHub action runs repo2docker and pushes the resulting image to a public registry tagged with the SHA, and then adds a binder/Dockerfile that you see there

To reproduce how the image was built, delete the binder folder and run repo2docker on the repository. That is exactly how the image was built.
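The flow described here (build with repo2docker, push an image tagged with the commit SHA, then add a binder/Dockerfile that points at it) can be sketched as below. This is an assumption about the shape of the generated file, not the Action's actual code; `render_binder_dockerfile` and the 12-character tag length are hypothetical.

```python
def render_binder_dockerfile(image_name, sha, tag_length=12):
    """Return Dockerfile text that reuses a prebuilt image tagged with a short commit SHA."""
    if len(sha) < tag_length:
        raise ValueError("commit SHA is shorter than the requested tag length")
    tag = sha[:tag_length]  # assumed convention: the tag is a prefix of the commit SHA
    return f"FROM {image_name}:{tag}\n"


# Image name and tag as quoted earlier in the thread.
print(render_binder_dockerfile("hamelsmu/hcrystalball", "1c2ca65c7ef1"), end="")
```

With a file like this, Binder only has to pull the existing image instead of rebuilding the environment, which is where the startup time saving would come from.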


If I built the image with repo2docker do I need to add this? It looks like repo2docker adds these for me?

Hello, I have to say I do not understand anything about GitHub Actions, Docker images and so on :slight_smile: :slight_smile:

How can I pre-build the image for mybinder so that I don't have one unlucky user who waits 30 minutes just after I pushed to my GitHub repository?

I felt the original (first post) of this thread offered a solution where the image with all the stuff declared in my Project.toml (my repository is based on Julia) is built only when Project.toml changes (and this is what takes most of the time), and is then somehow updated (but not rebuilt from zero) at any first-visit-after-pull with the "content" of my specific repository. This seems efficient, so I created a new GitHub repository where I copied my Project.toml and then used the form on the given page (https://jupyterhub.github.io/nbgitpuller/link), but I had no success… it builds the env of the BetaMLRequirements but then it gives me a "page not found" error.

So, I am going for this second pathway where, if I understood correctly, "something" (a full build??) happens every time I pull on the repository.
There are however several "versions" of this yaml file (the one in this post, the one in the linked README, …) and I don't know which one is the correct one. Also, I have no idea which Docker username/password is required. I don't think it is my GitHub password??
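For the nbgitpuller approach mentioned above (one repository for the environment, another for the content), it can help to see how such a link is put together, since a wrong branch or path in it is one possible cause of a "page not found" after the environment has built. A minimal sketch, assuming the usual git-pull URL form; the function name and the example repository names are hypothetical.

```python
from urllib.parse import urlencode


def binder_nbgitpuller_link(env_repo, env_ref, content_repo_url, content_branch):
    """Build a mybinder.org URL that launches env_repo and pulls content_repo_url via nbgitpuller."""
    # Inner query: what nbgitpuller's git-pull handler should fetch.
    inner = "git-pull?" + urlencode({"repo": content_repo_url, "branch": content_branch})
    # Outer query: Binder's urlpath parameter, with the inner URL percent-encoded once more.
    return f"https://mybinder.org/v2/gh/{env_repo}/{env_ref}?" + urlencode({"urlpath": inner})


# Hypothetical repositories: one holding only Project.toml, one holding the notebooks.
print(
    binder_nbgitpuller_link(
        env_repo="example-user/BetaMLRequirements",
        env_ref="master",
        content_repo_url="https://github.com/example-user/my-content-repo",
        content_branch="master",
    )
)
```

The environment repository also needs nbgitpuller installed for the git-pull endpoint to exist, so that is worth checking if the link itself looks right.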