Today we launched an official repo2docker GitHub Action that makes it even easier for people to leverage the power of repo2docker in their workflows.
One problem I have been thinking about is how to let folks quickly launch a ready-to-go Jupyter server, with dependencies loaded via repo2docker, on the compute and cloud of their choice. This is very useful if you want specialized compute like a GPU or a high memory/CPU instance and, for whatever reason, mybinder.org doesn’t suit your needs. Google Colab is OK, but you have to install all your dependencies yourself. I think repo2docker can fill this gap, so that you can get, and share with others, an environment that is ready to go on the appropriate compute with no hassle.
I was thinking it would be nice to provide a high-level API that takes as input (1) your cloud credentials and (2) an instance type, and in return provides a URL for accessing your Jupyter server. It would be nice if this happened automatically when you fork a repository, with guidance through the required steps, for example at a conference training session that involves Jupyter Notebooks. It could also be triggered manually. We could build something like this with GitHub Actions.
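To make the shape of that API a bit more concrete, here is a rough Python sketch. Nothing in it exists yet: `LaunchRequest`, `launch_server`, and the `Provisioner` callable are hypothetical names, and the fake provisioner stands in for whatever cloud-specific code would actually create the VM and start the repo2docker-built image on it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LaunchRequest:
    """Inputs to the hypothetical high-level API."""
    repo_url: str          # repository that repo2docker would build
    credentials_path: str  # where the cloud credentials live (assumed to be a file)
    instance_type: str     # e.g. a GPU or high-memory machine type


# A provisioner is any callable that creates the VM and returns its host name
# or IP address. Each cloud would supply its own; that keeps the API agnostic.
Provisioner = Callable[[LaunchRequest], str]


def launch_server(request: LaunchRequest, provision: Provisioner, port: int = 8888) -> str:
    """Provision a machine for the request and return the Jupyter server URL."""
    if not request.instance_type:
        raise ValueError("instance_type must not be empty")
    host = provision(request)
    return f"http://{host}:{port}"


def fake_provisioner(request: LaunchRequest) -> str:
    """Stand-in for real cloud code; returns a documentation-only IP address."""
    return "203.0.113.10"


request = LaunchRequest(
    repo_url="https://github.com/example/example-repo",  # placeholder repository
    credentials_path="credentials.json",                 # placeholder path
    instance_type="gpu-small",                           # made-up instance name
)
url = launch_server(request, fake_provisioner)
print(url)
```

Passing the provisioner in as a plain callable is just one way to keep the cloud-specific part swappable; a GitHub Action could pick the provisioner based on its inputs.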
Some questions I have:
- Is this something that people would want in the Jupyter community?
- Is there anything inappropriate about integrating with a cloud provider like this? Are there any guidelines or suggestions on how to approach it, alternative ideas, or tips on keeping it as agnostic as possible? I want to prototype this on one cloud to begin with, and was thinking of Google Cloud since I have no affiliation with them, to reinforce the neutrality of my intentions. I’m also happy to not work on this if it is a bad idea. I just wanted to discuss it.
- There might be security concerns with providing a URL that anyone can access, since it could become a vector for malicious actors to exploit people who use this tool. Some ideas I have to mitigate this:
  - Restrict the GitHub Action so that it only does this on private repositories.
  - Don’t provide a URL, and instead require people to ssh and port forward to localhost. We would provide the command so people can just copy and paste it into their terminal. This requires some additional setup, but might be OK.
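For the ssh option, the copy-and-paste command could be generated along these lines. This is only a sketch: `ssh_tunnel_command` is a hypothetical helper, the user name and IP address are placeholders, and it assumes the Jupyter server listens on port 8888 on the VM.

```python
import shlex


def ssh_tunnel_command(user: str, host: str, remote_port: int = 8888, local_port: int = 8888) -> str:
    """Build an ssh command that forwards a local port to the Jupyter port on the VM."""
    args = [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-L", f"{local_port}:localhost:{remote_port}",  # local port -> port on the VM
        f"{user}@{host}",
    ]
    # shlex.join quotes each argument so the result is safe to paste into a shell.
    return shlex.join(args)


command = ssh_tunnel_command("jovyan", "203.0.113.10")
print(command)
```

With the tunnel open, the server would be reachable at `http://localhost:8888` on the user's own machine, so no public URL is needed.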
Aside: I was contemplating using ngrok to generate the URL by running it on the VMs, which seems to work in my experiments. We would just have to discuss whether this can be secured sufficiently or whether we need to go down the ssh tunnel route instead.
I haven’t fleshed this idea out completely, but I wanted to get general opinions and guidance on how, or whether, I should even try to work on this. Really looking forward to everyone’s input.