AWS integration work

Hey Binder folks,

TL;DR:

I will be working on AWS integration for BinderHub. Are there guidelines on how to develop integrations with new cloud vendors? What are team members' current thoughts on this?

Full version:

I will be working on AWS integration for BinderHub in the next few months and would like to discuss development strategies in the hope of merging my work into BinderHub itself.

While we do not yet have the go-ahead to open source the work, I have asked for permission and expect to have an answer by the end of May.

The first two AWS components I’m trying to get working with BinderHub are the Elastic Container Registry (ECR) and CodeCommit.

For ECR there is https://github.com/jupyterhub/binderhub/issues/705. I have not seen an issue for CodeCommit yet.

My initial question is how I should structure the code so that it can later be accepted and merged. Should this be a pip-installable extension in its own repo, or could we just add new classes to registry.py? If we go the adding-classes route, is it acceptable to add an AWS SDK dependency?

What do you mean by “AWS integration”? Right now the goal is that BinderHub is oblivious to where your Kubernetes cluster is hosted (Google, AWS, Azure, bare metal, etc.).

Sometimes we need to tweak a bit of code to make it work on a cloud provider-no-one-has-tried-so-far. How to specify the URL and login details for the container registry was one such case.

CodeCommit looks like it is hosted Git repositories. Is that a fair summary? In that case it should just work, as you can build from anything that git clone ... understands. The special cases for GitHub, GitLab.com, etc. in the UI exist because they offer an additional API that lets us turn branch names like master into a revision. I don’t know if we want to accumulate multiple of these “special hosters” in core BinderHub, or if it would be better to use an extension mechanism like entrypoints so that people can write small packages providing that kind of functionality, and those who need them pip install them. Direct integration is maybe simpler today but less scalable :-/
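
To make the entrypoints idea concrete, here is a minimal sketch of how plugin discovery could look using the standard library. The group name `binderhub.repo_providers` is hypothetical (BinderHub does not define such a group today), and this is only one possible shape for an extension mechanism, not an existing BinderHub API.

```python
from importlib.metadata import entry_points

# Hypothetical entry point group; an assumption made for illustration only.
PROVIDER_GROUP = "binderhub.repo_providers"


def discover_providers(group=PROVIDER_GROUP):
    """Return {name: loaded object} for every installed entry point in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are filtered with select()
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict of group -> list
        selected = eps.get(group, [])
    return {ep.name: ep.load() for ep in selected}
```

A third-party package would register its provider class under that group in its packaging metadata, and anyone who needs it would pip install the package. Core code would then only need to call `discover_providers()` at startup.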

Yes.
The problem is private repositories and authentication: for CodeCommit there are several options, none of which look like GitLab’s or GitHub’s tokens.
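
As an illustration of one of those options, the sketch below builds a git clone command that uses the AWS CLI credential helper over HTTPS. It assumes the AWS CLI is installed and already has credentials that may read the repository. The function names are made up for this example, and this is not how BinderHub currently clones repositories.

```python
def codecommit_https_url(region, repo_name):
    """HTTPS clone URL for a CodeCommit repository in the given region."""
    return f"https://git-codecommit.{region}.amazonaws.com/v1/repos/{repo_name}"


def clone_command(region, repo_name, dest):
    """Build (but do not run) a git clone command using the AWS CLI credential helper."""
    return [
        "git",
        # git runs helpers starting with "!" through a shell, so "$@" is left literal here
        "-c", "credential.helper=!aws codecommit credential-helper $@",
        "-c", "credential.UseHttpPath=true",
        "clone", codecommit_https_url(region, repo_name), dest,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. The point is that the credentials come from the AWS credential chain rather than from a static token.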

My biggest concern is actually ECR, the hosted Docker registry service in AWS. Beyond the necessary AWS authentication voodoo, we’d need to create the Docker repository before each push, as ECR does not accept pushes into arbitrary repos. So the AWS API has to be called at runtime.
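
A rough sketch of what those runtime calls could look like with boto3 follows. The helper names are hypothetical and nothing here is existing BinderHub code; it assumes boto3 is installed and AWS credentials with ECR permissions are available. The client is passed in as an argument so the helpers can be exercised with a stub.

```python
import base64


def make_ecr_client(region):
    """Create a boto3 ECR client (needs the boto3 package and AWS credentials)."""
    import boto3  # imported lazily so the helpers below do not require it
    return boto3.client("ecr", region_name=region)


def ensure_repository(client, name):
    """Create the ECR repository `name` if it does not exist yet.

    Returns True if it was created, False if it already existed.
    """
    try:
        client.create_repository(repositoryName=name)
        return True
    except client.exceptions.RepositoryAlreadyExistsException:
        return False


def docker_credentials(client):
    """Return (username, password, registry_url) usable for a docker login."""
    data = client.get_authorization_token()["authorizationData"][0]
    # The token is base64 of "username:password"
    decoded = base64.b64decode(data["authorizationToken"]).decode()
    user, password = decoded.split(":", 1)
    return user, password, data["proxyEndpoint"]
```

In this shape, `ensure_repository` would run just before each image push, and `docker_credentials` would be refreshed periodically, since the token ECR returns is temporary.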

Generally, I mean substituting the standard components with their AWS-hosted counterparts. Beyond these two, authentication comes to mind, but that is a different and more complex issue.

I’ve got this to a working PoC state. Some changes to the BinderHub code are needed, along with some documentation on how to properly configure ECR and CodeCommit. I’ve created https://github.com/jupyterhub/binderhub/issues/918 and updated https://github.com/jupyterhub/binderhub/issues/705 to point out the BinderHub changes that we need to discuss. Once we decide on the approach, I’ll send PRs for the necessary code and docs changes.
