Rebuild Docker Images based on Amazon Linux 2

Hi, there,

I have a question about rebuilding Docker images and would greatly appreciate your advice.

Currently there are 9 Docker images for JupyterHub on K8s. Some of them are based on Debian, some on Alpine. However, we have a security requirement that all of our Docker images must be based on Amazon Linux 2. My questions are:

  1. Do these Docker images already exist based on Amazon Linux 2?
  2. If not, is it possible to rebuild all 9 Docker images on Amazon Linux 2? That could be a very daunting job!

By the way, could anyone explain how security patches are currently handled?

Any suggestions/ideas would be highly appreciated!

Another related question: does anyone know of big companies that are using the out-of-the-box JupyterHub on K8s?

Thanks!

Hmmmm @sam123, I think your requirement may be that the k8s nodes are virtual machines using an Amazon Linux 2 image, rather than that the containers they start have a FROM <some base image> statement referencing some Amazon Linux 2 Docker image.

I’ve heard of a few, but here is a public example:

Thanks @consideRatio! Actually, there are official Docker images maintained by the Amazon Linux team:
https://hub.docker.com/_/amazonlinux

And our security team wants us to build all of the Docker images from Amazon Linux, as below:
FROM amazonlinux:2

It seems like a hugely challenging task to rebuild all 9 Docker images on the Amazon Linux base image, considering all the differences between the distributions, the configurations, etc.
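To make that concrete, a rebuilt hub image might start from something roughly like the sketch below. This is only a sketch: the package selection, the unpinned pip installs, and the config path are assumptions, and would need to be checked against the official jupyterhub/k8s-hub Dockerfile for the chart version you deploy.

    # Hypothetical sketch of a hub image rebuilt on Amazon Linux 2 -- not the official Dockerfile
    FROM amazonlinux:2

    # Python 3 and pip from the Amazon Linux 2 repositories
    RUN yum install -y python3 python3-pip shadow-utils && yum clean all

    # Run the hub as an unprivileged user, like the upstream image does
    RUN useradd --create-home --uid 1000 jovyan

    # JupyterHub plus the Kubernetes spawner stack; pin versions to match your chart release
    RUN pip3 install --no-cache-dir jupyterhub jupyterhub-kubespawner oauthenticator

    USER jovyan
    WORKDIR /srv/jupyterhub
    # The Helm chart mounts its generated config at this path (verify for your chart version)
    CMD ["jupyterhub", "--config", "/usr/local/etc/jupyterhub/jupyterhub_config.py"]

Note that in Zero to JupyterHub the proxy runs in its own pod, so the hub image itself does not need Node.js or configurable-http-proxy.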

What do you think? Also, is there a better way that you happen to know of?

Thanks!

Only two of the container images are required: the hub image (jupyterhub/k8s-hub) and the proxy image (jupyterhub/configurable-http-proxy).

The rest can be omitted by using Network Policies to enforce security, disabling pre-pulling of images, disabling the custom scheduler, and managing SSL certificates yourself.

There’s also the singleuser server, but I’m assuming you’re already building your own anyway!
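For reference, a singleuser image rebuilt on Amazon Linux 2 could start from something roughly like the sketch below - the package names and unpinned versions are assumptions, and the jupyterhub version has to match the hub you deploy.

    # Hypothetical sketch of a singleuser image on Amazon Linux 2 -- adjust packages and versions to your needs
    FROM amazonlinux:2

    RUN yum install -y python3 python3-pip shadow-utils && yum clean all

    # The chart starts singleuser containers as an unprivileged user by default
    RUN useradd --create-home --uid 1000 jovyan

    # jupyterhub provides the jupyterhub-singleuser entrypoint; jupyterlab is the user-facing UI
    RUN pip3 install --no-cache-dir jupyterhub jupyterlab

    USER jovyan
    WORKDIR /home/jovyan
    CMD ["jupyterhub-singleuser"]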

If there are problems with the two rebuilt images it’ll be harder for us to help you though, so be prepared to do a lot more debugging/investigation work yourself.

Thanks @manics! Below are the 9 Docker images in total. Do you mean the only required ones are 1, 2, and 6, and that all the others can be omitted? Could you let me know the main usage of each of the omitted images?

  1. jupyterhub/configurable-http-proxy:4.5.3
  2. jupyterhub/k8s-hub:2.0.0
  3. jupyterhub/k8s-image-awaiter:2.0.0
  4. jupyterhub/k8s-network-tools:2.0.0
  5. jupyterhub/k8s-secret-sync:2.0.0
  6. jupyterhub/k8s-singleuser-sample:2.0.0
  7. k8s.gcr.io/kube-scheduler:v1.23.10
  8. k8s.gcr.io/pause:3.8
  9. traefik:v2.8.4

By the way, if we decide to rebuild images 1, 2, and 6, could your team help us validate the Dockerfiles?

Thanks!

The network-tools image is to disable access to the cloud metadata endpoint (not necessary assuming you’ve got network policies enabled in your cluster, which I assume you do given your requirement for increased security). Traefik is for Let’s Encrypt HTTPS certificate integration, and the rest are related to optimising the loading speed and costs of singleuser servers in an autoscaling cluster.

The best way to “validate” it is to test it in your production environment with a typical workload.

Have you deployed JupyterHub before? If you haven’t I recommend trying a standard installation in a test environment, so that you understand all the components and can explain to your security team how it works and what the risks are.

Thanks, Simon! It’s very helpful.

@manics @consideRatio Simon, Eric and all,

I have another quick question and would greatly appreciate your advice. Compared with rebuilding all the Docker images, how about rebuilding just the single jupyterhub Docker image and deploying it to EKS?

Below is the jupyterhub/jupyterhub Docker image:
https://hub.docker.com/r/jupyterhub/jupyterhub/

Below is the Dockerfile:

As we can see, the above Dockerfile is not super complex, and it is based on Ubuntu. I am thinking of rebuilding it on Amazon Linux and then deploying it to EKS. Is that a good idea, and an easier approach compared with rebuilding 9 (or at least 3) Docker images? Are there any hidden traps/risks?
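For illustration, a rebuild of that standalone image on Amazon Linux 2 might look roughly like the sketch below. The Node.js step is an assumption: the stock Amazon Linux 2 repositories don’t ship a recent Node.js, and configurable-http-proxy (the hub’s default proxy) is a Node application, so substitute whatever repository or internal mirror our security team allows.

    # Hypothetical sketch of a standalone JupyterHub image on Amazon Linux 2 -- not the official Dockerfile
    FROM amazonlinux:2

    RUN yum install -y python3 python3-pip shadow-utils curl && yum clean all

    # configurable-http-proxy (the default proxy) is a Node.js app; the NodeSource RPM repo
    # is used here only as one possible source of Node.js packages
    RUN curl -fsSL https://rpm.nodesource.com/setup_16.x | bash - \
        && yum install -y nodejs \
        && npm install -g configurable-http-proxy \
        && yum clean all

    RUN pip3 install --no-cache-dir jupyterhub

    # JupyterHub reads jupyterhub_config.py from its working directory by default
    WORKDIR /srv/jupyterhub
    CMD ["jupyterhub"]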

Thanks for the help!

The jupyterhub/jupyterhub image doesn’t include kubespawner, I think, and you need to configure all kinds of things if you go that path.

You are in for quite a bit of work to benefit from k8s if you need to rebuild all the images, no matter what. Overall this security requirement sounds unreasonable to me - especially if people are going to run their own code in the end.

I’d give up on deploying JupyterHub to k8s altogether, and let JupyterHub run in a VM where you install things from scratch directly to an associated hard drive without using images. Then, maybe use kubespawner, maybe not.

Tough situation :confused:

@consideRatio Thanks, Eric!

Funny thing: our current setup is based on AWS EC2. The reason we are looking at K8s is that our EC2 setup doesn’t give us any scalability.

Back to the idea of deploying the jupyterhub/jupyterhub image to EKS/K8s: even if there is no kubespawner, that may be OK. What I am thinking is to treat each JupyterHub pod as a VM, with multiple users sharing the same pod (compared with Zero to JupyterHub on K8s, where each user gets their own pod). We can let K8s handle the scalability: for example, based on CPU and/or memory usage, K8s can automatically spin up more pods, and each pod is a JupyterHub shared by multiple users. What do you think of this approach? Will it work? In particular, I am not quite sure about the two questions below:

  1. Will a user be correctly and automatically routed back to their previous session? For example, user A logs in to Pod 1, then K8s spins up more pods (based on CPU and memory usage): Pod 2, Pod 3, etc. User A may close the web browser while their session is still active in Pod 1. Later on, when user A logs in again, will they automatically be reconnected to their previous, still-active session in Pod 1?

  2. If there are multiple running pods, each with a different workload and a different number of logged-in users, which pod will a new login request go to? Or is there any way we can control that?

Thanks again and have a great day!

If you’re using a single pod as a JupyterHub VM for multiple singleuser servers, then when you scale horizontally you’re running multiple completely independent JupyterHubs. They just happen to be behind the same load balancer, but they have no shared knowledge.

This means you’ll need to implement session tracking yourself on top of JupyterHub and any other infrastructure, as well as writing your own load-balancer configuration to work out where to direct a new login. You’ll also need to deal with synchronising user storage and accounts across multiple JupyterHub servers, and if you get it wrong you could end up with corrupt data. On top of all that, your security team will need to review all the custom code you’ve written - they’d be better off spending the time reviewing the official Z2JH container images.

Failing that, you could look at something like

I haven’t used it, but others have had some success.
