I’m trying to optimize JupyterHub launch speeds by ensuring that some form of our large Data Science image is always available on nodes, new or old.
I’m using AWS EC2 Image Builder to produce an AMI that has our large Data Science image baked in. This is done using containerd (ctr) to pull the image into the k8s.io namespace that EKS uses.
The Image Builder pipeline looks like this:
name: ml-image-pull
description: Pulls the latest ml-image Docker Image.
schemaVersion: 1.0
phases:
  - name: build
    steps:
      - name: pull-ml-image
        action: ExecuteBash
        inputs:
          commands:
            - password=$(aws ecr get-login-password --region us-west-2)
            - echo "pulling ml-image:latest..."
            # Redirecting stdout because the process creates thousands of log lines.
            - sudo ctr --namespace k8s.io images pull --user AWS:$password account_id.dkr.ecr.us-west-2.amazonaws.com/ml-image:latest > /dev/null
            # This command also has a ton of output which creates noise, so only printing what we want.
            - sudo ctr --namespace k8s.io images list | head -n 1
            - sudo ctr --namespace k8s.io images list | grep ml-image
  - name: test
    steps:
      - name: confirm-ml-image-pulled
        action: ExecuteBash
        inputs:
          commands:
            - set -e
            - sudo ctr --namespace k8s.io images list | grep ml-image
This AMI is then launched by Karpenter, which always deploys the newest version of the AMI whenever the cluster needs to scale.
While this reduces the time needed to pull the image (it now takes 300 ms to 20 seconds, depending on code changes), it takes almost a minute to load extensions:
Defaulted container "notebook" out of: notebook, block-cloud-metadata (init)
Coiled user token is not set. Skipping login.
[I 2024-08-02 17:56:59.286 SingleUserLabApp mixins:547] Starting jupyterhub single-user server version 4.0.0
[I 2024-08-02 17:56:59.286 SingleUserLabApp mixins:561] Extending jupyterlab.labhubapp.SingleUserLabApp from jupyterlab 3.6.3
[I 2024-08-02 17:56:59.286 SingleUserLabApp mixins:561] Extending jupyter_server.serverapp.ServerApp from jupyter_server 1.23.6
[D 2024-08-02 17:56:59.484 SingleUserLabApp application:190] Searching ['/home/explorer/.config/jupyter', '/python/etc/jupyter', '/usr/local/etc/jupyter', '/etc/xdg/jupyter'] for config files
[D 2024-08-02 17:56:59.485 SingleUserLabApp application:902] Looking for jupyter_config in /etc/xdg/jupyter
[D 2024-08-02 17:56:59.485 SingleUserLabApp application:902] Looking for jupyter_config in /usr/local/etc/jupyter
[D 2024-08-02 17:56:59.485 SingleUserLabApp application:902] Looking for jupyter_config in /python/etc/jupyter
[D 2024-08-02 17:56:59.485 SingleUserLabApp application:902] Looking for jupyter_config in /home/explorer/.config/jupyter
[D 2024-08-02 17:56:59.486 SingleUserLabApp application:902] Looking for jupyter_server_config in /etc/xdg/jupyter
[D 2024-08-02 17:56:59.486 SingleUserLabApp application:902] Looking for jupyter_server_config in /usr/local/etc/jupyter
[D 2024-08-02 17:56:59.486 SingleUserLabApp application:902] Looking for jupyter_server_config in /python/etc/jupyter
[D 2024-08-02 17:56:59.486 SingleUserLabApp application:902] Looking for jupyter_server_config in /home/explorer/.config/jupyter
[D 2024-08-02 17:56:59.488 SingleUserLabApp config_manager:93] Paths used for configuration of jupyter_server_config:
/etc/xdg/jupyter/jupyter_server_config.json
[D 2024-08-02 17:56:59.488 SingleUserLabApp config_manager:93] Paths used for configuration of jupyter_server_config:
/usr/local/etc/jupyter/jupyter_server_config.json
[D 2024-08-02 17:56:59.488 SingleUserLabApp config_manager:93] Paths used for configuration of jupyter_server_config:
/python/etc/jupyter/jupyter_server_config.d/dask_labextension.json
/python/etc/jupyter/jupyter_server_config.d/jupyter-lsp-jupyter-server.json
/python/etc/jupyter/jupyter_server_config.d/jupyter-server-proxy.json
/python/etc/jupyter/jupyter_server_config.d/jupyter_resource_usage.json
/python/etc/jupyter/jupyter_server_config.d/jupyter_server_fileid.json
/python/etc/jupyter/jupyter_server_config.d/jupyter_server_mathjax.json
/python/etc/jupyter/jupyter_server_config.d/jupyter_server_ydoc.json
/python/etc/jupyter/jupyter_server_config.d/jupyterlab.json
/python/etc/jupyter/jupyter_server_config.d/jupyterlab_git.json
/python/etc/jupyter/jupyter_server_config.d/jupyterlab_link_share.json
/python/etc/jupyter/jupyter_server_config.d/nbclassic.json
/python/etc/jupyter/jupyter_server_config.d/nbdime.json
/python/etc/jupyter/jupyter_server_config.d/notebook_shim.json
/python/etc/jupyter/jupyter_server_config.d/panel-client-jupyter.json
/python/etc/jupyter/jupyter_server_config.d/trame_jupyter_extension.json
/python/etc/jupyter/jupyter_server_config.d/voila.json
/python/etc/jupyter/jupyter_server_config.json
[D 2024-08-02 17:56:59.490 SingleUserLabApp config_manager:93] Paths used for configuration of jupyter_server_config:
/home/explorer/.config/jupyter/jupyter_server_config.json
# NOTE(SMT): This takes about 50 seconds
# 17:56:59 ---> 17:57:48
[I 2024-08-02 17:57:48.213 SingleUserLabApp manager:344] dask_labextension | extension was successfully linked.
[I 2024-08-02 17:57:48.213 SingleUserLabApp manager:344] jupyter_lsp | extension was successfully linked.
Compare this to pods on our non-custom AMI, which load the extensions in 2 seconds. Thoughts?
Is it the dask extension specifically?
Is there a way to lazy load the dask extension?
Is there some weirdness due to ctr? Perhaps the layers are not warmed up even though they’re present on the node?
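One way to check whether the dask extension specifically is the slow part is to time each import in a fresh interpreter inside a pod on a newly launched node. The following is only a minimal sketch, not part of the original setup: `time_import` is a hypothetical helper, and the module list is an assumption based on the extension config paths shown in the log above.

```python
import importlib
import time


def time_import(module_name):
    """Return the seconds spent importing module_name, or None if it is not installed.

    The number is only meaningful the first time a module is imported in a
    process; a module that is already loaded returns almost immediately.
    """
    start = time.perf_counter()
    try:
        importlib.import_module(module_name)
    except ImportError:
        return None
    return time.perf_counter() - start


if __name__ == "__main__":
    # Assumed list of packages to check; adjust to match the image.
    for name in ["dask", "distributed", "dask_labextension", "jupyter_lsp", "jupyterlab"]:
        elapsed = time_import(name)
        if elapsed is None:
            print(f"{name}: not installed")
        else:
            print(f"{name}: {elapsed:.4f}s")
```

Because dask is listed before dask_labextension, the output should hint at whether the time goes into dask itself or into the extension. For a per-module breakdown, `python -X importtime -c "import dask_labextension"` prints the import time of every module that gets loaded.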
First, I would suggest updating your images to use JupyterLab 4 (JupyterLab 3 has reached its EOL) to see if you notice the same behaviour. Can you reproduce this with multiple launches?
JL 4 seemed to make things slightly quicker, but this is still a much slower startup than when the image isn’t pulled with containerd during Image Builder.
Coiled user token is not set. Skipping login.
[D 2024-08-05 00:45:54.158 ServerApp] Searching ['/home/explorer/.config/jupyter', '/python/etc/jupyter', '/usr/local/etc/jupyter', '/etc/xdg/jupyter'] for config files
[D 2024-08-05 00:45:54.158 ServerApp] Looking for jupyter_config in /etc/xdg/jupyter
[D 2024-08-05 00:45:54.158 ServerApp] Looking for jupyter_config in /usr/local/etc/jupyter
[D 2024-08-05 00:45:54.158 ServerApp] Looking for jupyter_config in /python/etc/jupyter
[D 2024-08-05 00:45:54.159 ServerApp] Looking for jupyter_config in /home/explorer/.config/jupyter
[D 2024-08-05 00:45:54.162 ServerApp] Looking for jupyter_server_config in /etc/xdg/jupyter
[D 2024-08-05 00:45:54.162 ServerApp] Looking for jupyter_server_config in /usr/local/etc/jupyter
[D 2024-08-05 00:45:54.162 ServerApp] Looking for jupyter_server_config in /python/etc/jupyter
[D 2024-08-05 00:45:54.162 ServerApp] Looking for jupyter_server_config in /home/explorer/.config/jupyter
[D 2024-08-05 00:45:54.169 ServerApp] Paths used for configuration of jupyter_server_config:
/etc/xdg/jupyter/jupyter_server_config.json
[D 2024-08-05 00:45:54.169 ServerApp] Paths used for configuration of jupyter_server_config:
/usr/local/etc/jupyter/jupyter_server_config.json
[D 2024-08-05 00:45:54.244 ServerApp] Paths used for configuration of jupyter_server_config:
/python/etc/jupyter/jupyter_server_config.d/dask_labextension.json
/python/etc/jupyter/jupyter_server_config.d/jupyter-lsp-jupyter-server.json
/python/etc/jupyter/jupyter_server_config.d/jupyter-server-proxy.json
/python/etc/jupyter/jupyter_server_config.d/jupyter_resource_usage.json
/python/etc/jupyter/jupyter_server_config.d/jupyter_server_mathjax.json
/python/etc/jupyter/jupyter_server_config.d/jupyter_server_terminals.json
/python/etc/jupyter/jupyter_server_config.d/jupyterlab.json
/python/etc/jupyter/jupyter_server_config.d/jupyterlab_git.json
/python/etc/jupyter/jupyter_server_config.d/jupyterlab_link_share.json
/python/etc/jupyter/jupyter_server_config.d/nbdime.json
/python/etc/jupyter/jupyter_server_config.d/notebook_shim.json
/python/etc/jupyter/jupyter_server_config.d/panel-client-jupyter.json
/python/etc/jupyter/jupyter_server_config.d/trame_jupyter_extension.json
/python/etc/jupyter/jupyter_server_config.d/voila.json
/python/etc/jupyter/jupyter_server_config.json
[D 2024-08-05 00:45:54.250 ServerApp] Paths used for configuration of jupyter_server_config:
/home/explorer/.config/jupyter/jupyter_server_config.json
[I 2024-08-05 00:45:54.497 ServerApp] Package jupyterhub took 0.0000s to import
# 40s to import. Perhaps slightly quicker?
[I 2024-08-05 00:46:35.048 ServerApp] Package dask_labextension took 40.5504s to import
[W 2024-08-05 00:46:35.048 ServerApp] A `_jupyter_server_extension_points` function was not found in dask_labextension. Instead, a `_jupyter_server_extension_paths` function was found and will be used for now. This function name will be deprecated in future releases of Jupyter Server.
[I 2024-08-05 00:46:35.088 ServerApp] Package jupyter_lsp took 0.0393s to import
[W 2024-08-05 00:46:35.088 ServerApp] A `_jupyter_server_extension_points` function was not found in jupyter_lsp. Instead, a `_jupyter_server_extension_paths` function was found and will be used for now. This function name will be deprecated in future releases of Jupyter Server.
[I 2024-08-05 00:46:35.209 ServerApp] Package jupyter_resource_usage took 0.1204s to import
[I 2024-08-05 00:46:35.211 ServerApp] Package jupyter_server_mathjax took 0.0022s to import
[I 2024-08-05 00:46:35.211 ServerApp] Package jupyter_server_proxy took 0.0000s to import
[I 2024-08-05 00:46:35.377 ServerApp] Package jupyter_server_terminals took 0.1652s to import
[I 2024-08-05 00:46:36.627 ServerApp] Package jupyterlab took 1.2504s to import
And here are the logs for a second notebook server using the same image, scheduled to the same node as the notebook server above. The dask extension now imports in under a second.
Coiled user token is not set. Skipping login.
[D 2024-08-05 01:00:19.019 ServerApp] Searching ['/home/explorer/.config/jupyter', '/python/etc/jupyter', '/usr/local/etc/jupyter', '/etc/xdg/jupyter'] for config files
[D 2024-08-05 01:00:19.019 ServerApp] Looking for jupyter_config in /etc/xdg/jupyter
[D 2024-08-05 01:00:19.019 ServerApp] Looking for jupyter_config in /usr/local/etc/jupyter
[D 2024-08-05 01:00:19.020 ServerApp] Looking for jupyter_config in /python/etc/jupyter
[D 2024-08-05 01:00:19.020 ServerApp] Looking for jupyter_config in /home/explorer/.config/jupyter
[D 2024-08-05 01:00:19.021 ServerApp] Looking for jupyter_server_config in /etc/xdg/jupyter
[D 2024-08-05 01:00:19.021 ServerApp] Looking for jupyter_server_config in /usr/local/etc/jupyter
[D 2024-08-05 01:00:19.021 ServerApp] Looking for jupyter_server_config in /python/etc/jupyter
[D 2024-08-05 01:00:19.021 ServerApp] Looking for jupyter_server_config in /home/explorer/.config/jupyter
[D 2024-08-05 01:00:19.026 ServerApp] Paths used for configuration of jupyter_server_config:
/etc/xdg/jupyter/jupyter_server_config.json
[D 2024-08-05 01:00:19.026 ServerApp] Paths used for configuration of jupyter_server_config:
/usr/local/etc/jupyter/jupyter_server_config.json
[D 2024-08-05 01:00:19.026 ServerApp] Paths used for configuration of jupyter_server_config:
/python/etc/jupyter/jupyter_server_config.d/dask_labextension.json
/python/etc/jupyter/jupyter_server_config.d/jupyter-lsp-jupyter-server.json
/python/etc/jupyter/jupyter_server_config.d/jupyter-server-proxy.json
/python/etc/jupyter/jupyter_server_config.d/jupyter_resource_usage.json
/python/etc/jupyter/jupyter_server_config.d/jupyter_server_mathjax.json
/python/etc/jupyter/jupyter_server_config.d/jupyter_server_terminals.json
/python/etc/jupyter/jupyter_server_config.d/jupyterlab.json
/python/etc/jupyter/jupyter_server_config.d/jupyterlab_git.json
/python/etc/jupyter/jupyter_server_config.d/jupyterlab_link_share.json
/python/etc/jupyter/jupyter_server_config.d/nbdime.json
/python/etc/jupyter/jupyter_server_config.d/notebook_shim.json
/python/etc/jupyter/jupyter_server_config.d/panel-client-jupyter.json
/python/etc/jupyter/jupyter_server_config.d/trame_jupyter_extension.json
/python/etc/jupyter/jupyter_server_config.d/voila.json
/python/etc/jupyter/jupyter_server_config.json
[D 2024-08-05 01:00:19.027 ServerApp] Paths used for configuration of jupyter_server_config:
/home/explorer/.config/jupyter/jupyter_server_config.json
[I 2024-08-05 01:00:19.038 ServerApp] Package jupyterhub took 0.0000s to import
[I 2024-08-05 01:00:19.822 ServerApp] Package dask_labextension took 0.7843s to import
[W 2024-08-05 01:00:19.822 ServerApp] A `_jupyter_server_extension_points` function was not found in dask_labextension. Instead, a `_jupyter_server_extension_paths` function was found and will be used for now. This function name will be deprecated in future releases of Jupyter Server.
[I 2024-08-05 01:00:19.841 ServerApp] Package jupyter_lsp took 0.0187s to import
[W 2024-08-05 01:00:19.841 ServerApp] A `_jupyter_server_extension_points` function was not found in jupyter_lsp. Instead, a `_jupyter_server_extension_paths` function was found and will be used for now. This function name will be deprecated in future releases of Jupyter Server.
[I 2024-08-05 01:00:19.846 ServerApp] Package jupyter_resource_usage took 0.0044s to import
[I 2024-08-05 01:00:19.847 ServerApp] Package jupyter_server_mathjax took 0.0013s to import
[I 2024-08-05 01:00:19.847 ServerApp] Package jupyter_server_proxy took 0.0000s to import
[I 2024-08-05 01:00:19.858 ServerApp] Package jupyter_server_terminals took 0.0105s to import
[I 2024-08-05 01:00:19.986 ServerApp] Package jupyterlab took 0.1277s to import
Almost seems like the layers need to be warmed up or something?
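To put numbers on the difference between the two starts, the "Package ... took ...s to import" lines can be pulled out of both logs and compared side by side. This is a rough sketch: the log excerpts are copied from the two logs above, while the regular expression and the `import_times` helper are my own and are not part of Jupyter Server.

```python
import re

# Matches Jupyter Server lines such as:
#   "... ServerApp] Package jupyterlab took 1.2504s to import"
IMPORT_LINE = re.compile(r"Package (?P<name>\S+) took (?P<secs>\d+\.\d+)s to import")


def import_times(log_text):
    """Map package name -> import time in seconds, as reported in the log."""
    return {m["name"]: float(m["secs"]) for m in IMPORT_LINE.finditer(log_text)}


# Excerpts from the first (cold) and second (warm) server on the same node.
COLD_LOG = """
[I 2024-08-05 00:46:35.048 ServerApp] Package dask_labextension took 40.5504s to import
[I 2024-08-05 00:46:35.209 ServerApp] Package jupyter_resource_usage took 0.1204s to import
[I 2024-08-05 00:46:36.627 ServerApp] Package jupyterlab took 1.2504s to import
"""
WARM_LOG = """
[I 2024-08-05 01:00:19.822 ServerApp] Package dask_labextension took 0.7843s to import
[I 2024-08-05 01:00:19.846 ServerApp] Package jupyter_resource_usage took 0.0044s to import
[I 2024-08-05 01:00:19.986 ServerApp] Package jupyterlab took 0.1277s to import
"""

cold = import_times(COLD_LOG)
warm = import_times(WARM_LOG)
for name, cold_secs in cold.items():
    warm_secs = warm.get(name)
    if warm_secs:  # skip packages missing from the warm log or reported as 0.0000s
        print(f"{name}: cold {cold_secs:.4f}s, warm {warm_secs:.4f}s, "
              f"{cold_secs / warm_secs:.1f}x slower when cold")
```

Every package in the excerpt is slower on the first start, not only dask_labextension, which is consistent with cold reads in general rather than a problem specific to one extension.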
Yes, this looks to me like caching in the underlying filesystem where the layers are stored. We see similar behaviour, although we use a Spectrum Scale (GPFS) file system where our Jupyter envs are installed. Certain extensions take more time on first load (because they need to read a lot of files), and the files are then cached in memory. Consequently, subsequent reads are very fast. I see the same behaviour in your case as well.
It’s pretty wild that dask_labextension takes so long. I’m not sure what it could be doing, but presumably most of the time goes into importing dask / dask.distributed itself, which probably does a lot of stats and reads from a cold disk. I’m not sure there’s much you can do beyond running a process that does these imports in the image.
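If the time really is going into cold reads, a warm-up process along those lines could look like the sketch below. Note that it swaps the imports for plain file reads: it locates each package without executing it and reads every file under the package directory once, so that later imports are more likely to hit the page cache. The package list, the helper names and the idea of running this once per node (for example from a container that uses the same image) are all assumptions, not something tested in this thread.

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory of an installed top-level package, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


def warm_directory(root, chunk_size=1 << 20):
    """Read every readable file under root once and return the number of bytes read."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as handle:
                    while chunk := handle.read(chunk_size):
                        total += len(chunk)
            except OSError:
                # Unreadable or vanished files are not worth failing the warm-up for.
                continue
    return total


if __name__ == "__main__":
    # Assumed package names, based on the extensions discussed in this thread.
    for name in ["dask", "distributed", "dask_labextension"]:
        root = package_dir(name)
        if root is None:
            print(f"{name}: not found, skipping")
            continue
        print(f"{name}: read {warm_directory(root)} bytes from {root}")
```

This only helps if it runs on the node before the first user pod starts and if the files stay in the page cache afterwards, so it is worth comparing the import timings before and after rather than assuming it works.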
Making sure the container runtime’s image storage is on a locally mounted SSD may also help.