I am looking for an example configuration or pre-built Docker image that has NVIDIA RAPIDS already installed. They already provide good install docs; however, following the conda-based instructions here to try to add these dependencies on top of an existing Jupyter Docker image (even the most minimal) creates an image of over 40GB, too big to build successfully on GH Actions even after some tricks that free additional space!
Fortunately, the RAPIDS team also provides a bunch of pre-built Docker images, e.g. nvcr.io/nvidia/rapidsai/base:25.02-cuda12.0-py3.12, which are also much smaller. I think it should be a straightforward task to add the necessary jupyterhub packages on top of this image. I tried this:
FROM nvcr.io/nvidia/rapidsai/notebooks:25.02-cuda12.0-py3.12
RUN mamba install -c conda-forge --yes jupyterhub notebook
CMD ["jupyter", "lab", "--ip", "0.0.0.0"]
However, this image won’t start up on my JupyterHub. (The CUDA-enabled official Jupyter ones work fine, and this one seems to work fine if I just run it in Docker locally.) What did I miss?
Thanks! Here are the kubectl events from describe pod; it looks like the PostStartHook fails. (Not sure what that does. Is there anything that needs to be set as a default entrypoint or default command when making an image?)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 44s testjuypterhelm-user-scheduler Successfully assigned testjupyter/jupyter-cboettig to thelio
Normal Pulled 45s kubelet Container image "quay.io/jupyterhub/k8s-network-tools:4.0.0" already present on machine
Normal Created 45s kubelet Created container block-cloud-metadata
Normal Started 44s kubelet Started container block-cloud-metadata
Normal Pulled 43s kubelet Successfully pulled image "ghcr.io/rocker-org/rapids@sha256:c508daa3143add94f8a6af878f4dc3e62a36166373dbacb771a51a0fff35bc7a" in 518ms (518ms including waiting)
Normal Pulled 42s kubelet Successfully pulled image "ghcr.io/rocker-org/rapids@sha256:c508daa3143add94f8a6af878f4dc3e62a36166373dbacb771a51a0fff35bc7a" in 525ms (525ms including waiting)
Normal Pulled 28s kubelet Successfully pulled image "ghcr.io/rocker-org/rapids@sha256:c508daa3143add94f8a6af878f4dc3e62a36166373dbacb771a51a0fff35bc7a" in 608ms (608ms including waiting)
Normal Created 28s (x3 over 43s) kubelet Created container notebook
Normal Started 28s (x3 over 43s) kubelet Started container notebook
Warning FailedPostStartHook 28s (x3 over 43s) kubelet PostStartHook failed
Normal Killing 28s (x3 over 43s) kubelet FailedPostStartHook
Warning BackOff 15s (x3 over 42s) kubelet Back-off restarting failed container notebook in pod jupyter-cboettig_testjupyter(c0ff394d-ac48-4603-87a1-9f3ac231870a)
I did have a postStart config, but I think that’s unrelated. I just removed the postStart task and the image still won’t launch:
---- ------ ---- ---- -------
Normal Scheduled 32s testjuypterhelm-user-scheduler Successfully assigned testjupyter/jupyter-cboettig to thelio
Normal Pulled 32s kubelet Container image "quay.io/jupyterhub/k8s-network-tools:4.0.0" already present on machine
Normal Created 32s kubelet Created container block-cloud-metadata
Normal Started 32s kubelet Started container block-cloud-metadata
Normal Pulled 31s kubelet Successfully pulled image "ghcr.io/rocker-org/rapids:latest" in 541ms (541ms including waiting)
Normal Pulled 30s kubelet Successfully pulled image "ghcr.io/rocker-org/rapids:latest" in 531ms (531ms including waiting)
Normal Pulling 16s (x3 over 31s) kubelet Pulling image "ghcr.io/rocker-org/rapids:latest"
Normal Pulled 16s kubelet Successfully pulled image "ghcr.io/rocker-org/rapids:latest" in 485ms (485ms including waiting)
Normal Created 15s (x3 over 31s) kubelet Created container notebook
Normal Started 15s (x3 over 31s) kubelet Started container notebook
Warning BackOff 3s (x3 over 29s) kubelet Back-off restarting failed container notebook in pod jupyter-cboettig_testjupyter(90e614a9-b60a-4ce9-8966-a3774efe22cc)
Thanks @manics, the pod doesn’t start so it doesn’t have logs (waiting for pod creation). I have pasted the event logs from kubectl describe pod in the previous reply.
Just to double-check, is there anything other than installing jupyterhub and notebook conda packages in the default conda environment that should be necessary here?
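For readers debugging a similar restart loop: the events above show the notebook container being created and started before it is restarted, and in that situation kubectl can usually still show the output of the crashed run through its --previous flag. The following is only a sketch of wrapping that call in Python. The pod, namespace, and container names are taken from the events above; the helper names are made up for illustration, and the second function assumes kubectl is on PATH with access to the cluster.

```python
import subprocess


def previous_logs_command(pod, namespace, container="notebook"):
    """Build a kubectl command that prints logs from the previous
    (terminated) run of one container in a pod."""
    return [
        "kubectl", "logs", pod,
        "--namespace", namespace,
        "--container", container,
        "--previous",
    ]


def fetch_previous_logs(pod, namespace, container="notebook"):
    """Run the command and return (exit code, stdout + stderr).

    Assumes kubectl is installed and configured for the cluster.
    """
    result = subprocess.run(
        previous_logs_command(pod, namespace, container),
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is reported, not raised
    )
    return result.returncode, result.stdout + result.stderr


# Example call (needs cluster access):
# code, output = fetch_previous_logs("jupyter-cboettig", "testjupyter")
# print(output)
```

If the pod is deleted and recreated between attempts, the previous run's logs may already be gone, so timing can matter here.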
Starting a single-user server is a bit more involved than running jupyter lab --ip 0.0.0.0 as you are doing in your image. Refer to the base image Dockerfile to see how it is done. You can try replacing CMD with ["jupyterhub-singleuser"] to see if it works.
Can anyone point me to the documentation about what is required for a Docker image to be compatible to start on JupyterHub?
For instance, here is a relatively minimal Dockerfile: it merely installs jupyterhub 4.* and notebook on top of the default micromamba image. This works just fine when deployed on JupyterHub (using the "bring my own Docker image" option).
Note that this minimal example works just fine with or without the CMD; I believe the z2jh config already provides the necessary default CMD on startup.
Not sure why the NVIDIA RAPIDS image cannot start up though or how to debug. Is anyone else able to test that image?
I think the only requirement is that a compatible version of jupyterhub-singleuser is in the container’s default PATH. What happens if you docker run ... IMAGE jupyterhub-singleuser --help?
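The PATH requirement mentioned above can also be checked from Python inside the image. This is a minimal sketch using the standard library's shutil.which; the function names are hypothetical, and it only confirms that an executable called jupyterhub-singleuser is discoverable, not that its version is compatible with the hub.

```python
import shutil


def find_command(command="jupyterhub-singleuser", path=None):
    """Return the full path of `command` if it is found on PATH, else None.

    `path` is a PATH-style string; None means the current environment's PATH.
    """
    return shutil.which(command, path=path)


def report(command="jupyterhub-singleuser", path=None):
    """Return a one-line, human-readable result of the lookup."""
    location = find_command(command, path)
    if location is None:
        return f"{command} not found on PATH"
    return f"{command} found at {location}"


# Example call, run with the image's default Python:
# print(report())
```

Running it as the same user the hub uses matters, since PATH can differ between users and between login and non-login shells.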
Ah ha! Apparently I was just being either too slow or too fast to get the kubectl logs to show up. I was able to run that again and get logs this time, which show a permission issue on their custom entrypoint file! Sorry for the noise and thanks all for the help. Testing now but this looks promising.
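A permission problem of this kind can be confirmed by comparing the file's mode bits with the user the hub runs the container as. The sketch below applies the classic Unix owner/group/other rule for the execute bit only; it ignores ACLs and read permission, and the helper names are hypothetical. The entrypoint path, UID, and GID in the example call are placeholders, not values taken from the RAPIDS image or from this deployment.

```python
import os
import stat


def can_execute(mode, owner_uid, owner_gid, uid, gids):
    """Approximate the classic Unix check for the execute permission.

    mode: st_mode of the file; uid/gids: identity of the process.
    """
    if uid == 0:
        # root can execute a file as long as any execute bit is set
        return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))
    if uid == owner_uid:
        return bool(mode & stat.S_IXUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IXGRP)
    return bool(mode & stat.S_IXOTH)


def check_entrypoint(path, uid, gids):
    """Summarise ownership, mode, and executability of `path` for a user."""
    st = os.stat(path)
    return {
        "path": path,
        "mode": stat.filemode(st.st_mode),
        "owner": (st.st_uid, st.st_gid),
        "executable_for_user": can_execute(
            st.st_mode, st.st_uid, st.st_gid, uid, gids
        ),
    }


# Example call with placeholder values (path, UID, and GID are assumptions):
# print(check_entrypoint("/path/to/entrypoint.sh", uid=1000, gids={100}))
```

If the result shows the file is only executable by its owner, a chmod in the Dockerfile or running as the owning user are the usual ways out, depending on what the image expects.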
Sounds like either your Python installation has been corrupted (perhaps by installing incompatible packages?), or something’s gone wrong with your Python related paths which means it’s looking in the wrong place for system modules. I’ve no idea how that happened though.
Do you have the Dockerfile from Nvidia used to build the original base image?