Hello @mahendrapaipuri, many thanks for your reply.
I’ve decided to give the existing SlurmSpawner a try. Using Slurm’s REST API does look technically cleaner to me, but it would currently take more effort.
I was able to get the SlurmSpawner up and running inside Kubernetes by opening an SSH connection to the login node via “exec_prefix”. Along the way I ran into a few hurdles, which I was able to solve with existing workarounds. I also had to build a custom Hub image with ssh, wrapspawner and batchspawner installed. For anyone interested in doing the same, I’ve put what I learned below.
Parts of the HelmChart Config
hub:
  image:
    # Custom jupyterhub image with ssh, wrapspawner and batchspawner installed.
    name: myregistry.example.com/k8s-hub-custom
  extraFiles:
    # SSH private key to connect to the HPC login node - just as a proof of concept.
    #
    # (!) PLEASE NOTE THAT THIS IS A SECURITY RISK IF THE KEY IS NOT PROTECTED
    # PROPERLY. THIS IS JUST A PROOF OF CONCEPT.
    00-ssh-key:
      mountPath: /id_ed25519
      mode: 0400
      stringData: |
        -----BEGIN OPENSSH PRIVATE KEY-----
        XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
        -----END OPENSSH PRIVATE KEY-----
  extraConfig:
    00-global: |
      c.JupyterHub.spawner_class = 'wrapspawner.ProfilesSpawner'
      c.JupyterHub.log_level = 'DEBUG'
      c.Spawner.http_timeout = 180
      c.Spawner.start_timeout = 300
    01-batchspawner-slurm: |
      import batchspawner

      # WORKAROUND: There are no local Slurm users in the hub pod, so calling
      # pwd.getpwnam(self.user.name).pw_dir in _req_homedir_default raises an error.
      #
      # Patch SlurmSpawner to not require local users.
      # Thx to https://gist.github.com/zonca/55f7949983e56088186e99db53548ded
      #
      class SlurmSpawnerNoLocalUsers(batchspawner.SlurmSpawner):
          def user_env(self, env):
              """Set USER in the environment without looking up a local account."""
              env['USER'] = self.user.name
              return env

          def _req_homedir_default(self):
              # Assumes home directories on the cluster live under /home/<username>/.
              return "/home/{}/".format(self.user.name)

      # WORKAROUND: Environment variables are not passed to the spawned process with ssh
      #
      # Thx to https://github.com/jupyterhub/batchspawner/issues/123
      #
      c.SlurmSpawner.batch_submit_cmd = " ".join(
          [
              "env",
              "{% for var in keepvars.split(',') %}{{var}}=\"'${{'{'}}{{var}}{{'}'}}'\" {% endfor %}",
              "sbatch --parsable",
          ]
      )

      # WORKAROUND: Error: squeue: error: Unrecognized option: %B
      #
      # Take care of the quoting in the squeue command in combination with exec_prefix.
      # Thx to https://github.com/jupyterhub/batchspawner/issues/123#issuecomment-2157902069
      #
      c.SlurmSpawner.batch_query_cmd = "squeue -h -j {job_id} -o \"'%T %B'\""

      #
      # Slurm settings
      #
      c.SlurmSpawner.exec_prefix = "ssh -o StrictHostKeyChecking=accept-new -i /id_ed25519 MYSSHUSER@login-node.example.com"
      c.SlurmSpawner.batch_script = '''#!/bin/bash
      #SBATCH --output=/home/{username}/jupyterhub_slurmspawner_%j.log
      #SBATCH --job-name=spawner-jupyterhub
      #SBATCH --chdir=/home/{username}
      #SBATCH --export=HOME,PATH,JUPYTERHUB_API_TOKEN,JPY_API_TOKEN,JUPYTERHUB_CLIENT_ID,JUPYTERHUB_COOKIE_HOST_PREFIX_ENABLED,JUPYTERHUB_HOST,JUPYTERHUB_OAUTH_CALLBACK_URL,JUPYTERHUB_OAUTH_SCOPES,JUPYTERHUB_OAUTH_ACCESS_SCOPES,JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES,JUPYTERHUB_USER,JUPYTERHUB_SERVER_NAME,JUPYTERHUB_API_URL,JUPYTERHUB_ACTIVITY_URL,JUPYTERHUB_BASE_URL,JUPYTERHUB_SERVICE_PREFIX,JUPYTERHUB_SERVICE_URL,JUPYTERHUB_PUBLIC_URL,JUPYTERHUB_PUBLIC_HUB_URL,USER,HOME,SHELL
      #SBATCH --get-user-env=L
      echo "***************************************************************"
      hostname
      ml load Python
      python3 -m venv ./venv_jupyterhub
      source ./venv_jupyterhub/bin/activate
      pip3 install batchspawner
      pip3 install jupyterhub
      pip3 install jupyterlab
      pip3 install jupyter-server
      echo "***************************************************************"
      # The following overrides are most likely not needed if the Hub and the compute
      # nodes can reach each other directly. That is not yet the case in our environment,
      # so I had to set up SSH port forwarding in the background.
      export JUPYTERHUB_API_URL=http://login-node.example.com:8085/hub/api
      export JUPYTERHUB_ACTIVITY_URL=http://login-node.example.com:8085/hub/api/users/{username}/activity
      export JUPYTERHUB_SERVICE_URL=http://localhost:19999
      echo "***************************************************************"
      which batchspawner-singleuser
      which jupyterhub-singleuser
      env
      echo $HOME
      echo $PATH
      echo "***************************************************************"
      batchspawner-singleuser jupyterhub-singleuser --debug --ServerApp.port=9999
      '''
    10-profilespawner: |
      # WORKAROUND: Stuck in "Started container notebook"
      #
      # KubeSpawner combined with ProfilesSpawner hangs at "Started container notebook".
      # Thanks to https://github.com/jupyterhub/wrapspawner/issues/58#issuecomment-1882918661, I took a
      # closer look at the variables. In contrast to the issue, I had to set cmd to ['jupyterhub-singleuser'].
      #
      # NOTE:
      #
      # A profile in ProfilesSpawner.profiles is a tuple with the following elements:
      #
      # 1. display_name: The name of the profile shown in the dropdown menu.
      # 2. name: The key of the profile; for KubeSpawner it selects the entry in singleuser.profileList.
      # 3. spawner_class: The spawner class used for this profile.
      # 4. kwargs: The parameters passed to that spawner class.
      #
      # When using KubeSpawner, pay attention to the 'name' parameter (the second one). It must match a profile
      # in singleuser.profileList. If a profile in singleuser.profileList does not contain a name, the
      # display_name is used as the name, without spaces and in all lowercase.
      # E.g. display_name: "GPU Node" will be used as name: "gpunode".
      #
      c.ProfilesSpawner.profiles = [
          ('K8S - Default', 'default', 'kubespawner.KubeSpawner', {'ip': '0.0.0.0', 'port': 0, 'cmd': ['jupyterhub-singleuser']}),
          ('K8S - GPU-Node', 'gpu-node', 'kubespawner.KubeSpawner', {'ip': '0.0.0.0', 'port': 0, 'cmd': ['jupyterhub-singleuser']}),
          ('HPC - Partition XY (2 cores, 4 GB, 8 hours)', 'singleuser', SlurmSpawnerNoLocalUsers, dict(req_partition='partition.xy', req_nprocs='2', req_memory='4gb', req_runtime='8:00:00')),
      ]
singleuser:
  # Profiles for Kubernetes Spawner
  profileList:
    - display_name: "Default"
      description: "Your code will run on a shared machine with CPU only."
      default: True
    - display_name: "GPU-Node"
      description: "Spawns a notebook server with access to a GPU"
      kubespawner_override:
        extra_resource_limits:
          nvidia.com/gpu: "1"
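Both quoting workarounds in the config above come from the same effect: with an ssh exec_prefix, the command line is parsed by a shell twice, once in the hub pod and once on the login node. The following is only a rough Python sketch of that effect, assuming the hub runs the command through a shell and relying on ssh joining its command arguments with spaces before the remote shell parses them again. The helper names `render_submit_cmd` and `remote_argv` are made up for illustration and are not part of batchspawner; `render_submit_cmd` is a plain-Python stand-in for what the Jinja template in batch_submit_cmd expands to.

```python
import shlex


def render_submit_cmd(keepvars: str) -> str:
    """Plain-Python stand-in for the expansion of the batch_submit_cmd template."""
    # Each variable becomes NAME="'${NAME}'" followed by a space, as in the template.
    assignments = "".join(f"{var}=\"'${{{var}}}'\" " for var in keepvars.split(","))
    return " ".join(["env", assignments, "sbatch --parsable"])


def remote_argv(local_command: str) -> list[str]:
    """Parse a command line twice, like the hub shell followed by the remote shell.

    shlex only handles quoting; it does not expand variables such as ${PATH}.
    """
    local_args = shlex.split(local_command)  # first parse, in the hub pod
    sent_over_ssh = " ".join(local_args)     # ssh joins its arguments with spaces
    return shlex.split(sent_over_ssh)        # second parse, on the login node


if __name__ == "__main__":
    # Only double quotes: the format string is split into two arguments remotely.
    print(remote_argv('squeue -h -j 42 -o "%T %B"'))
    # Single quotes nested in double quotes: the format string arrives in one piece.
    print(remote_argv('squeue -h -j 42 -o "\'%T %B\'"'))
    # What the submit command looks like before the hub shell expands the variables.
    print(render_submit_cmd("PATH,LANG"))
```

In the first case squeue receives `%B` as a separate argument, which matches the "Unrecognized option: %B" error. For the submit command, the outer double quotes let the hub shell fill in the values, and the inner single quotes keep each value in one piece on the login node.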
Modified jupyterhub image with ssh, wrapspawner and batchspawner installed
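FROM quay.io/jupyterhub/k8s-hub:4.2.0
USER root
# apt-get is the stable CLI for scripts; the list cleanup keeps the layer small.
RUN export DEBIAN_FRONTEND=noninteractive \
 && apt-get update \
 && apt-get install -y --no-install-recommends openssh-client \
 && rm -rf /var/lib/apt/lists/*
USER jovyan
RUN pip3 install wrapspawner batchspawner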
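To check that a custom image like the one above actually contains everything the config relies on, a small script can be run inside the built image or the running hub pod. This is just a sketch: the function name `check_hub_image` and the report format are my own choices, and it only verifies that the `ssh` binary is on the PATH and that the two Python packages can be found, not that they work together.

```python
import importlib.util
import shutil


def check_hub_image(binaries=("ssh",), modules=("wrapspawner", "batchspawner")):
    """Return a mapping of requirement -> True/False for the current environment."""
    report = {}
    for name in binaries:
        # shutil.which returns None when the executable is not on the PATH.
        report[f"bin:{name}"] = shutil.which(name) is not None
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        report[f"module:{name}"] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_hub_image().items():
        print(f"{'ok' if ok else 'MISSING'}  {requirement}")
```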
Thanks again,
Martin