Separating JupyterHub and Slurm using LXC containers

I have a Slurm cluster that's successfully set up and configured, and I'm trying to add JupyterHub to the cluster so that users have different ways to interact with it.

Currently, I use LXC containers to separate different aspects of the cluster. Specifically, the login node of the cluster is in a separate container from the compute node. I want to run JupyterHub (with batchspawner) in another container, separate from both the login and compute containers. JupyterHub can then submit jobs over SSH to the login node, where they are scheduled to run.

I was able to get JupyterHub running, and I can log in and specify resources to be submitted to the cluster. But when I try to start the server, I keep getting the error: Spawn failed: The Jupyter batch job has disappeared while pending in the queue or died immediately after starting.

JupyterHub appears to be able to SSH in and submit the job, because I get a log file from Slurm, but the log file says:
/var/spool/slurm/d/job00033/slurm_script: line 17: batchspawner-singleuser: command not found

Looking at syslog, I see some errors that seem to suggest the way I have the SSH submit command set up is wrong.

This is the relevant part of my jupyterhub_config.py script:

c.SlurmSpawner.exec_prefix = "ssh loginnode sudo -E -u {username}"
c.SlurmSpawner.batch_script = """#!/bin/bash
#SBATCH --partition=debug
#SBATCH --cpus-per-task={cpu_cores}
#SBATCH --mem={memory}G
#SBATCH -t 0-{runtime}:00
#SBATCH --output={homedir}/jupyterhub_slurmspawner_%j.log
#SBATCH --job-name=spawner-jupyterhub
#SBATCH --chdir={homedir}
#SBATCH --export={keepvars}
#SBATCH --get-user-env=L
#SBATCH {options}
hostname
whoami
which jupyterhub-singleuser
which {cmd}
echo {cmd}
{cmd}

echo "jupyterhub-singleuser ended gracefully"
"""

I feel like I may be making multiple errors: one with the SSH command, and maybe another with the conda environment I set up on the compute node.

Any help and tips would be greatly appreciated!

So I managed to fix the error about the squeue command not recognizing the %B option: it turned out to be the SSH command not preserving the single quotes in c.SlurmSpawner.batch_query_cmd, so I changed it to c.SlurmSpawner.batch_query_cmd = "\"squeue -h -j {job_id} -o '%T %B'\"" and it no longer throws an error.
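In case it helps anyone else, my understanding of the quoting issue (illustrative shell only, with a placeholder user):

# The local shell strips the single quotes before ssh sends the command,
# so the remote shell ends up running: squeue -h -j 41 -o %T %B
# and %B becomes a stray argument that squeue rejects.
ssh loginnode sudo -E -u someuser squeue -h -j 41 -o '%T %B'

# Wrapping the whole squeue command in escaped double quotes keeps it as
# one quoted string on the remote side, with the single quotes intact:
ssh loginnode sudo -E -u someuser "squeue -h -j 41 -o '%T %B'"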

I also fixed the batchspawner-singleuser command-not-found error (although I don't know if this is the right way to do it): I copied the batchspawner-singleuser file from my JupyterHub virtual environment at /opt/jupyterhub/bin on my compute node to /usr/local/bin, and now it is found without issue.
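An alternative to copying the file around (just a sketch, assuming the environment that provides batchspawner-singleuser lives at /opt/jupyterhub on the compute node) would be to prepend its bin directory to PATH near the top of batch_script, before {cmd}:

# In batch_script, before {cmd}: make the environment that provides
# batchspawner-singleuser and jupyterhub-singleuser visible to the job.
# (The path is an assumption; adjust it to your actual install.)
export PATH=/opt/jupyterhub/bin:$PATH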

But now I’m getting a different error:

Jul 22 21:21:12 jupyterhub-container jupyterhub[730]: [E 2024-07-22 21:21:12.136 JupyterHub user:1002] Unhandled error starting wk5ng's server: The Jupyter batch job started but died before launching the single-user server.
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:     Traceback (most recent call last):
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:       File "/opt/jupyterhub/lib/python3.10/site-packages/jupyterhub/user.py", line 916, in spawn
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:         url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:       File "/opt/jupyterhub/lib/python3.10/site-packages/batchspawner/batchspawner.py", line 456, in start
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:         raise RuntimeError(
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:     RuntimeError: The Jupyter batch job started but died before launching the single-user server.
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:     
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]: [D 2024-07-22 21:21:12.141 JupyterHub user:1095] Stopping wk5ng
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]: [D 2024-07-22 21:21:12.175 JupyterHub user:1117] Deleting oauth client jupyterhub-user-wk5ng
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]: [D 2024-07-22 21:21:12.191 JupyterHub user:1120] Finished stopping wk5ng
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]: [E 2024-07-22 21:21:12.201 JupyterHub gen:629] Exception in Future <Task finished name='Task-178' coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /opt/jupyterhub/lib/python3.10/site-packages/jupyterhub/handlers/base.py:1115> exception=RuntimeError('The Jupyter batch job started but died before launching the single-user server.')> after timeout
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:     Traceback (most recent call last):
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:       File "/opt/jupyterhub/lib/python3.10/site-packages/tornado/gen.py", line 624, in error_callback
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:         future.result()
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:       File "/opt/jupyterhub/lib/python3.10/site-packages/jupyterhub/handlers/base.py", line 1122, in finish_user_spawn
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:         await spawn_future
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:       File "/opt/jupyterhub/lib/python3.10/site-packages/jupyterhub/user.py", line 1016, in spawn
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:         raise e
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:       File "/opt/jupyterhub/lib/python3.10/site-packages/jupyterhub/user.py", line 916, in spawn
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:         url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]:       File "/opt/jupyterhub/lib/python3.10/site-packages/batchspawner/batchspawner.py", line 456, in start
Jul 22 21:21:12 jupyterhub-container jupyterhub[730]: [I 2024-07-22 21:21:12.205 JupyterHub log:192] 200 GET /hub/api/users/wk5ng/server/progress?_xsrf=[secret] (wk5ng@10.23.71.1) 5610.84ms

And the Slurm log looks something like this:

Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/connection.py", line 203, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/connectionpool.py", line 791, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/connectionpool.py", line 497, in _make_request
    conn.request(
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/connection.py", line 395, in request
    self.endheaders()
  File "/opt/anaconda3/lib/python3.11/http/client.py", line 1289, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/anaconda3/lib/python3.11/http/client.py", line 1048, in _send_output
    self.send(msg)
  File "/opt/anaconda3/lib/python3.11/http/client.py", line 986, in send
    self.connect()
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/connection.py", line 243, in connect
    self.sock = self._new_conn()
                ^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/connection.py", line 218, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x14e529548690>: Failed to establish a new connection: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/connectionpool.py", line 845, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=8081): Max retries exceeded with url: /hub/api/batchspawner (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x14e529548690>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/batchspawner-singleuser", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/batchspawner/singleuser.py", line 26, in main
    requests.post(
  File "/opt/anaconda3/lib/python3.11/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8081): Max retries exceeded with url: /hub/api/batchspawner (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x14e529548690>: Failed to establish a new connection: [Errno 111] Connection refused'))

Any help would be greatly appreciated, and I can send whatever logs/information you might need to help me with debugging this problem!

The JupyterHub API must listen on a network interface that is reachable by all your compute nodes. Basically, you need bidirectional communication between the Hub container and your compute node containers. From the errors, it seems your Hub API is running on localhost, which is why the compute nodes are not able to reach the Hub.
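For example, something along these lines (assuming 10.23.71.64 is the address of your Hub container):

# jupyterhub_config.py: bind the Hub API to an address the compute
# containers can route to, instead of the default 127.0.0.1.
c.JupyterHub.hub_ip = '10.23.71.64'   # Hub container's address (example value)
c.JupyterHub.hub_port = 5432          # any port reachable from the compute nodes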

Could you post your entire JupyterHub config? And all the logs right from the start of JupyterHub?


My config:

c = get_config()
import batchspawner

c.Spawner.start_timeout=1200
c.Spawner.http_timeout = 1200
c.JupyterHub.ip = '10.23.71.64'
c.JupyterHub.port = 443
c.JupyterHub.hub_port = 5432

c.Authenticator.allow_all = True
c.JupyterHub.allow_named_servers = True
c.JupyterHub.named_server_limit_per_user = 3
c.ProfileSpawner.ip='0.0.0.0'

import shutil
from jupyter_client.localinterfaces import public_ips
class SlurmSpawner(batchspawner.SlurmSpawner):
    def _options_form_default(self):
        return """
        <label for="cpu_cores">Number of CPU cores:</label>
        <input name="cpu_cores" type="number" min="1" max="12" required step="1" value="1">
        <br>
        <label for="memory">Memory (in GB):</label>
        <input name="memory" type="number" min="1" max="64" required step="1" value="1">
        <br>
        <label for="runtime">Time (Hours):</label>
        <input name="runtime" type="number" min="1" required step="1" value="1">
        """

    def options_from_form(self, formdata):
        options = {}
        options['cpu_cores'] = formdata['cpu_cores'][0]
        options['memory'] = formdata['memory'][0]
        options['runtime'] = formdata['runtime'][0]
        return options

    def _expand_user_properties(self, template):
        user_dict = self.user.to_dict()
        return template.format(**user_dict)

    def _build_env(self):
        env = super()._build_env()
        env['SLURM_CPUS_PER_TASK'] = str(self.user_options['cpu_cores'])
        env['SLURM_MEM_PER_CPU'] = str(int(self.user_options['memory']) * 1024) + 'M'
        env['SLURM_RUNTIME_PER_TASK'] = str(self.user_options['runtime'])
        return env

    def start(self):
        self.user_options = self.user_options or {}
        return super().start()

c.JupyterHub.spawner_class = SlurmSpawner

c.JupyterHub.tornado_settings = {
        "slow_spawn_timeout": 0,
        "cookie_options": {
          "expires_days": 3
        }
}
c.JupyterHub.cookie_max_age_days = 3


c.SlurmSpawner.exec_prefix = "ssh -v loginnode sudo -E -u {username}"
c.SlurmSpawner.batch_query_cmd = "\"squeue -h -j {job_id} -o '%T %B'\""
c.SlurmSpawner.batch_script = """#!/bin/bash
#SBATCH --partition=debug
#SBATCH --cpus-per-task={cpu_cores}
#SBATCH --mem={memory}G
#SBATCH -t 0-{runtime}:00
#SBATCH --output={homedir}/jupyterhub_slurmspawner_%j.log
#SBATCH --job-name=spawner-jupyterhub
#SBATCH --chdir={homedir}
#SBATCH --export={keepvars}
#SBATCH --get-user-env=L
#SBATCH {options}
hostname
whoami
which jupyterhub-singleuser
which {cmd}
echo {cmd}
{cmd}

echo "jupyterhub-singleuser ended gracefully"
"""

The IP address that I put for c.JupyterHub.ip is the IP address of the container running JupyterHub (an internal address automatically assigned by LXC when I created the container, I think).

And these are the logs when I start JupyterHub:

Jul 23 15:03:00 jupyterhub-container systemd[1]: Started JupyterHub.
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.285 JupyterHub application:908] Looking for /etc/jupyterhub/jupyterhub_config in /
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.323 JupyterHub application:929] Loaded config file: /etc/jupyterhub/jupyterhub_config.py
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:03:01.324 JupyterHub app:3286] Running JupyterHub version 5.0.0
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:03:01.324 JupyterHub app:3316] Using Authenticator: jupyterhub.auth.PAMAuthenticator-5.0.0
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:03:01.324 JupyterHub app:3316] Using Spawner: builtins.SlurmSpawner
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:03:01.325 JupyterHub app:3316] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-5.0.0
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.327 JupyterHub app:3246] Could not load pycurl: No module named 'pycurl'
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]:     pycurl is recommended if you have a large number of users.
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:03:01.327 JupyterHub app:1817] Loading cookie_secret from /jupyterhub_cookie_secret
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.328 JupyterHub app:1984] Connecting to db: sqlite:///jupyterhub.sqlite
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.343 JupyterHub orm:1510] database schema version found: 4621fec11365
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:03:01.395 JupyterHub proxy:556] Generating new CONFIGPROXY_AUTH_TOKEN
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.396 JupyterHub app:2291] Loading roles into database
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.412 JupyterHub app:2638] Purging expired APITokens
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.415 JupyterHub app:2638] Purging expired OAuthCodes
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.416 JupyterHub app:2638] Purging expired Shares
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.418 JupyterHub app:2638] Purging expired ShareCodes
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.419 JupyterHub app:2412] Loading role assignments from config
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.433 JupyterHub app:2923] Initializing spawners
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.440 JupyterHub app:3061] Loaded users:
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]:     
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:03:01.441 JupyterHub app:3355] Initialized 0 spawners in 0.008 seconds
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:03:01.445 JupyterHub metrics:371] Found 1 active users in the last ActiveUserPeriods.twenty_four_hours
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:03:01.446 JupyterHub metrics:371] Found 1 active users in the last ActiveUserPeriods.seven_days
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:03:01.446 JupyterHub metrics:371] Found 1 active users in the last ActiveUserPeriods.thirty_days
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [W 2024-07-23 15:03:01.447 JupyterHub proxy:748] Running JupyterHub without SSL.  I hope there is SSL termination happening somewhere else...
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:03:01.447 JupyterHub proxy:752] Starting proxy @ http://10.23.71.64:443/
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.447 JupyterHub proxy:753] Proxy cmd: ['configurable-http-proxy', '--ip', '10.23.71.64', '--port', '443', '--api-ip', '127.0.0.1', '--api-port', '8001', '--error-target', 'http://127.0.0.1:5432/hub/error', '--log-level', 'info']
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.448 JupyterHub proxy:670] Writing proxy pid file: jupyterhub-proxy.pid
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.448 JupyterHub utils:272] Waiting 10s for server at 10.23.71.64:443
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.449 JupyterHub utils:119] Server at 10.23.71.64:443 not ready: [Errno 111] Connection refused
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.449 JupyterHub utils:272] Waiting 10s for server at 127.0.0.1:8001
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.449 JupyterHub utils:119] Server at 127.0.0.1:8001 not ready: [Errno 111] Connection refused
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.550 JupyterHub utils:119] Server at 127.0.0.1:8001 not ready: [Errno 111] Connection refused
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.618 JupyterHub utils:119] Server at 10.23.71.64:443 not ready: [Errno 111] Connection refused
Jul 23 15:03:01 jupyterhub-container jupyterhub[1620]: 15:03:01.648 [ConfigProxy] #033[32minfo#033[39m: Proxying http://10.23.71.64:443 to (no default)
Jul 23 15:03:01 jupyterhub-container jupyterhub[1620]: 15:03:01.651 [ConfigProxy] #033[32minfo#033[39m: Proxy API at http://127.0.0.1:8001/api/routes
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.873 JupyterHub utils:280] Server at 10.23.71.64:443 responded in 0.42s
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.900 JupyterHub utils:280] Server at 127.0.0.1:8001 responded in 0.45s
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.901 JupyterHub proxy:832] Proxy started and appears to be up
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.905 JupyterHub proxy:925] Proxy: Fetching GET http://127.0.0.1:8001/api/routes
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:03:01.923 JupyterHub app:3669] Hub API listening on http://127.0.0.1:5432/hub/
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.924 JupyterHub proxy:389] Fetching routes to check
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.924 JupyterHub proxy:925] Proxy: Fetching GET http://127.0.0.1:8001/api/routes
Jul 23 15:03:01 jupyterhub-container jupyterhub[1620]: 15:03:01.929 [ConfigProxy] #033[32minfo#033[39m: 200 GET /api/routes
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.932 JupyterHub proxy:392] Checking routes
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:03:01.932 JupyterHub proxy:477] Adding route for Hub: / => http://127.0.0.1:5432
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.933 JupyterHub proxy:925] Proxy: Fetching POST http://127.0.0.1:8001/api/routes/
Jul 23 15:03:01 jupyterhub-container jupyterhub[1620]: 15:03:01.935 [ConfigProxy] #033[32minfo#033[39m: 200 GET /api/routes
Jul 23 15:03:01 jupyterhub-container jupyterhub[1620]: 15:03:01.937 [ConfigProxy] #033[32minfo#033[39m: Adding route / -> http://127.0.0.1:5432
Jul 23 15:03:01 jupyterhub-container jupyterhub[1620]: 15:03:01.938 [ConfigProxy] #033[32minfo#033[39m: Route added / -> http://127.0.0.1:5432
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:03:01.940 JupyterHub app:3710] JupyterHub is now running at http://10.23.71.64:443/
Jul 23 15:03:01 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:03:01.941 JupyterHub app:3279] It took 0.659 seconds for the Hub to start
Jul 23 15:03:01 jupyterhub-container jupyterhub[1620]: 15:03:01.942 [ConfigProxy] #033[32minfo#033[39m: 201 POST /api/routes/

And these are the logs when I login and try to submit a job:

Jul 23 15:05:44 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:44.455 JupyterHub base:411] Refreshing auth for ABCD
Jul 23 15:05:44 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:44.458 JupyterHub scopes:1010] Checking access to /hub/spawn-pending/wk5ng via scope servers!server=ABCD/
Jul 23 15:05:44 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:44.458 JupyterHub user:496] Creating <class 'SlurmSpawner'> for ABCD:
Jul 23 15:05:44 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:05:44.490 JupyterHub log:192] 200 GET /hub/spawn-pending/ABCD?_xsrf=[secret] (wk5ng@10.23.71.1) 79.53ms
Jul 23 15:05:44 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:44.558 JupyterHub log:192] 200 GET /hub/static/js/not_running.js?v=20240723150301 (@10.23.71.1) 8.41ms
Jul 23 15:05:44 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:44.571 JupyterHub log:192] 200 GET /hub/static/js/utils.js?v=20240723150301 (@10.23.71.1) 2.12ms
Jul 23 15:05:45 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:45.565 JupyterHub scopes:1010] Checking access to /hub/spawn/ABCD via scope servers!server=ABCD/
Jul 23 15:05:45 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:45.567 JupyterHub pages:208] Serving options form for ABCD
Jul 23 15:05:45 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:05:45.581 JupyterHub log:192] 200 GET /hub/spawn/ABCD (ABCD@10.23.71.1) 25.67ms
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:46.738 JupyterHub scopes:1010] Checking access to /hub/spawn/ABCD via scope servers!server=ABCD/
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:46.739 JupyterHub pages:256] Triggering spawn with supplied form options for ABCD
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:46.740 JupyterHub base:1095] Initiating spawn for ABCD
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:46.741 JupyterHub base:1099] 0/100 concurrent spawns
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:46.741 JupyterHub base:1104] 0 active servers
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:05:46.812 JupyterHub provider:661] Creating oauth client jupyterhub-user-wk5ng
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:46.843 JupyterHub user:912] Calling Spawner.start for ABCD
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:05:46.844 JupyterHub batchspawner:281] Spawner script options: {'account': '', 'cluster': '', 'epilogue': '', 'gres': '', 'homedir': '/home/ABCD', 'host': '', 'keepvars': 'PATH,LANG,JUPYTERHUB_API_TOKEN,JPY_API_TOKEN,JUPYTERHUB_CLIENT_ID,JUPYTERHUB_COOKIE_OPTIONS,JUPYTERHUB_COOKIE_HOST_PREFIX_ENABLED,JUPYTERHUB_HOST,JUPYTERHUB_OAUTH_CALLBACK_URL,JUPYTERHUB_OAUTH_SCOPES,JUPYTERHUB_OAUTH_ACCESS_SCOPES,JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES,JUPYTERHUB_USER,JUPYTERHUB_SERVER_NAME,JUPYTERHUB_API_URL,JUPYTERHUB_ACTIVITY_URL,JUPYTERHUB_BASE_URL,JUPYTERHUB_SERVICE_PREFIX,JUPYTERHUB_SERVICE_URL,JUPYTERHUB_PUBLIC_URL,JUPYTERHUB_PUBLIC_HUB_URL,USER,HOME,SHELL', 'keepvars_extra': '', 'memory': '1', 'ngpus': '', 'nprocs': '', 'options': '', 'partition': '', 'prologue': '', 'qos': '', 'queue': '', 'reservation': '', 'runtime': '1', 'srun': 'srun', 'username': 'ABCD', 'cmd': 'batchspawner-singleuser jupyterhub-singleuser', 'cpu_cores': '1'}
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:05:46.845 JupyterHub batchspawner:282] Spawner submitting command: ssh -v loginnode sudo -E -u ABCD sbatch --parsable
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:46.845 JupyterHub batchspawner:283] Spawner submitting script:
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]:     #!/bin/bash
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]:     #SBATCH --partition=debug
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]:     #SBATCH --cpus-per-task=1
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]:     #SBATCH --mem=1G
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]:     #SBATCH -t 0-1:00
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]:     #SBATCH --output=/home/ABCD/jupyterhub_slurmspawner_%j.log
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]:     #SBATCH --job-name=spawner-jupyterhub
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]:     #SBATCH --chdir=/home/ABCD
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]:     #SBATCH --export=PATH,LANG,JUPYTERHUB_API_TOKEN,JPY_API_TOKEN,JUPYTERHUB_CLIENT_ID,JUPYTERHUB_COOKIE_OPTIONS,JUPYTERHUB_COOKIE_HOST_PREFIX_ENABLED,JUPYTERHUB_HOST,JUPYTERHUB_OAUTH_CALLBACK_URL,JUPYTERHUB_OAUTH_SCOPES,JUPYTERHUB_OAUTH_ACCESS_SCOPES,JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES,JUPYTERHUB_USER,JUPYTERHUB_SERVER_NAME,JUPYTERHUB_API_URL,JUPYTERHUB_ACTIVITY_URL,JUPYTERHUB_BASE_URL,JUPYTERHUB_SERVICE_PREFIX,JUPYTERHUB_SERVICE_URL,JUPYTERHUB_PUBLIC_URL,JUPYTERHUB_PUBLIC_HUB_URL,USER,HOME,SHELL
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]:     #SBATCH --get-user-env=L
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:46.847 JupyterHub batchspawner:284] Spawner submitting environment: {'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin', 'LANG': 'C.UTF-8', 'JUPYTERHUB_API_TOKEN': '77c3879365734b64bc434ebcf8e021e3', 'JPY_API_TOKEN': '77c3879365734b64bc434ebcf8e021e3', 'JUPYTERHUB_CLIENT_ID': 'jupyterhub-user-wk5ng', 'JUPYTERHUB_COOKIE_OPTIONS': '{"expires_days": 3}', 'JUPYTERHUB_COOKIE_HOST_PREFIX_ENABLED': '0', 'JUPYTERHUB_HOST': '', 'JUPYTERHUB_OAUTH_CALLBACK_URL': '/user/wk5ng/oauth_callback', 'JUPYTERHUB_OAUTH_SCOPES': '["access:servers!server=wk5ng/", "access:servers!user=ABCD"]', 'JUPYTERHUB_OAUTH_ACCESS_SCOPES': '["access:servers!server=wk5ng/", "access:servers!user=ABCD"]', 'JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES': '[]', 'JUPYTERHUB_USER': 'ABCD', 'JUPYTERHUB_SERVER_NAME': '', 'JUPYTERHUB_API_URL': 'http://127.0.0.1:5432/hub/api', 'JUPYTERHUB_ACTIVITY_URL': 'http://127.0.0.1:5432/hub/api/users/ABCD/activity', 'JUPYTERHUB_BASE_URL': '/', 'JUPYTERHUB_SERVICE_PREFIX': '/user/ABCD/', 'JUPYTERHUB_SERVICE_URL': 'http://0.0.0.0:0/user/ABCD/', 'JUPYTERHUB_PUBLIC_URL': '', 'JUPYTERHUB_PUBLIC_HUB_URL': '', 'USER': 'ABCD', 'HOME': '/home/ABCD', 'SHELL': '/bin/bash'}
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [W 2024-07-23 15:05:46.851 JupyterHub base:201] Rolling back dirty objects IdentitySet([<Server(:0)>])
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:05:46.870 JupyterHub log:192] 302 POST /hub/spawn/ABCD?_xsrf=[secret] -> /hub/spawn-pending/ABCD?_xsrf=[secret] (ABCD@10.23.71.1) 124.95ms
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:46.877 JupyterHub scopes:1010] Checking access to /hub/spawn-pending/ABCD via scope servers!server=ABCD/
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:05:46.886 JupyterHub pages:397] ABCD is pending spawn
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:05:46.900 JupyterHub log:192] 200 GET /hub/spawn-pending/ABCD?_xsrf=[secret] (ABCD@10.23.71.1) 28.19ms
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [W 2024-07-23 15:05:46.981 JupyterHub _xsrf_utils:195] Skipping XSRF check for insecure request GET /hub/api/users/wk5ng/server/progress
Jul 23 15:05:46 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:46.983 JupyterHub scopes:1010] Checking access to /hub/api/users/ABCD/server/progress via scope read:servers!server=ABCD/
Jul 23 15:05:47 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:05:47.525 JupyterHub batchspawner:287] Job submitted. output: 41
Jul 23 15:05:47 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:47.528 JupyterHub batchspawner:314] Spawner querying job: ssh -v loginnode sudo -E -u ABCD "squeue -h -j 41 -o '%T %B'"
Jul 23 15:05:48 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:48.330 JupyterHub batchspawner:314] Spawner querying job: ssh -v loginnode sudo -E -u ABCD "squeue -h -j 41 -o '%T %B'"
Jul 23 15:05:49 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:49.101 JupyterHub batchspawner:314] Spawner querying job: ssh -v loginnode sudo -E -u ABCD "squeue -h -j 41 -o '%T %B'"
Jul 23 15:05:49 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:49.964 JupyterHub batchspawner:314] Spawner querying job: ssh -v loginnode sudo -E -u ABCD "squeue -h -j 41 -o '%T %B'"
Jul 23 15:05:50 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:50.831 JupyterHub batchspawner:314] Spawner querying job: ssh -v loginnode sudo -E -u ABCD "squeue -h -j 41 -o '%T %B'"
Jul 23 15:05:51 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:51.691 JupyterHub batchspawner:314] Spawner querying job: ssh -v loginnode sudo -E -u ABCD "squeue -h -j 41 -o '%T %B'"
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]: [E 2024-07-23 15:05:52.027 JupyterHub user:1002] Unhandled error starting ABCD's server: The Jupyter batch job started but died before launching the single-user server.
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:     Traceback (most recent call last):
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:       File "/opt/jupyterhub/lib/python3.10/site-packages/jupyterhub/user.py", line 916, in spawn
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:         url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:       File "/opt/jupyterhub/lib/python3.10/site-packages/batchspawner/batchspawner.py", line 456, in start
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:         raise RuntimeError(
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:     RuntimeError: The Jupyter batch job started but died before launching the single-user server.
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:     
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:52.033 JupyterHub user:1095] Stopping ABCD
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:52.051 JupyterHub user:1117] Deleting oauth client jupyterhub-user-ABCD
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]: [D 2024-07-23 15:05:52.062 JupyterHub user:1120] Finished stopping ABCD
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]: [E 2024-07-23 15:05:52.072 JupyterHub gen:629] Exception in Future <Task finished name='Task-8315' coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /opt/jupyterhub/lib/python3.10/site-packages/jupyterhub/handlers/base.py:1115> exception=RuntimeError('The Jupyter batch job started but died before launching the single-user server.')> after timeout
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:     Traceback (most recent call last):
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:       File "/opt/jupyterhub/lib/python3.10/site-packages/tornado/gen.py", line 624, in error_callback
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:         future.result()
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:       File "/opt/jupyterhub/lib/python3.10/site-packages/jupyterhub/handlers/base.py", line 1122, in finish_user_spawn
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:         await spawn_future
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:       File "/opt/jupyterhub/lib/python3.10/site-packages/jupyterhub/user.py", line 1016, in spawn
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:         raise e
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:       File "/opt/jupyterhub/lib/python3.10/site-packages/jupyterhub/user.py", line 916, in spawn
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:         url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]:       File "/opt/jupyterhub/lib/python3.10/site-packages/batchspawner/batchspawner.py", line 456, in start
Jul 23 15:05:52 jupyterhub-container jupyterhub[1616]: [I 2024-07-23 15:05:52.077 JupyterHub log:192] 200 GET /hub/api/users/wk5ng/server/progress?_xsrf=[secret] (ABCD@10.23.71.1) 5107.45ms

The internal IPs for the jupyterhub-container, the login node, and the compute node are:

jupyterhub-container: 10.23.71.64
login node: 10.23.71.124
compute node: 10.23.71.134

They can all ping each other, and they should all be able to communicate with each other (since that's the default way LXC sets up the internal network).
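A quick way to double-check reachability of the Hub API port itself from the compute container (assuming the Hub API should end up on 10.23.71.64:5432):

# Run from the compute container: any HTTP response (even 403/404) means
# the port is reachable; "connection refused" means it is not.
curl -v http://10.23.71.64:5432/hub/api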

Thank you so much for your help @mahendrapaipuri.

Thanks for the config and logs. Could you try setting c.JupyterHub.hub_ip = '10.23.71.64' as well? By default it is set to localhost, and that is why the single-user servers on the compute nodes are not able to reach the JupyterHub container.

c.SlurmSpawner.exec_prefix = "ssh -v loginnode sudo -E -u {username}"

Regarding SSHing: why don't you make the container running JupyterHub a Slurm client, so that you can submit jobs with sbatch directly from this container instead of SSHing into another container?
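Roughly, that would mean installing the Slurm client tools and MUNGE in the Hub container and pointing them at the same cluster configuration. A rough sketch (package names and paths are assumptions and depend on your distro and how Slurm was built):

# On the JupyterHub container (Debian/Ubuntu-style packages assumed):
apt-get install -y slurm-client munge

# Reuse the cluster's existing config and MUNGE key from the login node
# (paths may differ, e.g. /etc/slurm-llnl on older Ubuntu builds):
scp loginnode:/etc/slurm/slurm.conf /etc/slurm/slurm.conf
scp loginnode:/etc/munge/munge.key /etc/munge/munge.key
chown munge:munge /etc/munge/munge.key && chmod 400 /etc/munge/munge.key
systemctl restart munge

# If sinfo and sbatch work from the Hub container, you can drop the
# ssh/sudo exec_prefix entirely.
sinfo

You will also need the same user accounts and UIDs on the Hub container so that sbatch submits jobs as the right user.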


I set c.JupyterHub.hub_ip = '10.23.71.64' and I'm still getting the same errors. I'll try looking at the config file defaults to see what else I can change, but if you have any suggestions, I'm open to them!

@mahendrapaipuri how do I make the container running JupyterHub a Slurm client? I think this might be a promising path to go down, because JupyterHub worked perfectly when it was running in the same container as the login node (at least when I was testing another version of my test cluster), so this might resolve a lot of problems.

/var/log/syslog on JupyterHub container:

Jul 24 16:09:16 jupyterhub-container systemd[1]: Started JupyterHub.
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.068 JupyterHub application:908] Looking for /etc/jupyterhub/jupyterhub_config in /
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.107 JupyterHub application:929] Loaded config file: /etc/jupyterhub/jupyterhub_config.py
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:17.108 JupyterHub app:3286] Running JupyterHub version 5.0.0
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:17.108 JupyterHub app:3316] Using Authenticator: jupyterhub.auth.PAMAuthenticator-5.0.0
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:17.108 JupyterHub app:3316] Using Spawner: builtins.SlurmSpawner
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:17.108 JupyterHub app:3316] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-5.0.0
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.111 JupyterHub app:3246] Could not load pycurl: No module named 'pycurl'
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]:     pycurl is recommended if you have a large number of users.
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:17.111 JupyterHub app:1817] Loading cookie_secret from /jupyterhub_cookie_secret
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.111 JupyterHub app:1984] Connecting to db: sqlite:///jupyterhub.sqlite
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.127 JupyterHub orm:1510] database schema version found: 4621fec11365
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:17.179 JupyterHub proxy:556] Generating new CONFIGPROXY_AUTH_TOKEN
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.180 JupyterHub app:2291] Loading roles into database
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.197 JupyterHub app:2638] Purging expired APITokens
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.199 JupyterHub app:2638] Purging expired OAuthCodes
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.201 JupyterHub app:2638] Purging expired Shares
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.202 JupyterHub app:2638] Purging expired ShareCodes
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.204 JupyterHub app:2412] Loading role assignments from config
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.219 JupyterHub app:2923] Initializing spawners
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.226 JupyterHub app:3061] Loaded users:
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]:     
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:17.226 JupyterHub app:3355] Initialized 0 spawners in 0.008 seconds
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:17.230 JupyterHub metrics:371] Found 0 active users in the last ActiveUserPeriods.twenty_four_hours
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:17.231 JupyterHub metrics:371] Found 1 active users in the last ActiveUserPeriods.seven_days
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:17.232 JupyterHub metrics:371] Found 1 active users in the last ActiveUserPeriods.thirty_days
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [W 2024-07-24 16:09:17.233 JupyterHub proxy:748] Running JupyterHub without SSL.  I hope there is SSL termination happening somewhere else...
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:17.233 JupyterHub proxy:752] Starting proxy @ http://10.23.71.64:443/
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.233 JupyterHub proxy:753] Proxy cmd: ['configurable-http-proxy', '--ip', '10.23.71.64', '--port', '443', '--api-ip', '127.0.0.1', '--api-port', '8001', '--error-target', 'http://10.23.71.64:5432/hub/error', '--log-level', 'info']
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.234 JupyterHub proxy:670] Writing proxy pid file: jupyterhub-proxy.pid
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.234 JupyterHub utils:272] Waiting 10s for server at 10.23.71.64:443
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.234 JupyterHub utils:119] Server at 10.23.71.64:443 not ready: [Errno 111] Connection refused
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.235 JupyterHub utils:272] Waiting 10s for server at 127.0.0.1:8001
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.235 JupyterHub utils:119] Server at 127.0.0.1:8001 not ready: [Errno 111] Connection refused
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.279 JupyterHub utils:119] Server at 127.0.0.1:8001 not ready: [Errno 111] Connection refused
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.293 JupyterHub utils:119] Server at 10.23.71.64:443 not ready: [Errno 111] Connection refused
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.380 JupyterHub utils:119] Server at 127.0.0.1:8001 not ready: [Errno 111] Connection refused
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.409 JupyterHub utils:119] Server at 10.23.71.64:443 not ready: [Errno 111] Connection refused
Jul 24 16:09:17 jupyterhub-container jupyterhub[2335]: 16:09:17.431 [ConfigProxy] #033[32minfo#033[39m: Proxying http://10.23.71.64:443 to (no default)
Jul 24 16:09:17 jupyterhub-container jupyterhub[2335]: 16:09:17.434 [ConfigProxy] #033[32minfo#033[39m: Proxy API at http://127.0.0.1:8001/api/routes
Jul 24 16:09:17 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:17.951 JupyterHub utils:280] Server at 10.23.71.64:443 responded in 0.72s
Jul 24 16:09:18 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:18.079 JupyterHub utils:280] Server at 127.0.0.1:8001 responded in 0.84s
Jul 24 16:09:18 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:18.080 JupyterHub proxy:832] Proxy started and appears to be up
Jul 24 16:09:18 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:18.084 JupyterHub proxy:925] Proxy: Fetching GET http://127.0.0.1:8001/api/routes
Jul 24 16:09:18 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:18.100 JupyterHub app:3669] Hub API listening on http://10.23.71.64:5432/hub/
Jul 24 16:09:18 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:18.101 JupyterHub proxy:389] Fetching routes to check
Jul 24 16:09:18 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:18.101 JupyterHub proxy:925] Proxy: Fetching GET http://127.0.0.1:8001/api/routes
Jul 24 16:09:18 jupyterhub-container jupyterhub[2335]: 16:09:18.105 [ConfigProxy] #033[32minfo#033[39m: 200 GET /api/routes
Jul 24 16:09:18 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:18.109 JupyterHub proxy:392] Checking routes
Jul 24 16:09:18 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:18.109 JupyterHub proxy:477] Adding route for Hub: / => http://10.23.71.64:5432
Jul 24 16:09:18 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:18.110 JupyterHub proxy:925] Proxy: Fetching POST http://127.0.0.1:8001/api/routes/
Jul 24 16:09:18 jupyterhub-container jupyterhub[2335]: 16:09:18.112 [ConfigProxy] #033[32minfo#033[39m: 200 GET /api/routes
Jul 24 16:09:18 jupyterhub-container jupyterhub[2335]: 16:09:18.116 [ConfigProxy] #033[32minfo#033[39m: Adding route / -> http://10.23.71.64:5432
Jul 24 16:09:18 jupyterhub-container jupyterhub[2335]: 16:09:18.117 [ConfigProxy] #033[32minfo#033[39m: Route added / -> http://10.23.71.64:5432
Jul 24 16:09:18 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:18.118 JupyterHub app:3710] JupyterHub is now running at http://10.23.71.64:443/
Jul 24 16:09:18 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:18.119 JupyterHub app:3279] It took 1.054 seconds for the Hub to start
Jul 24 16:09:18 jupyterhub-container jupyterhub[2335]: 16:09:18.120 [ConfigProxy] #033[32minfo#033[39m: 201 POST /api/routes/
Jul 24 16:09:37 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:37.421 JupyterHub log:192] 302 GET / -> /hub/ (@10.23.71.1) 7.56ms
Jul 24 16:09:37 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:37.476 JupyterHub base:411] Refreshing auth for wk5ng
Jul 24 16:09:37 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:37.477 JupyterHub user:496] Creating <class 'SlurmSpawner'> for wk5ng:
Jul 24 16:09:37 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:37.479 JupyterHub log:192] 302 GET /hub/ -> /hub/spawn (wk5ng@10.23.71.1) 49.23ms
Jul 24 16:09:37 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:37.527 JupyterHub scopes:1010] Checking access to /hub/spawn via scope servers!server=wk5ng/
Jul 24 16:09:37 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:37.529 JupyterHub pages:208] Serving options form for wk5ng
Jul 24 16:09:37 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:37.568 JupyterHub log:192] 200 GET /hub/spawn (wk5ng@10.23.71.1) 51.18ms
Jul 24 16:09:37 jupyterhub-container jupyterhub[2335]: 16:09:37.640 [ConfigProxy] #033[31merror#033[39m: 503 GET /hub/static/favicon.ico socket hang up
Jul 24 16:09:37 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:37.660 JupyterHub log:192] 200 GET /hub/static/favicon.ico?v=fde5757cd3892b979919d3b1faa88a410f28829feb5ba22b6cf069f2c6c98675fceef90f932e49b510e74d65c681d5846b943e7f7cc1b41867422f0481085c1f (@10.23.71.1) 17.66ms
Jul 24 16:09:37 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:37.664 JupyterHub _xsrf_utils:125] Setting new xsrf cookie for b'None:ilOX9p1Ki5JhIVeDgy5MB4Pbs5lQZv4v6KVF31-ed0w=' {'path': '/hub/', 'max_age': 3600}
Jul 24 16:09:37 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:37.664 JupyterHub pages:660] Using default error template for 503
Jul 24 16:09:37 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:37.674 JupyterHub log:192] 200 GET /hub/error/503?url=%2Fhub%2Fstatic%2Ffavicon.ico%3Fv%3Dfde5757cd3892b979919d3b1faa88a410f28829feb5ba22b6cf069f2c6c98675fceef90f932e49b510e74d65c681d5846b943e7f7cc1b41867422f0481085c1f (@10.23.71.64) 12.18ms
Jul 24 16:09:37 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:37.676 JupyterHub log:192] 304 GET /hub/static/components/@fortawesome/fontawesome-free/webfonts/fa-solid-900.woff2 (@10.23.71.1) 13.34ms
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:39.474 JupyterHub scopes:1010] Checking access to /hub/spawn via scope servers!server=wk5ng/
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:39.475 JupyterHub pages:256] Triggering spawn with supplied form options for wk5ng
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:39.477 JupyterHub base:1095] Initiating spawn for wk5ng
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:39.477 JupyterHub base:1099] 0/100 concurrent spawns
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:39.477 JupyterHub base:1104] 0 active servers
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:39.546 JupyterHub provider:661] Creating oauth client jupyterhub-user-wk5ng
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:39.608 JupyterHub user:912] Calling Spawner.start for wk5ng
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:39.610 JupyterHub batchspawner:281] Spawner script options: {'account': '', 'cluster': '', 'epilogue': '', 'gres': '', 'homedir': '/home/wk5ng', 'host': '', 'keepvars': 'PATH,LANG,JUPYTERHUB_API_TOKEN,JPY_API_TOKEN,JUPYTERHUB_CLIENT_ID,JUPYTERHUB_COOKIE_OPTIONS,JUPYTERHUB_COOKIE_HOST_PREFIX_ENABLED,JUPYTERHUB_HOST,JUPYTERHUB_OAUTH_CALLBACK_URL,JUPYTERHUB_OAUTH_SCOPES,JUPYTERHUB_OAUTH_ACCESS_SCOPES,JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES,JUPYTERHUB_USER,JUPYTERHUB_SERVER_NAME,JUPYTERHUB_API_URL,JUPYTERHUB_ACTIVITY_URL,JUPYTERHUB_BASE_URL,JUPYTERHUB_SERVICE_PREFIX,JUPYTERHUB_SERVICE_URL,JUPYTERHUB_PUBLIC_URL,JUPYTERHUB_PUBLIC_HUB_URL,USER,HOME,SHELL', 'keepvars_extra': '', 'memory': '1', 'ngpus': '', 'nprocs': '', 'options': '', 'partition': '', 'prologue': '', 'qos': '', 'queue': '', 'reservation': '', 'runtime': '1', 'srun': 'srun', 'username': 'wk5ng', 'cmd': 'batchspawner-singleuser jupyterhub-singleuser', 'cpu_cores': '1'}
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:39.611 JupyterHub batchspawner:282] Spawner submitting command: ssh -v loginnode sudo -E -u wk5ng sbatch --parsable
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:39.611 JupyterHub batchspawner:283] Spawner submitting script:
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]:     #!/bin/bash
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]:     #SBATCH --partition=debug
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]:     #SBATCH --cpus-per-task=1
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]:     #SBATCH --mem=1G
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]:     #SBATCH -t 0-1:00
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]:     #SBATCH --output=/home/wk5ng/jupyterhub_slurmspawner_%j.log
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]:     #SBATCH --job-name=spawner-jupyterhub
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]:     #SBATCH --chdir=/home/wk5ng
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]:     #SBATCH --export=PATH,LANG,JUPYTERHUB_API_TOKEN,JPY_API_TOKEN,JUPYTERHUB_CLIENT_ID,JUPYTERHUB_COOKIE_OPTIONS,JUPYTERHUB_COOKIE_HOST_PREFIX_ENABLED,JUPYTERHUB_HOST,JUPYTERHUB_OAUTH_CALLBACK_URL,JUPYTERHUB_OAUTH_SCOPES,JUPYTERHUB_OAUTH_ACCESS_SCOPES,JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES,JUPYTERHUB_USER,JUPYTERHUB_SERVER_NAME,JUPYTERHUB_API_URL,JUPYTERHUB_ACTIVITY_URL,JUPYTERHUB_BASE_URL,JUPYTERHUB_SERVICE_PREFIX,JUPYTERHUB_SERVICE_URL,JUPYTERHUB_PUBLIC_URL,JUPYTERHUB_PUBLIC_HUB_URL,USER,HOME,SHELL
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]:     #SBATCH --get-user-env=L
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]:     #SBATCH
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:39.614 JupyterHub batchspawner:284] Spawner submitting environment: {'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin', 'LANG': 'C.UTF-8', 'JUPYTERHUB_API_TOKEN': '510592dd1378429592490893c70dbd88', 'JPY_API_TOKEN': '510592dd1378429592490893c70dbd88', 'JUPYTERHUB_CLIENT_ID': 'jupyterhub-user-wk5ng', 'JUPYTERHUB_COOKIE_OPTIONS': '{"expires_days": 3}', 'JUPYTERHUB_COOKIE_HOST_PREFIX_ENABLED': '0', 'JUPYTERHUB_HOST': '', 'JUPYTERHUB_OAUTH_CALLBACK_URL': '/user/wk5ng/oauth_callback', 'JUPYTERHUB_OAUTH_SCOPES': '["access:servers!server=wk5ng/", "access:servers!user=wk5ng"]', 'JUPYTERHUB_OAUTH_ACCESS_SCOPES': '["access:servers!server=wk5ng/", "access:servers!user=wk5ng"]', 'JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES': '[]', 'JUPYTERHUB_USER': 'wk5ng', 'JUPYTERHUB_SERVER_NAME': '', 'JUPYTERHUB_API_URL': 'http://10.23.71.64:5432/hub/api', 'JUPYTERHUB_ACTIVITY_URL': 'http://10.23.71.64:5432/hub/api/users/wk5ng/activity', 'JUPYTERHUB_BASE_URL': '/', 'JUPYTERHUB_SERVICE_PREFIX': '/user/wk5ng/', 'JUPYTERHUB_SERVICE_URL': 'http://0.0.0.0:0/user/wk5ng/', 'JUPYTERHUB_PUBLIC_URL': '', 'JUPYTERHUB_PUBLIC_HUB_URL': '', 'USER': 'wk5ng', 'HOME': '/home/wk5ng', 'SHELL': '/bin/bash'}
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [W 2024-07-24 16:09:39.619 JupyterHub base:201] Rolling back dirty objects IdentitySet([<Server(:0)>])
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:39.645 JupyterHub log:192] 302 POST /hub/spawn?_xsrf=[secret] -> /hub/spawn-pending/wk5ng?_xsrf=[secret] (wk5ng@10.23.71.1) 157.21ms
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:39.654 JupyterHub scopes:1010] Checking access to /hub/spawn-pending/wk5ng via scope servers!server=wk5ng/
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:39.664 JupyterHub pages:397] wk5ng is pending spawn
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:39.693 JupyterHub log:192] 200 GET /hub/spawn-pending/wk5ng?_xsrf=[secret] (wk5ng@10.23.71.1) 46.13ms
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [W 2024-07-24 16:09:39.793 JupyterHub _xsrf_utils:195] Skipping XSRF check for insecure request GET /hub/api/users/wk5ng/server/progress
Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:39.794 JupyterHub scopes:1010] Checking access to /hub/api/users/wk5ng/server/progress via scope read:servers!server=wk5ng/
Jul 24 16:09:40 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:40.416 JupyterHub batchspawner:287] Job submitted. output: 42
Jul 24 16:09:40 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:40.421 JupyterHub batchspawner:314] Spawner querying job: ssh -v loginnode sudo -E -u wk5ng "squeue -h -j 42 -o '%T %B'"
Jul 24 16:09:41 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:41.443 JupyterHub batchspawner:314] Spawner querying job: ssh -v loginnode sudo -E -u wk5ng "squeue -h -j 42 -o '%T %B'"
Jul 24 16:09:42 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:42.230 JupyterHub batchspawner:314] Spawner querying job: ssh -v loginnode sudo -E -u wk5ng "squeue -h -j 42 -o '%T %B'"
Jul 24 16:09:43 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:43.100 JupyterHub batchspawner:314] Spawner querying job: ssh -v loginnode sudo -E -u wk5ng "squeue -h -j 42 -o '%T %B'"
Jul 24 16:09:43 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:43.975 JupyterHub batchspawner:314] Spawner querying job: ssh -v loginnode sudo -E -u wk5ng "squeue -h -j 42 -o '%T %B'"
Jul 24 16:09:44 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:44.850 JupyterHub batchspawner:314] Spawner querying job: ssh -v loginnode sudo -E -u wk5ng "squeue -h -j 42 -o '%T %B'"
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]: [E 2024-07-24 16:09:45.134 JupyterHub user:1002] Unhandled error starting wk5ng's server: The Jupyter batch job started but died before launching the single-user server.
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:     Traceback (most recent call last):
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:       File "/opt/jupyterhub/lib/python3.10/site-packages/jupyterhub/user.py", line 916, in spawn
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:         url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:       File "/opt/jupyterhub/lib/python3.10/site-packages/batchspawner/batchspawner.py", line 456, in start
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:         raise RuntimeError(
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:     RuntimeError: The Jupyter batch job started but died before launching the single-user server.
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:     
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:45.141 JupyterHub user:1095] Stopping wk5ng
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:45.159 JupyterHub user:1117] Deleting oauth client jupyterhub-user-wk5ng
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:45.176 JupyterHub user:1120] Finished stopping wk5ng
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]: [E 2024-07-24 16:09:45.190 JupyterHub gen:629] Exception in Future <Task finished name='Task-1136' coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /opt/jupyterhub/lib/python3.10/site-packages/jupyterhub/handlers/base.py:1115> exception=RuntimeError('The Jupyter batch job started but died before launching the single-user server.')> after timeout
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:     Traceback (most recent call last):
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:       File "/opt/jupyterhub/lib/python3.10/site-packages/tornado/gen.py", line 624, in error_callback
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:         future.result()
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:       File "/opt/jupyterhub/lib/python3.10/site-packages/jupyterhub/handlers/base.py", line 1122, in finish_user_spawn
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:         await spawn_future
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:       File "/opt/jupyterhub/lib/python3.10/site-packages/jupyterhub/user.py", line 1016, in spawn
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:         raise e
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:       File "/opt/jupyterhub/lib/python3.10/site-packages/jupyterhub/user.py", line 916, in spawn
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:         url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:       File "/opt/jupyterhub/lib/python3.10/site-packages/batchspawner/batchspawner.py", line 456, in start
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:         raise RuntimeError(
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:     RuntimeError: The Jupyter batch job started but died before launching the single-user server.
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]:     
Jul 24 16:09:45 jupyterhub-container jupyterhub[2331]: [I 2024-07-24 16:09:45.196 JupyterHub log:192] 200 GET /hub/api/users/wk5ng/server/progress?_xsrf=[secret] (wk5ng@10.23.71.1) 5418.81ms

Slurm log for the job launched by JupyterHub:

testgpu108
wk5ng
/usr/local/bin/batchspawner-singleuser
batchspawner-singleuser jupyterhub-singleuser
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/connection.py", line 203, in _new_conn
    sock = connection.create_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/connectionpool.py", line 791, in urlopen
    response = self._make_request(
               ^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/connectionpool.py", line 497, in _make_request
    conn.request(
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/connection.py", line 395, in request
    self.endheaders()
  File "/opt/anaconda3/lib/python3.11/http/client.py", line 1289, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/opt/anaconda3/lib/python3.11/http/client.py", line 1048, in _send_output
    self.send(msg)
  File "/opt/anaconda3/lib/python3.11/http/client.py", line 986, in send
    self.connect()
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/connection.py", line 243, in connect
    self.sock = self._new_conn()
                ^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/connection.py", line 218, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x1479f1fbc990>: Failed to establish a new connection: [Errno 111] Connection refused

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.11/site-packages/requests/adapters.py", line 486, in send
    resp = conn.urlopen(
           ^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/connectionpool.py", line 845, in urlopen
    retries = retries.increment(
              ^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/urllib3/util/retry.py", line 515, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=8081): Max retries exceeded with url: /hub/api/batchspawner (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1479f1fbc990>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/batchspawner-singleuser", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/batchspawner/singleuser.py", line 26, in main
    requests.post(
  File "/opt/anaconda3/lib/python3.11/site-packages/requests/api.py", line 115, in post
    return request("post", url, data=data, json=json, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/requests/adapters.py", line 519, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8081): Max retries exceeded with url: /hub/api/batchspawner (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x1479f1fbc990>: Failed to establish a new connection: [Errno 111] Connection refused'))
jupyterhub-singleuser ended gracefully

JupyterHub’s API URL is fetched from the env var JUPYTERHUB_API_URL, and from the logs it is clear that this env var is not being found; that is why the single-user server is attempting to connect to the “default” API URL (127.0.0.1:8081). I guess the reason the env var is not being set is the SSH connection you are doing before submitting the job.

If you are going through SSH first, you will need to get the following env vars into the remote SSH session, so that when you execute sbatch they are exported into the job’s environment (one way to do this is sketched after the log excerpt below):

Jul 24 16:09:39 jupyterhub-container jupyterhub[2331]: [D 2024-07-24 16:09:39.614 JupyterHub batchspawner:284] Spawner submitting environment: {'PATH': '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin', 'LANG': 'C.UTF-8', 'JUPYTERHUB_API_TOKEN': '510592dd1378429592490893c70dbd88', 'JPY_API_TOKEN': '510592dd1378429592490893c70dbd88', 'JUPYTERHUB_CLIENT_ID': 'jupyterhub-user-wk5ng', 'JUPYTERHUB_COOKIE_OPTIONS': '{"expires_days": 3}', 'JUPYTERHUB_COOKIE_HOST_PREFIX_ENABLED': '0', 'JUPYTERHUB_HOST': '', 'JUPYTERHUB_OAUTH_CALLBACK_URL': '/user/wk5ng/oauth_callback', 'JUPYTERHUB_OAUTH_SCOPES': '["access:servers!server=wk5ng/", "access:servers!user=wk5ng"]', 'JUPYTERHUB_OAUTH_ACCESS_SCOPES': '["access:servers!server=wk5ng/", "access:servers!user=wk5ng"]', 'JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES': '[]', 'JUPYTERHUB_USER': 'wk5ng', 'JUPYTERHUB_SERVER_NAME': '', 'JUPYTERHUB_API_URL': 'http://10.23.71.64:5432/hub/api', 'JUPYTERHUB_ACTIVITY_URL': 'http://10.23.71.64:5432/hub/api/users/wk5ng/activity', 'JUPYTERHUB_BASE_URL': '/', 'JUPYTERHUB_SERVICE_PREFIX': '/user/wk5ng/', 'JUPYTERHUB_SERVICE_URL': 'http://0.0.0.0:0/user/wk5ng/', 'JUPYTERHUB_PUBLIC_URL': '', 'JUPYTERHUB_PUBLIC_HUB_URL': '', 'USER': 'wk5ng', 'HOME': '/home/wk5ng', 'SHELL': '/bin/bash'}
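
One way to do that (just a sketch, not tested against your setup) is to let SSH itself carry the variables: add a SendEnv pattern on the JupyterHub container side, a matching AcceptEnv on the login node’s sshd, and then check that a variable actually survives the hop:

# Sketch only: assumes "AcceptEnv JUPYTERHUB_* JPY_*" in the login node's
# /etc/ssh/sshd_config, and a sudoers policy that lets "sudo -E" keep the variables.
# The host alias "loginnode", the user and the URL are taken from your config/logs.
JUPYTERHUB_API_URL=http://10.23.71.64:5432/hub/api \
  ssh -o 'SendEnv JUPYTERHUB_*' loginnode \
  sudo -E -u wk5ng printenv JUPYTERHUB_API_URL
# If this prints the URL, the "#SBATCH --export={keepvars}" line in your batch script
# will have something to export into the job; if it prints nothing, the variables are
# being dropped before sbatch ever runs.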

How do I make the container running JupyterHub a Slurm client?

I don’t know your environment, but I would say the same way you are spinning up the login node: instead of an SSH server, you install all the JupyterHub components. In any case, to add a Slurm client to a cluster you need to install the Slurm utility binaries, the Slurm config, and the munge key. With those in place, you should be able to submit jobs directly from the container.
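
Roughly, the client side could look like this (a sketch; package names and paths are assumptions for a Debian/Ubuntu-style container, so adapt them to however your other nodes were built):

# Install the Slurm client commands (sbatch, squeue, scancel) plus munge
apt-get install -y slurm-client munge

# Reuse the cluster's existing config and munge key so the container talks to the
# same slurmctld as the login node (paths are assumptions; adjust to your layout)
scp loginnode:/etc/slurm/slurm.conf /etc/slurm/slurm.conf
scp loginnode:/etc/munge/munge.key /etc/munge/munge.key
chown munge:munge /etc/munge/munge.key
chmod 400 /etc/munge/munge.key
systemctl enable --now munge

# Sanity check: the container should now be able to talk to the cluster
sinfo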

I passed the env vars into the remote SSH session by listing them all in a SendEnv option in c.SlurmSpawner.exec_prefix (as well as configuring /etc/ssh/sshd_config on the login node and /.ssh/config on the JupyterHub container), but now I’m encountering a different problem (which is good, because I guess this is progress!)

Now the Slurm log shows this error, which stumps me:

Traceback (most recent call last):
  File "/usr/local/bin/batchspawner-singleuser", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/anaconda3/lib/python3.11/site-packages/batchspawner/singleuser.py", line 47, in main
    run_path(cmd_path, run_name="__main__")
  File "<frozen runpy>", line 281, in run_path
  File "/opt/anaconda3/lib/python3.11/pkgutil.py", line 416, in get_importer
    path_item = os.fsdecode(path_item)
                ^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen os>", line 824, in fsdecode
TypeError: expected str, bytes or os.PathLike object, not NoneType

It’s also possible that I’m sending the environment variables through SSH incorrectly; this is what the exec_prefix looks like now:

c.SlurmSpawner.exec_prefix = "ssh -v -o SendEnv=PATH,LANG,JPY_API_TOKEN,JUPYTERHUB_API_URL,JUPYTERHUB_API_TOKEN,JUPYTERHUB_CLIENT_ID,JUPYTERHUB_COOKIE_OPTIONS,JUPYTERHUB_COOKIE_HOST_PREFIX_ENABLED,JUPYTERHUB_HOST,JUPYTERHUB_OAUTH_CALLBACK_URL,JUPYTERHUB_OAUTH_SCOPES,JUPYTERHUB_OAUTH_ACCESS_SCOPES,JUPYTERHUB_OAUTH_CLIENT_ALLOWED_SCOPES,JUPYTERHUB_USER,JUPYTERHUB_SERVER_NAME,JUPYTERHUB_ACTIVITY_URL,JUPYTERHUB_BASE_URL,JUPYTERHUB_SERVICE_PREFIX,JUPYTERHUB_SERVICE_URL,JUPYTERHUB_PUBLIC_URL,JUPYTERHUB_PUBLIC_HUB_URL,USER,HOME,SHELL  loginnode sudo -E -u {username}"

I also modified the /.ssh/config on the JupyterHub container to have the line:

SendEnv JUPYTERHUB_* PATH LANG JPY_* USER HOME SHELL

and /etc/ssh/sshd_config on the login node container to have the line:

AcceptEnv LANG LC_* JUPYTERHUB_* PATH JPY_* USER HOME SHELL

but maybe I’m missing something.
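
If I keep debugging the SSH route, I suppose I could also add a few diagnostic lines to the batch script, right after the #SBATCH directives, so the next Slurm log shows what the job actually sees (just a sketch):

# Diagnostics only; {cmd} stays at the end of the script as before
env | grep -E 'JUPYTERHUB|JPY_'     # did the hub variables survive ssh + sudo + sbatch?
echo "PATH=$PATH"                   # is the JupyterHub/conda bin directory on PATH?
type -a jupyterhub-singleuser batchspawner-singleuser   # can the job resolve both commands?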

In the meantime, I will try setting up a separate container that runs both JupyterHub and Slurm together, and see if that works well (yay for containerization!)

Setting up the container running JupyterHub as a Slurm client works great; there is no need to SSH to the login node anymore!

Thanks for all your help @mahendrapaipuri
