Jupyterhub checks for notebook servers at wrong port at startup?

I’m upgrading to Jupyterhub 2.1.1. When I restart the hub it does a check for all the servers that were running. But it can’t find them and ends up deleting them. I get logs like:

W 2022-02-17 03:06:39.230 JupyterHub user:812] mark's server never showed up at http://jupyter-mark:53749/jupyterhub_internal/user/mark/ after 120 seconds. Giving up.
Common causes of this timeout, and debugging tips:
1. The server didn't finish starting,
<etc etc>
[E 2022-02-17 03:26:53.008 JupyterHub app:2487] mark does not appear to be running at http://jupyter-mark:53749/jupyterhub_internal/user/mark/, shutting it down.

I’m using swarmspawner and all of my notebooks are in Docker containers with port 8888, but it seems to be polling at high ports (50388, 52751, etc).

Is it due to running jupyterhub-idle-culler at startup? Is there a config there I’m missing? I’m running the proxy and db in separate containers. In 1.4 and 1.5 there was no problem with restarting the hub with running containers.

Thanks,
Mark

Hi! Please could you share enough information for someone to reproduce your problem? For instance, tell us the versions of all components you’re using, and how you built your containers (or if you’re using official ones), whether you have any custom Docker host confiiguration, and your full configuration with secrets redacted.

Can you test with just DockerSpawner instead of swarmspawner? The latter requires more effort to setup, and therefore makes it harder to investigate.

Hi Manics,

Yes, I’ll see if I can setup something minimal locally for testing. At least it’s not the idle-culler, that worked fine after I let it sit for awhile.

Thanks,
Mark

I finally got a dockerspawner minimal config working with just docker-compose, and it works fine. Stopping / removing the hub and then docker-compose up finds existing services. It must either be my particular config or something with swarmspawner. I’ll see if I can get a minimal swarmspawner working.

[D 2022-03-02 21:52:05.419 JupyterHub app:2481] Verifying that mark is running at http://172.18.0.5:8888/user/mark/
[D 2022-03-02 21:52:05.425 JupyterHub utils:230] Server at http://172.18.0.5:8888/user/mark/ responded with 302
[D 2022-03-02 21:52:05.425 JupyterHub _version:74] jupyterhub and jupyterhub-singleuser both on version 2.1.1
[I 2022-03-02 21:52:05.425 JupyterHub app:2495] mark still running
[

Thanks,
Mark

1 Like

Is it possible that these were servers that had begun to start, but not finished starting when the Hub was restarted? That’s the kind of thing that could result in a semi-random port being in the db, and part of why this check is there.

It goes like this:

  1. prepare to start Spawner
  2. placeholder record for Spawner’s server endpoint is added to db
  3. Spawner finishes starting, returns URL
  4. record Spawner’s final server endpoint in db

If the Hub restarts between steps 2 and 3, the URL in the db can be wrong, but the state looks like a running server. That’s why there’s a startup check. If it finds there’s a URL in the db but there’s no server actually running there, it treats it as crashed or failed, and tries to cleanup.

I gave them enough time to be settled. I got excited when I saw this line in swarmspawner:

  async def get_ip_and_port(self):
        """Queries Docker daemon for service's IP and port.
        If you are using network_mode=host, you will need to override
        this method as follows::
            async def get_ip_and_port(self):
                return self.host_ip, self.port

Because I do use mode=host, but that made it worse (the ip was completely wrong and nothing worked). And that routine never gets called when the hub is restarted anyway. I’m going to try to find where it populates the database with the port number. It must not be accessing the correct swarmspawner object I guess.

If Spawn has completed, all you need to do is return the correct ip and port (or full url as a string) from .start().

I have no idea how to get host networking right with swarm!

I added some quick logging in objects.py / wait_up to see which port it was looking for. It looks like wait_up is getting the correct port (8888) when it spawns:

/usr/local/lib/python3.8/dist-packages/jupyterhub/objects.py:180 http://jupyter-mark-2df1b5a91:8888/jupyterhub_internal/user/mark-f1b5a91/

But when I restart the hub, that same routine thinks it’s running at port 41015. Is there something when jupyterhub restarts that changes the port entry?

Maybe I should just finally move to kubernetes, but this installation was working so well. :frowning:

[I 2022-03-04 03:59:40.390 JupyterHub app:2766] Running JupyterHub version 2.1.1
[I 2022-03-04 03:59:40.391 JupyterHub app:2796] Using Authenticator: jhub_sessions.sessions_auth.SessionsAuthenticator
[I 2022-03-04 03:59:40.391 JupyterHub app:2796] Using Spawner: dockerspawner.swarmspawner.SwarmSpawner-12.1.0
[I 2022-03-04 03:59:40.391 JupyterHub app:2796] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-2.1.1
[I 2022-03-04 03:59:40.404 JupyterHub app:1606] Loading cookie_secret from /srv/jupyterhub/jupyterhub_cookie_secret
[I 2022-03-04 03:59:40.592 JupyterHub app:1924] Not using allowed_users. Any authenticated user will be allowed.
[I 2022-03-04 03:59:40.625 JupyterHub roles:444] Added role token to token <APIToken('...', service='jupyterhub-idle-culler-service', client_id='jupyterhub')>


 /usr/local/lib/python3.8/dist-packages/jupyterhub/objects.py:180: UserWarning: http://jupyter-mark-2df1b5a91:41015/jupyterhub_internal/user/mark-f1b5a91/

[W 2022-03-04 03:59:50.690 JupyterHub app:2851] init_spawners did not complete within 10 seconds. Allowing to complete in the background.
[I 2022-03-04 03:59:50.691 JupyterHub app:3048] Not starting proxy
[I 2022-03-04 03:59:50.697 JupyterHub app:3084] Hub API listening on http://0.0.0.0:8081/jupyterhub_internal/hub/
[I 2022-03-04 03:59:50.698 JupyterHub app:3086] Private Hub API connect url http://jupyterhub_hub:8081/jupyterhub_internal/hub/
[I 2022-03-04 03:59:50.698 JupyterHub app:3095] Starting managed service jupyterhub-idle-culler-service
[I 2022-03-04 03:59:50.698 JupyterHub service:377] Starting service 'jupyterhub-idle-culler-service': ['/usr/bin/python3', '-m', 'jupyterhub_idle_culler', '--timeout=7200']
[I 2022-03-04 03:59:50.701 JupyterHub service:132] Spawning /usr/bin/python3 -m jupyterhub_idle_culler --timeout=7200
[I 2022-03-04 03:59:50.713 JupyterHub app:3150] JupyterHub is now running at http://:8000/jupyterhub_internal/
[I 2022-03-04 03:59:51.035 JupyterHub log:189] 200 GET /jupyterhub_internal/hub/api/ (jupyterhub-idle-culler-service@10.0.13.6) 20.37ms
[I 2022-03-04 03:59:51.051 JupyterHub log:189] 200 GET /jupyterhub_internal/hub/api/users?state=[secret] (jupyterhub-idle-culler-service@10.0.13.6) 12.91ms
[W 2022-03-04 04:01:47.659 JupyterHub user:812] mark-f1b5a91's server never showed up at http://jupyter-mark-2df1b5a91:41015/jupyterhub_internal/user/mark-f1b5a91/ after 120 seconds. Giving up.

Maybe the wrong port is being stored in JupyterHub’s database? If you can access the db try running
sqlite3 /path/to/jupyterhub.sqlite 'SELECT * FROM servers'
immediately after starting the server and see what port is stored.

It looks like it’s not updating the port, only the ip:

spawn pending:

jupyterhub=# SELECT * from servers;
 id | proto | ip | port  |                base_url                 | cookie_name
----+-------+----+-------+-----------------------------------------+-------------
  1 | http  |    | 56661 | /jupyterhub_internal/user/mark-f1b5a91/ | cookie
(1 row)

running:

jupyterhub=# SELECT * from servers;
 id | proto |           ip           | port  |                base_url                 | cookie_name
----+-------+------------------------+-------+-----------------------------------------+-------------
  1 | http  | jupyter-mark-2df1b5a91 | 56661 | /jupyterhub_internal/user/mark-f1b5a91/ | cookie
(1 row)

What’s really weird is that it updated the port correctly the first time (on a new worker node), but never again. I’m using a separate pgsql container for the db.

Thanks,
Mark

It seems that everything is getting the message that the port is 8888 (proxy, etc) except for the database. Based on testing I think that the port is getting set correctly for just a brief moment in users.py when the container is spawned, but then it gets reset to its original value almost immediately. I think I’m just going to run periodic script and UPDATE servers SET port = 8888;

Thanks,
Mark