TLJH spawn timeout

We are trying to create a golden AMI for launching EC2 instances for our data devs. TLJH is already installed on the image, along with several JupyterLab extensions. Since we want SSO authentication, we added an additional config file with the configuration needed to use our Cognito user pool's federated logins.
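
For reference, that extra file is a Python snippet dropped into TLJH's /opt/tljh/config/jupyterhub_config.d/ directory and looks roughly like the sketch below. All IDs, secrets, and URLs here are placeholders for this post, and I've left out the part that applies the testorg_ prefix to usernames:

# /opt/tljh/config/jupyterhub_config.d/cognito.py  (placeholder name)
from oauthenticator.generic import GenericOAuthenticator

c.JupyterHub.authenticator_class = GenericOAuthenticator

c.GenericOAuthenticator.login_service = 'Cognito'
c.GenericOAuthenticator.client_id = 'REDACTED'
c.GenericOAuthenticator.client_secret = 'REDACTED'
c.GenericOAuthenticator.oauth_callback_url = 'http://localhost:8000/hub/oauth_callback'
c.GenericOAuthenticator.authorize_url = 'https://our-pool.auth.us-east-1.amazoncognito.com/oauth2/authorize'
c.GenericOAuthenticator.token_url = 'https://our-pool.auth.us-east-1.amazoncognito.com/oauth2/token'
c.GenericOAuthenticator.userdata_url = 'https://our-pool.auth.us-east-1.amazoncognito.com/oauth2/userInfo'
c.GenericOAuthenticator.scope = ['openid', 'email']
# Usernames end up as testorg_<email>; the prefixing logic is omitted here.
c.GenericOAuthenticator.username_key = 'email'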

Next, a CloudFormation template was created to spin up a fully loaded and configured TLJH EC2 instance. Part of the template is a user data script that runs after the instance is created (the port assignment is only for our dev environment, where we redirect to localhost):

#!/bin/bash
# Stop the hub while the configuration is being changed
systemctl stop jupyterhub.service
# Serve the hub on port 8000 (dev-only localhost redirection)
sudo tljh-config set http.port 8000
# ${Email} is substituted by CloudFormation with the instance owner's email
sudo tljh-config add-item users.admin testorg_${Email}
# Open JupyterLab by default
sudo tljh-config set user_environment.default_app jupyterlab
# Re-render the config and restart the hub
sudo tljh-config reload

When the deployment completes, I can browse to the hub and am prompted to sign in with Cognito, which works fine. However, when the user's server then attempts to spawn, it times out and fails. If I restart the user's server, the spawn works. Below are the relevant failure lines from the hub log:

Aug 25 14:15:24 ip-10-73-32-231 python3[1411]: [I 2022-08-25 14:15:24.760 JupyterHub base:762] User logged in: testorg_testuser@company.com
Aug 25 14:15:24 ip-10-73-32-231 python3[1411]: [I 2022-08-25 14:15:24.762 JupyterHub log:189] 302 GET /hub/oauth_callback?code=[secret]&state=[secret] -> /hub/ (@127.0.0.1) 480.53ms
Aug 25 14:15:24 ip-10-73-32-231 python3[1411]: [I 2022-08-25 14:15:24.805 JupyterHub log:189] 302 GET /hub/ -> /hub/spawn (testorg_testuser@company.com@127.0.0.1) 12.37ms
Aug 25 14:15:24 ip-10-73-32-231 python3[1411]: [I 2022-08-25 14:15:24.852 JupyterHub provider:574] Creating oauth client jupyterhub-user-testuser%40company.com
Aug 25 14:15:25 ip-10-73-32-231 python3[1411]: Running as unit: jupyter-testorg_testuser@company.com.service
Aug 25 14:15:25 ip-10-73-32-231 python3[1411]: [I 2022-08-25 14:15:25.839 JupyterHub log:189] 302 GET /hub/spawn -> /hub/spawn-pending/testorg_testuser@company.com (testorg_testuser@company.com@127.0.0.1) 1004.88ms
Aug 25 14:15:25 ip-10-73-32-231 python3[1411]: [I 2022-08-25 14:15:25.865 JupyterHub pages:402] testorg_testuser@company.com is pending spawn
Aug 25 14:15:25 ip-10-73-32-231 python3[1411]: [I 2022-08-25 14:15:25.872 JupyterHub log:189] 200 GET /hub/spawn-pending/testorg_testuser@company.com (testorg_testuser@company.com@127.0.0.1) 10.33ms
Aug 25 14:15:34 ip-10-73-32-231 python3[1411]: [W 2022-08-25 14:15:34.854 JupyterHub base:1043] User testorg_testuser@company.com is slow to become responsive (timeout=10)
Aug 25 14:15:40 ip-10-73-32-231 python3[1411]: [I 2022-08-25 14:15:40.801 JupyterHub log:189] 200 GET /hub/api/ (cull-idle@127.0.0.1) 95.08ms
Aug 25 14:15:40 ip-10-73-32-231 python3[1411]: [I 2022-08-25 14:15:40.812 JupyterHub log:189] 200 GET /hub/api/users?state=[secret] (cull-idle@127.0.0.1) 8.03ms
Aug 25 14:15:53 ip-10-73-32-231 python3[1411]: [W 2022-08-25 14:15:53.118 JupyterHub user:772] testorg_testuser@company.com's server never showed up at http://127.0.0.1:41947/user/testorg_testuser@company.com/ after 30 seconds. Giving up
Aug 25 14:15:55 ip-10-73-32-231 python3[1411]: [I 2022-08-25 14:15:55.689 JupyterHub log:189] 200 GET /hub/api (@127.0.0.1) 3.18ms
Aug 25 14:15:56 ip-10-73-32-231 python3[1411]: [I 2022-08-25 14:15:56.083 JupyterHub log:189] 200 POST /hub/api/users/testorg_testuser@company.com/activity (testorg_testuser@company.com@127.0.0.1) 290.22ms
Aug 25 14:16:40 ip-10-73-32-231 python3[1411]: [I 2022-08-25 14:16:40.691 JupyterHub log:189] 200 GET /hub/api/ (cull-idle@127.0.0.1) 8.64ms
Aug 25 14:16:40 ip-10-73-32-231 python3[1411]: [I 2022-08-25 14:16:40.699 JupyterHub log:189] 200 GET /hub/api/users?state=[secret] (cull-idle@127.0.0.1) 4.96ms
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]: [E 2022-08-25 14:17:23.305 JupyterHub gen:624] Exception in Future <Task finished coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /opt/tljh/hub/lib/python3.6/site-packages/jupyterhub/handlers/base.py:900> exception=TimeoutError("Server at http://127.0.0.1:41947/user/testorg_testuser@company.com/ didn't respond in 30 seconds",)> after timeout
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:     Traceback (most recent call last):
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:       File "/opt/tljh/hub/lib/python3.6/site-packages/tornado/gen.py", line 618, in error_callback
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:         future.result()
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:       File "/opt/tljh/hub/lib/python3.6/site-packages/jupyterhub/handlers/base.py", line 907, in finish_user_spawn
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:         await spawn_future
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:       File "/opt/tljh/hub/lib/python3.6/site-packages/jupyterhub/user.py", line 748, in spawn
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:         await self._wait_up(spawner)
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:       File "/opt/tljh/hub/lib/python3.6/site-packages/jupyterhub/user.py", line 795, in _wait_up
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:         raise e
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:       File "/opt/tljh/hub/lib/python3.6/site-packages/jupyterhub/user.py", line 763, in _wait_up
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:         http=True, timeout=spawner.http_timeout, ssl_context=ssl_context
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:       File "/opt/tljh/hub/lib/python3.6/site-packages/jupyterhub/utils.py", line 241, in wait_for_http_server
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:         timeout=timeout,
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:       File "/opt/tljh/hub/lib/python3.6/site-packages/jupyterhub/utils.py", line 184, in exponential_backoff
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:         raise TimeoutError(fail_message)
Aug 25 14:17:23 ip-10-73-32-231 python3[1411]:     TimeoutError: Server at http://127.0.0.1:41947/user/testorg_testuser@company.com/ didn't respond in 30 seconds
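
The 30 seconds in that traceback matches JupyterHub's default Spawner.http_timeout (the "slow to become responsive (timeout=10)" line is only a warning, not the failure itself). As a stop-gap I can give first spawns more room with another snippet in the same /opt/tljh/config/jupyterhub_config.d/ directory; this is just a sketch with arbitrary values:

# /opt/tljh/config/jupyterhub_config.d/timeouts.py  (hypothetical filename)
# Allow slow first-time spawns more time before the hub gives up.
c.Spawner.start_timeout = 120  # default 60s: time for the user's systemd unit to start
c.Spawner.http_timeout = 120   # default 30s: time for the server to answer HTTP

Raising the timeouts only hides the symptom, though, since a restarted server spawns fine.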

I would sincerely appreciate some insight into why the first spawn fails while a subsequent restart succeeds.