Notebook spawned in Yarn but killed after timeout

Hi, I am trying to set up JupyterHub with YarnSpawner against a Hadoop cluster. I’m seeing a peculiar failure in which the notebook is spawned and successfully begins running, at least one activity message is sent back to the Hub (received and logged), but the Hub still times out waiting for the notebook and seems to eventually kill it, as though it never did receive activity. The Hub’s UI shows a progress bar that stops halfway and then never finishes. Ultimately it goes red when the Hub times out the spawn.

No errors are occurring in the container, it just starts, runs for a little while (sending back an activity message, as noted), then seems to be killed by the Hub. The Hub logs the received message but still times out. I’m not sure what to do from here. I’m including some short log excerpts below, but I’m happy to supply whatever might be useful. First, below is the Hub log. Notice the line:

[I 2023-07-10 15:11:40.001 JupyterHub log:186] 200 POST /hub/api/users/sorokaa/activity (sorokaa@192.168.60.12)

That’s the IP of the node with the application container. I understand that to mean that an activity message was successfully received by the Hub? Here is the Hub log section in full.

23/07/10 15:11:30 INFO skein.Driver: Submitting application...
23/07/10 15:11:30 INFO impl.YarnClientImpl: Submitted application application_1687889389675_0006
[W 2023-07-10 15:11:35.587 JupyterHub base:1072] User sorokaa is slow to start (timeout=10)
[I 2023-07-10 15:11:39.969 JupyterHub log:186] 200 GET /hub/api (@192.168.60.12) 0.95ms
[I 2023-07-10 15:11:40.001 JupyterHub log:186] 200 POST /hub/api/users/sorokaa/activity (sorokaa@192.168.60.12) 21.40ms
15:16:09.880 [ConfigProxy] info: 200 GET /api/routes 
[W 2023-07-10 15:16:25.621 JupyterHub user:825] sorokaa's server failed to start in 300 seconds, giving up.
    
    Common causes of this timeout, and debugging tips:
    
    1. Everything is working, but it took too long.
       To fix: increase `Spawner.start_timeout` configuration
       to a number of seconds that is enough for spawners to finish starting.
    2. The server didn't finish starting,
       or it crashed due to a configuration issue.
       Check the single-user server's logs for hints at what needs fixing.
    
23/07/10 15:16:25 INFO impl.YarnClientImpl: Killed application application_1687889389675_0006
[E 2023-07-10 15:16:25.876 JupyterHub gen:623] Exception in Future <Task finished name='Task-26' coro=<BaseHandler.spawn_single_user.<locals>.finish_user_spawn() done, defined at /cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/jupyterhub/handlers/base.py:963> exception=TimeoutError('Timeout')> after timeout
    Traceback (most recent call last):
      File "/cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/tornado/gen.py", line 618, in error_callback
        future.result()
      File "/cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/jupyterhub/handlers/base.py", line 970, in finish_user_spawn
        await spawn_future
      File "/cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/jupyterhub/user.py", line 851, in spawn
        raise e
      File "/cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/jupyterhub/user.py", line 748, in spawn
        url = await gen.with_timeout(timedelta(seconds=spawner.start_timeout), f)
    tornado.util.TimeoutError: Timeout
    
[I 2023-07-10 15:16:25.885 JupyterHub log:186] 200 GET /hub/api/users/sorokaa/server/progress (sorokaa@::ffff:10.254.127.252) 299089.40ms
[E 2023-07-10 15:16:25.890 JupyterHub gen:623] Exception in Future <Task finished name='Task-27' coro=<YarnSpawner.start() done, defined at /cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/yarnspawner/spawner.py:203> exception=Exception('Application application_1687889389675_0006 failed to start, check application logs for more information')> after timeout
    Traceback (most recent call last):
      File "/cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/tornado/gen.py", line 618, in error_callback
        future.result()
      File "/cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/yarnspawner/spawner.py", line 247, in start
        raise Exception("Application %s failed to start, check "
    Exception: Application application_1687889389675_0006 failed to start, check application logs for more information

Now here an except from the ResourceManager log, showing the successful launching and running of the container with the notebook in it:

23/07/10 15:11:32 INFO nodemanager.DefaultContainerExecutor: Initializing user sorokaa
23/07/10 15:11:32 INFO nodemanager.DefaultContainerExecutor: Copying from /hadoop/yarn/local/nmPrivate/container_1687889389675_0006_01_00000118941375351.tokens to /hadoop/yarn/local/usercache/sorokaa/appcache/application_1687889389675_0006/container_1687889389675_0006_01_000001.tokens
23/07/10 15:11:32 INFO nodemanager.DefaultContainerExecutor: Localizer CWD set to /hadoop/yarn/local/usercache/sorokaa/appcache/application_1687889389675_0006 = file:/hadoop/yarn/local/usercache/sorokaa/appcache/application_1687889389675_0006
23/07/10 15:11:32 INFO container.ContainerImpl: Container container_1687889389675_0006_01_000001 transitioned from LOCALIZING to SCHEDULED
23/07/10 15:11:32 INFO scheduler.ContainerScheduler: Starting container [container_1687889389675_0006_01_000001]
23/07/10 15:11:33 INFO container.ContainerImpl: Container container_1687889389675_0006_01_000001 transitioned from SCHEDULED to RUNNING
23/07/10 15:11:33 INFO monitor.ContainersMonitorImpl: Starting resource-monitoring for container_1687889389675_0006_01_000001
23/07/10 15:11:33 INFO nodemanager.DefaultContainerExecutor: launchContainer: [bash, /hadoop/yarn/local/usercache/sorokaa/appcache/application_1687889389675_0006/container_1687889389675_0006_01_000001/default_container_executor.sh]
23/07/10 15:11:35 INFO monitor.ContainersMonitorImpl: container_1687889389675_0006_01_000001's ip = 192.168.60.12, and hostname = dl-test-03.cm.cluster

and here it is seemingly getting shut down from the Hub:

23/07/10 15:16:25 INFO ipc.Server: Auth successful for appattempt_1687889389675_0006_000001 (auth:SIMPLE) from 192.168.60.11:48346
23/07/10 15:16:25 INFO containermanager.ContainerManagerImpl: Stopping container with container Id: container_1687889389675_0006_01_000001
23/07/10 15:16:25 INFO container.ContainerImpl: Container container_1687889389675_0006_01_000001 transitioned from RUNNING to KILLING
23/07/10 15:16:25 INFO launcher.ContainerCleanup: Cleaning up container container_1687889389675_0006_01_000001
23/07/10 15:16:25 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1687889389675_0006_01_000001 is : 143
23/07/10 15:16:25 INFO container.ContainerImpl: Container container_1687889389675_0006_01_000001 transitioned from KILLING to CONTAINER_CLEANEDUP_AFTER_KILL

That auth call which begins the operation is coming from 192.168.60.11, which is the edge node running the Hub.

Lastly, below is a slice of the log from the notebook itself. You’ll see it successfully launch and begin notifying the Hub:

[I 2023-07-10 15:11:39.977 YarnSingleUserNotebookApp mixins:591] Updating Hub with activity every 300 seconds [D 2023-07-10 15:11:39.977 YarnSingleUserNotebookApp mixins:553] Notifying Hub of activity 2023-07-10T19:11:39.915883Z

directly beneath which you’ll see me make a successful HTTP GET against the notebook from inside the cluster, to prove to myself that it really was running and that I am not hallucinating:

[I 2023-07-10 15:13:17.799 YarnSingleUserNotebookApp log:186] 302 GET /user/sorokaa/tree/ -> /user/sorokaa/tree (@192.168.60.11) 0.77ms

Here is the full excerpt from the notebook log:

[I 2023-07-10 15:11:39.881 YarnSingleUserNotebookApp mixins:609] Starting jupyterhub single-user server version 3.1.1
[I 2023-07-10 15:11:39.881 YarnSingleUserNotebookApp mixins:623] Extending __main__.YarnSingleUserNotebookApp from __main__ 
[I 2023-07-10 15:11:39.881 YarnSingleUserNotebookApp mixins:623] Extending jupyter_server.serverapp.ServerApp from jupyter_server 1.23.6
[D 2023-07-10 15:11:39.900 YarnSingleUserNotebookApp application:190] Searching ['/home/.jupyter', '/home/.local/etc/jupyter', '/cm/shared/datalake/jupyterhub/miniconda/etc/jupyter', '/usr/local/etc/jupyter', '/etc/jupyter'] for config files
[D 2023-07-10 15:11:39.900 YarnSingleUserNotebookApp application:902] Looking for jupyter_config in /etc/jupyter
[D 2023-07-10 15:11:39.900 YarnSingleUserNotebookApp application:902] Looking for jupyter_config in /usr/local/etc/jupyter
[D 2023-07-10 15:11:39.900 YarnSingleUserNotebookApp application:902] Looking for jupyter_config in /cm/shared/datalake/jupyterhub/miniconda/etc/jupyter
[D 2023-07-10 15:11:39.900 YarnSingleUserNotebookApp application:902] Looking for jupyter_config in /home/.local/etc/jupyter
[D 2023-07-10 15:11:39.901 YarnSingleUserNotebookApp application:902] Looking for jupyter_config in /home/.jupyter
[D 2023-07-10 15:11:39.901 YarnSingleUserNotebookApp application:902] Looking for jupyter_server_config in /etc/jupyter
[D 2023-07-10 15:11:39.902 YarnSingleUserNotebookApp application:902] Looking for jupyter_server_config in /usr/local/etc/jupyter
[D 2023-07-10 15:11:39.902 YarnSingleUserNotebookApp application:902] Looking for jupyter_server_config in /cm/shared/datalake/jupyterhub/miniconda/etc/jupyter
[D 2023-07-10 15:11:39.902 YarnSingleUserNotebookApp application:902] Looking for jupyter_server_config in /home/.local/etc/jupyter
[D 2023-07-10 15:11:39.902 YarnSingleUserNotebookApp application:902] Looking for jupyter_server_config in /home/.jupyter
[D 2023-07-10 15:11:39.904 YarnSingleUserNotebookApp config_manager:93] Paths used for configuration of jupyter_server_config: 
    	/etc/jupyter/jupyter_server_config.json
[D 2023-07-10 15:11:39.904 YarnSingleUserNotebookApp config_manager:93] Paths used for configuration of jupyter_server_config: 
    	/usr/local/etc/jupyter/jupyter_server_config.json
[D 2023-07-10 15:11:39.904 YarnSingleUserNotebookApp config_manager:93] Paths used for configuration of jupyter_server_config: 
    	/cm/shared/datalake/jupyterhub/miniconda/etc/jupyter/jupyter_server_config.json
[D 2023-07-10 15:11:39.904 YarnSingleUserNotebookApp config_manager:93] Paths used for configuration of jupyter_server_config: 
    	/home/.local/etc/jupyter/jupyter_server_config.json
[D 2023-07-10 15:11:39.905 YarnSingleUserNotebookApp config_manager:93] Paths used for configuration of jupyter_server_config: 
    	/home/.jupyter/jupyter_server_config.json
/cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/yarnspawner/singleuser.py:16: RuntimeWarning: coroutine 'HubAuth._api_request' was never awaited
  self.hub_auth._api_request(method='POST',
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
[I 2023-07-10 15:11:39.963 YarnSingleUserNotebookApp mixins:670] Starting jupyterhub-singleuser server version 3.1.1
[D 2023-07-10 15:11:39.969 YarnSingleUserNotebookApp _version:74] jupyterhub and jupyterhub-singleuser both on version 3.1.1
[I 2023-07-10 15:11:39.970 YarnSingleUserNotebookApp serverapp:2686] Serving notebooks from local directory: /hadoop/yarn/local/usercache/sorokaa/appcache/application_1687889389675_0006/container_1687889389675_0006_01_000001
[I 2023-07-10 15:11:39.970 YarnSingleUserNotebookApp serverapp:2686] Jupyter Server 1.23.6 is running at:
[I 2023-07-10 15:11:39.970 YarnSingleUserNotebookApp serverapp:2686] http://dl-test-03:34484/user/sorokaa/tree/
[I 2023-07-10 15:11:39.970 YarnSingleUserNotebookApp serverapp:2686]  or http://127.0.0.1:34484/user/sorokaa/tree/
[I 2023-07-10 15:11:39.970 YarnSingleUserNotebookApp serverapp:2687] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 2023-07-10 15:11:39.977 YarnSingleUserNotebookApp mixins:591] Updating Hub with activity every 300 seconds
[D 2023-07-10 15:11:39.977 YarnSingleUserNotebookApp mixins:553] Notifying Hub of activity 2023-07-10T19:11:39.915883Z
[I 2023-07-10 15:13:17.799 YarnSingleUserNotebookApp log:186] 302 GET /user/sorokaa/tree/ -> /user/sorokaa/tree (@192.168.60.11) 0.77ms
[I 2023-07-10 15:13:20.626 YarnSingleUserNotebookApp log:186] 302 GET /user/sorokaa/tree/ -> /user/sorokaa/tree (@192.168.60.11) 0.71ms
[D 2023-07-10 15:13:26.029 YarnSingleUserNotebookApp auth:673] No user identified
[D 2023-07-10 15:13:26.035 YarnSingleUserNotebookApp handlers:273] Using contents: services/contents
[D 2023-07-10 15:13:26.132 YarnSingleUserNotebookApp handlers:881] Path favicon.ico served from /cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/jupyter_server/static/favicon.ico
[D 2023-07-10 15:13:26.155 YarnSingleUserNotebookApp handlers:881] Path style/bootstrap.min.css served from /cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/jupyter_server/static/style/bootstrap.min.css
[D 2023-07-10 15:13:26.174 YarnSingleUserNotebookApp handlers:881] Path style/bootstrap-theme.min.css served from /cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/jupyter_server/static/style/bootstrap-theme.min.css
[D 2023-07-10 15:13:26.185 YarnSingleUserNotebookApp handlers:881] Path style/index.css served from /cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/jupyter_server/static/style/index.css
[W 2023-07-10 15:13:26.193 YarnSingleUserNotebookApp log:186] 404 GET /user/sorokaa/tree (@192.168.60.11) 165.33ms
End of LogType:application.driver.log.This log file belongs to a running container (container_1687889389675_0006_01_000001) and so may not be complete.

The only thing that seems obviously strange is the RuntimeWarning:

RuntimeWarning: coroutine 'HubAuth._api_request' was never awaited self.hub_auth._api_request(method='POST', RuntimeWarning: Enable tracemalloc to get the object allocation traceback

but I do not know enough about Jupyter to understand whether this is important or not.

Any help much appreciated. My knowledge of Jupyter is almost nil and this has flummoxed me. Hopefully it is something obvious I just haven’t recognized?

I forgot to include the Hub configuration:

c.JupyterHub.hub_ip = ''

c.JupyterHub.bind_url = 'http://:9090'
c.JupyterHub.cookie_secret_file = '/etc/jupyterhub/jupyterhub_cookie_secret'
c.JupyterHub.db_url = 'sqlite:////var/jupyterhub/jupyterhub.sqlite'
c.JupyterHub.hub_bind_url = 'http://dl-test-02.cm.cluster:9091'

c.JupyterHub.authenticator_class = 'dummyauthenticator.DummyAuthenticator'
# Optionally add a shared global password to be used by all users
c.DummyAuthenticator.password = "test"

c.JupyterHub.spawner_class = 'yarnspawner.YarnSpawner'
#c.YarnSpawner.localize_files = {
#    'environment': {
#        'source': 'hdfs:///jupyterhub/example.tar.gz',
#        'visibility': 'public'
#    }
#}
c.YarnSpawner.prologue = 'source /cm/shared/datalake/jupyterhub/miniconda/bin/activate'
# The memory limit for a notebook instance.
c.YarnSpawner.mem_limit = '2 G'

# The cpu limit for a notebook instance
c.YarnSpawner.cpu_limit = 1

c.YarnSpawner.debug = True
c.JupyterHub.activity_resolution = 30
c.YarnSpawner.poll_interval = 30

Seems like YarnSpawner has not been updated for a while. This error is expected with JupyterHub > 3 as hub_auth._api_request method became a async in JupyterHub 3.

What you can try is to patch the singleuser.py of yarnspawner with following:

import os
import json
import requests

from jupyterhub.singleuser import SingleUserNotebookApp
from jupyterhub.services.auth import HubAuth
from jupyterhub.utils import random_port, url_path_join
from traitlets import default


# Borrowed and modified from jupyterhub/batchspawner:
# https://github.com/jupyterhub/batchspawner/blob/d1052385f245a3c799c5b81d30c8e67f193963c6/batchspawner/singleuser.py
class YarnSingleUserNotebookApp(SingleUserNotebookApp):
    @default('port')
    def _port(self):
        return random_port()

    def start(self):
        hub_auth = HubAuth()
        url = url_path_join(hub_auth.api_url, "yarnspawner")
        headers = {"Authorization": f"token {hub_auth.api_token}"}
        r = requests.post(
               url,
               headers=headers,
               json={"port": self.port},
         )
        super().start()


def main(argv=None):
    # Set configuration directory to something local if not already set
    for var in ['JUPYTER_RUNTIME_DIR', 'JUPYTER_DATA_DIR']:
        if os.environ.get(var) is None:
            if not os.path.exists('.jupyter'):
                os.mkdir('.jupyter')
            os.environ[var] = './.jupyter'
    return YarnSingleUserNotebookApp.launch_instance(argv)


if __name__ == "__main__":
    main()

You can either install this patch as a python package or a standalone script and then configure YarnSpawner.cmd to use this script like c.YarnSpawner.cmd = 'python -m patchedyarnspawner.singleuser'. In principle, this should get rid of the error that you are facing.

Thank so much for this help @mahendrapaipuri ! I will give it a go and report back here. If this technique works, would it be appropriate for me to offer a PR with it for YarnSpawner?

Thank you so much for this advice @mahendrapaipuri ! I will give it a go and report back here.

Sure, if it works, feel free to open a PR.

Hi @mahendrapaipuri , indeed this did get me past that hurdle, thank you, and I will prepare a PR for it.

However, I’m running into something else. My notebook does not load in Hub (/user/sorokaa/tree), it returns a 404. If I look at the single-user notebook log, I see lines like this:

[D 2023-07-11 12:11:25.789 YarnSingleUserNotebookApp auth:673] No user identified
[D 2023-07-11 12:11:25.790 YarnSingleUserNotebookApp handlers:273] Using contents: services/contents
[W 2023-07-11 12:11:25.791 YarnSingleUserNotebookApp log:186] 404 GET /user/sorokaa/tree? (@::ffff:10.254.127.246) 2.65ms

and that No user identified seems very suspicious.

Perhaps the authN isn’t getting passed to the single-user notebook and thus the notebook returns a 404? When I curl the notebook directly (no Hub proxying) I also get a 404, and of course I am not passing any authN when I do that.

Can you point me at the way that Hub is passing the authN to the notebook? I will try to see what is happening on the Yarn side. As far as I can tell, that No user identified is not coming from YarnSpawner but from the Hub code, but perhaps YarnSpawner should be doing something to close the identity loop?

Could you try to clear all the browser cookies and start over clean? This seems to me some non valid cookie issue.

I’m afraid that clearing the browser cookies doesn’t seem to have made any difference. I stopped my server, logged out and shut down the Hub, cleaned out the browser, relaunched the Hub, opened an “incognito” window and logged in…

and got to the same place. :slight_smile:

I’m including the single-user notebook log below, happy to supply anything else that might be interesting or useful!

[I 2023-07-11 14:21:58.981 YarnSingleUserNotebookApp mixins:609] Starting jupyterhub single-user server version 3.1.1
[I 2023-07-11 14:21:58.981 YarnSingleUserNotebookApp mixins:623] Extending __main__.YarnSingleUserNotebookApp from __main__ 
[I 2023-07-11 14:21:58.981 YarnSingleUserNotebookApp mixins:623] Extending jupyter_server.serverapp.ServerApp from jupyter_server 1.23.6
[D 2023-07-11 14:21:58.999 YarnSingleUserNotebookApp application:190] Searching ['/home/.jupyter', '/home/.local/etc/jupyter', '/cm/shared/datalake/jupyterhub/miniconda/etc/jupyter', '/usr/local/etc/jupyter', '/etc/jupyter'] for config files
[D 2023-07-11 14:21:58.999 YarnSingleUserNotebookApp application:902] Looking for jupyter_config in /etc/jupyter
[D 2023-07-11 14:21:58.999 YarnSingleUserNotebookApp application:902] Looking for jupyter_config in /usr/local/etc/jupyter
[D 2023-07-11 14:21:59.000 YarnSingleUserNotebookApp application:902] Looking for jupyter_config in /cm/shared/datalake/jupyterhub/miniconda/etc/jupyter
[D 2023-07-11 14:21:59.000 YarnSingleUserNotebookApp application:902] Looking for jupyter_config in /home/.local/etc/jupyter
[D 2023-07-11 14:21:59.000 YarnSingleUserNotebookApp application:902] Looking for jupyter_config in /home/.jupyter
[D 2023-07-11 14:21:59.000 YarnSingleUserNotebookApp application:902] Looking for jupyter_server_config in /etc/jupyter
[D 2023-07-11 14:21:59.001 YarnSingleUserNotebookApp application:902] Looking for jupyter_server_config in /usr/local/etc/jupyter
[D 2023-07-11 14:21:59.001 YarnSingleUserNotebookApp application:902] Looking for jupyter_server_config in /cm/shared/datalake/jupyterhub/miniconda/etc/jupyter
[D 2023-07-11 14:21:59.001 YarnSingleUserNotebookApp application:902] Looking for jupyter_server_config in /home/.local/etc/jupyter
[D 2023-07-11 14:21:59.001 YarnSingleUserNotebookApp application:902] Looking for jupyter_server_config in /home/.jupyter
[D 2023-07-11 14:21:59.003 YarnSingleUserNotebookApp config_manager:93] Paths used for configuration of jupyter_server_config: 
    	/etc/jupyter/jupyter_server_config.json
[D 2023-07-11 14:21:59.003 YarnSingleUserNotebookApp config_manager:93] Paths used for configuration of jupyter_server_config: 
    	/usr/local/etc/jupyter/jupyter_server_config.json
[D 2023-07-11 14:21:59.003 YarnSingleUserNotebookApp config_manager:93] Paths used for configuration of jupyter_server_config: 
    	/cm/shared/datalake/jupyterhub/miniconda/etc/jupyter/jupyter_server_config.json
[D 2023-07-11 14:21:59.003 YarnSingleUserNotebookApp config_manager:93] Paths used for configuration of jupyter_server_config: 
    	/home/.local/etc/jupyter/jupyter_server_config.json
[D 2023-07-11 14:21:59.004 YarnSingleUserNotebookApp config_manager:93] Paths used for configuration of jupyter_server_config: 
    	/home/.jupyter/jupyter_server_config.json
[I 2023-07-11 14:21:59.080 YarnSingleUserNotebookApp mixins:670] Starting jupyterhub-singleuser server version 3.1.1
[D 2023-07-11 14:21:59.085 YarnSingleUserNotebookApp _version:74] jupyterhub and jupyterhub-singleuser both on version 3.1.1
[I 2023-07-11 14:21:59.086 YarnSingleUserNotebookApp serverapp:2686] Serving notebooks from local directory: /hadoop/yarn/local/usercache/sorokaa/appcache/application_1687889389675_0015/container_1687889389675_0015_01_000001
[I 2023-07-11 14:21:59.086 YarnSingleUserNotebookApp serverapp:2686] Jupyter Server 1.23.6 is running at:
[I 2023-07-11 14:21:59.086 YarnSingleUserNotebookApp serverapp:2686] http://dl-test-01:39199/user/sorokaa/tree/
[I 2023-07-11 14:21:59.086 YarnSingleUserNotebookApp serverapp:2686]  or http://127.0.0.1:39199/user/sorokaa/tree/
[I 2023-07-11 14:21:59.086 YarnSingleUserNotebookApp serverapp:2687] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[I 2023-07-11 14:21:59.092 YarnSingleUserNotebookApp mixins:591] Updating Hub with activity every 300 seconds
[D 2023-07-11 14:21:59.093 YarnSingleUserNotebookApp mixins:553] Notifying Hub of activity 2023-07-11T18:21:59.014371Z
[I 2023-07-11 14:21:59.240 YarnSingleUserNotebookApp log:186] 302 GET /user/sorokaa/ -> /user/sorokaa/tree/? (@192.168.60.11) 0.64ms
[I 2023-07-11 14:21:59.351 YarnSingleUserNotebookApp log:186] 302 GET /user/sorokaa/ -> /user/sorokaa/tree/? (@::ffff:10.254.127.246) 0.59ms
[I 2023-07-11 14:21:59.399 YarnSingleUserNotebookApp log:186] 302 GET /user/sorokaa/tree/? -> /user/sorokaa/tree? (@::ffff:10.254.127.246) 0.55ms
[D 2023-07-11 14:21:59.440 YarnSingleUserNotebookApp auth:673] No user identified
[D 2023-07-11 14:21:59.441 YarnSingleUserNotebookApp handlers:273] Using contents: services/contents
[D 2023-07-11 14:21:59.476 YarnSingleUserNotebookApp handlers:881] Path favicon.ico served from /cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/jupyter_server/static/favicon.ico
[D 2023-07-11 14:21:59.478 YarnSingleUserNotebookApp handlers:881] Path style/bootstrap.min.css served from /cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/jupyter_server/static/style/bootstrap.min.css
[D 2023-07-11 14:21:59.479 YarnSingleUserNotebookApp handlers:881] Path style/bootstrap-theme.min.css served from /cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/jupyter_server/static/style/bootstrap-theme.min.css
[D 2023-07-11 14:21:59.480 YarnSingleUserNotebookApp handlers:881] Path style/index.css served from /cm/shared/datalake/jupyterhub/miniconda/lib/python3.10/site-packages/jupyter_server/static/style/index.css
[W 2023-07-11 14:21:59.481 YarnSingleUserNotebookApp log:186] 404 GET /user/sorokaa/tree? (@::ffff:10.254.127.246) 42.27ms
[D 2023-07-11 14:21:59.538 YarnSingleUserNotebookApp auth:673] No user identified
[D 2023-07-11 14:21:59.540 YarnSingleUserNotebookApp log:186] 200 GET /user/sorokaa/static/style/bootstrap.min.css?v=0e8a7fbd6de23ad6b27ab95802a0a0915af6693af612bc304d83af445529ce5d95842309ca3405d10f538d45c8a3a261b8cff78b4bd512dd9effb4109a71d0ab (@::ffff:10.254.127.246) 2.96ms
[D 2023-07-11 14:21:59.546 YarnSingleUserNotebookApp auth:673] No user identified
[D 2023-07-11 14:21:59.547 YarnSingleUserNotebookApp auth:673] No user identified
[D 2023-07-11 14:21:59.549 YarnSingleUserNotebookApp log:186] 200 GET /user/sorokaa/static/style/bootstrap-theme.min.css?v=8b2f045cb5b4d5ad346f6e816aa2566829a4f5f2783ec31d80d46a57de8ac0c3d21fe6e53bcd8e1f38ac17fcd06d12088bc9b43e23b5d1da52d10c6b717b22b3 (@::ffff:10.254.127.246) 2.86ms
[D 2023-07-11 14:21:59.550 YarnSingleUserNotebookApp log:186] 200 GET /user/sorokaa/static/style/index.css?v=30372e3246a801d662cf9e3f9dd656fa192eebde9054a2282449fe43919de9f0ee9b745d7eb49d3b0a5e56357912cc7d776390eddcab9dac85b77bdb17b4bdae (@::ffff:10.254.127.246) 2.82ms
[D 2023-07-11 14:21:59.769 YarnSingleUserNotebookApp auth:673] No user identified
[D 2023-07-11 14:21:59.771 YarnSingleUserNotebookApp log:186] 200 GET /user/sorokaa/static/favicon.ico?v=50afa725b5de8b00030139d09b38620224d4e7dba47c07ef0e86d4643f30c9bfe6bb7e1a4a1c561aa32834480909a4b6fe7cd1e17f7159330b6b5914bf45a880 (@::ffff:10.254.127.246) 1.85ms
End of LogType:application.driver.log.This log file belongs to a running container (container_1687889389675_0015_01_000001) and so may not be complete.

I m afraid that I am not quite sure where the issue might be coming from. If it helps you, auth is failing to retrieve user_model. From the snippet, I understand that it tries to retrieve user_model from session id and token and if it fails, it tries to get from cookie. In your case, none of them returning a user_model. Maybe that can give you some idea on where to look at!

Thank you @mahendrapaipuri that does give me a place to start! I will insert some logging and if needed trace the traffic between the Hub and notebook and see where that user info is getting lost. I will delay sending a PR for the first issue until I can fix this as well and then I can send a PR for both.

Hello @mahendrapaipuri , thank you for all your help so far! I have spent some time inserting debugging statements and playing with browsers, and I have discovered that the reason no user is being found is because this cookie which contains the user_model is never being set. I can successfully log in (creating this other cookie) but the user-specific cookie simply does not get set at all. (Multiple browsers, many tries.)

Are you aware of any potential cause for this? I’m really surprised! I should mention that I am currently running the Hub over HTTP, for a simple test. It doesn’t seem relevant, because to my limited understanding HTTP/HTTPS should only matter upon retrieving the cookie, not setting it, and it’s not getting set at all, but I thought I should mention it.

Just a further update that I went ahead and set up HTTPS, just in case that made some difference, but it did not. The relevant cookie still is not being set, neither at login nor afterwards…

Unfortunately, no. I dont any have experience with yarn or hadoop neither. I hope JupyterHub core developers can shed some light.

For any core developers who may see this, one piece of information that would really help me would be where/when in the code this cookie should be set.

I have been tracing some more, and as far as I can tell, this handler is where the relevant cookie is to be set. My tracing shows that the single method in that handler is not being executed.

What I do not know is how to apply that handler to the single-user notebook server created by YarnSpawner, or even if that is quite what is necessary. It seems that the SingleUserNotebookApp from which YarnSingleUserNotebookApp inherits should be taking care of such things, but maybe not? I’m well out of my Python/Tornado depth here, so I’m not quite sure what to do next.

Could you share the hub side logs when you are spawning a single user server? That might give some leads to where auth is failing.

You bet, here’s an example below. I can tell you that the Skein/Hadoop-related stuff all looks quite reasonable and from the Hadoop side, the operation is fine. At the last line:

[I 2023-07-14 11:12:58.505 JupyterHub log:186] 302 GET /hub/spawn-pending/lonnemanm -> /user/lonnemanm/ (lonnemanm@::ffff:10.254.127.48) 4.32ms

it does seem that the Hub correctly understands lonnemanm to be logged in. That’s what gives me the understanding that it’s a question of the (apparently older) code in YarnSpawner not correctly engaging the HubOAuthCallbackHandler, which to my limited understanding is responsible single-server-side for setting the cookie that does not get set. But I welcome correction!

Here’s the larger excerpt, starting from login. I’m happy to provide any other logs or config that might be useful.


[D 2023-07-14 11:12:49.693 JupyterHub roles:281] Assigning default role to User lonnemanm
[D 2023-07-14 11:12:49.704 JupyterHub base:568] Setting cookie jupyterhub-session-id: {'httponly': True, 'secure': True, 'path': '/'}
[D 2023-07-14 11:12:49.704 JupyterHub base:572] Setting cookie for lonnemanm: jupyterhub-hub-login
[D 2023-07-14 11:12:49.705 JupyterHub base:568] Setting cookie jupyterhub-hub-login: {'httponly': True, 'secure': True, 'path': '/hub/'}
[I 2023-07-14 11:12:49.705 JupyterHub base:819] User logged in: lonnemanm
[I 2023-07-14 11:12:49.706 JupyterHub log:186] 302 POST /hub/login?next=%2Fhub%2F -> /hub/ (lonnemanm@::ffff:10.254.127.48) 18.78ms
[D 2023-07-14 11:12:49.741 JupyterHub user:431] Creating <class 'yarnspawner.spawner.YarnSpawner'> for lonnemanm:
[I 2023-07-14 11:12:49.743 JupyterHub log:186] 302 GET /hub/ -> /hub/spawn (lonnemanm@::ffff:10.254.127.48) 11.72ms
[D 2023-07-14 11:12:49.838 JupyterHub scopes:863] Checking access via scope servers
[D 2023-07-14 11:12:49.838 JupyterHub scopes:690] Argument-based access to /hub/spawn via servers
[D 2023-07-14 11:12:49.838 JupyterHub pages:215] Triggering spawn with default options for lonnemanm
[D 2023-07-14 11:12:49.838 JupyterHub base:943] Initiating spawn for lonnemanm
[D 2023-07-14 11:12:49.839 JupyterHub base:947] 0/100 concurrent spawns
[D 2023-07-14 11:12:49.839 JupyterHub base:952] 1 active servers
[I 2023-07-14 11:12:49.859 JupyterHub provider:659] Creating oauth client jupyterhub-user-lonnemanm
[D 2023-07-14 11:12:49.874 JupyterHub user:744] Calling Spawner.start for lonnemanm
23/07/14 11:12:50 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at dl-test-02.cm.cluster/192.168.60.11:8050
23/07/14 11:12:50 INFO client.AHSProxy: Connecting to Application History server at localhost/127.0.0.1:10200
23/07/14 11:12:50 INFO skein.Driver: Uploading application resources to hdfs://dl-test-02.cm.cluster:8020/user/lonnemanm/.skein/application_1687889389675_0041
23/07/14 11:12:50 INFO skein.Driver: Submitting application...
23/07/14 11:12:50 INFO impl.YarnClientImpl: Submitted application application_1687889389675_0041
[I 2023-07-14 11:12:50.841 JupyterHub log:186] 302 GET /hub/spawn -> /hub/spawn-pending/lonnemanm (lonnemanm@::ffff:10.254.127.48) 1006.40ms
[D 2023-07-14 11:12:50.935 JupyterHub scopes:863] Checking access via scope servers
[D 2023-07-14 11:12:50.935 JupyterHub scopes:690] Argument-based access to /hub/spawn-pending/lonnemanm via servers
[I 2023-07-14 11:12:50.936 JupyterHub pages:396] lonnemanm is pending spawn
[I 2023-07-14 11:12:50.943 JupyterHub log:186] 200 GET /hub/spawn-pending/lonnemanm (lonnemanm@::ffff:10.254.127.48) 12.41ms
[D 2023-07-14 11:12:51.099 JupyterHub scopes:863] Checking access via scope read:servers
[D 2023-07-14 11:12:51.099 JupyterHub scopes:690] Argument-based access to /hub/api/users/lonnemanm/server/progress via read:servers
[D 2023-07-14 11:12:58.125 JupyterHub base:286] Recording first activity for <APIToken('1ace...', user='lonnemanm', client_id='jupyterhub')>
[I 2023-07-14 11:12:58.131 JupyterHub log:186] 200 POST /hub/api/yarnspawner (lonnemanm@192.168.60.12) 9.59ms
[I 2023-07-14 11:12:58.137 JupyterHub log:186] 200 GET /hub/api (@192.168.60.12) 0.80ms
[D 2023-07-14 11:12:58.151 JupyterHub scopes:863] Checking access via scope users:activity
[D 2023-07-14 11:12:58.151 JupyterHub scopes:690] Argument-based access to /hub/api/users/lonnemanm/activity via users:activity
[D 2023-07-14 11:12:58.153 JupyterHub users:857] Activity for user lonnemanm: 2023-07-14T15:12:58.071020Z
[D 2023-07-14 11:12:58.153 JupyterHub users:875] Activity on server lonnemanm/: 2023-07-14T15:12:58.071020Z
[I 2023-07-14 11:12:58.157 JupyterHub log:186] 200 POST /hub/api/users/lonnemanm/activity (lonnemanm@192.168.60.12) 9.69ms
[D 2023-07-14 11:12:58.398 JupyterHub spawner:1367] Polling subprocess every 30s
[D 2023-07-14 11:12:58.404 JupyterHub utils:277] Server at http://dl-test-03.cm.cluster:41665/user/lonnemanm/ responded with 302
[D 2023-07-14 11:12:58.405 JupyterHub _version:74] jupyterhub and jupyterhub-singleuser both on version 3.1.1
[I 2023-07-14 11:12:58.405 JupyterHub base:972] User lonnemanm took 8.566 seconds to start
[I 2023-07-14 11:12:58.405 JupyterHub proxy:330] Adding user lonnemanm to proxy /user/lonnemanm/ => http://dl-test-03.cm.cluster:41665
[D 2023-07-14 11:12:58.405 JupyterHub proxy:880] Proxy: Fetching POST http://127.0.0.1:8001/api/routes/user/lonnemanm
11:12:58.407 [ConfigProxy] info: Adding route /user/lonnemanm -> http://dl-test-03.cm.cluster:41665
11:12:58.408 [ConfigProxy] info: 201 POST /api/routes/user/lonnemanm 
[I 2023-07-14 11:12:58.409 JupyterHub users:751] Server lonnemanm is ready
[I 2023-07-14 11:12:58.409 JupyterHub log:186] 200 GET /hub/api/users/lonnemanm/server/progress (lonnemanm@::ffff:10.254.127.48) 7315.30ms
[D 2023-07-14 11:12:58.504 JupyterHub scopes:863] Checking access via scope servers
[D 2023-07-14 11:12:58.504 JupyterHub scopes:690] Argument-based access to /hub/spawn-pending/lonnemanm via servers
[I 2023-07-14 11:12:58.505 JupyterHub log:186] 302 GET /hub/spawn-pending/lonnemanm -> /user/lonnemanm/ (lonnemanm@::ffff:10.254.127.48) 4.32ms

Hello @mahendrapaipuri! I am more or less to the end of my debugging ability here, because I don’t understand the structure of the application (or Python idioms) well enough to understand how control flows amongst the various handlers. Do you think it is worth filing a ticket on the Yarnspawner repo or some other issue tracker to get more help with this? I dimly suspect that the actual fix will not be that complicated, just a matter of bringing Yarnspawner in line with modern Jupyterhub authN design, but it’s just a bit beyond my unaided reach.

Hey, I am not very convinced that the issue is coming from Yarnspawner. Anyways, there has not been any activity on that repo for last 3-4 years, so, I doubt you will get any response from there.

Could you containerise your environment and share the compose files here? This allows others to dig into your issue from easily.