Constantly reconnect to kernels #3368

Bug description

Browser websocket constantly reconnecting to kernel. Got the following logs in browser’s console log.

Starting WebSockets: ws://example.domain/user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949
kernel.js:106 Kernel: kernel_connected (8eb032df-4ad3-4f4c-97df-a0849b92a949)
kernel.js:106 Kernel: kernel_ready (8eb032df-4ad3-4f4c-97df-a0849b92a949)
kernel.js:474 WebSocket closed early CloseEvent {isTrusted: true, wasClean: false, code: 1006, reason: "", type: "close", …}bubbles: falsecancelBubble: falsecancelable: falsecode: 1006composed: falsecurrentTarget: WebSocket {url: "ws://example.domain/user/taoxu/api/kernels…nnels?session_id=a5c692b26e6a418c8dd6a10042a5ffb1", readyState: 3, bufferedAmount: 0, onopen: ƒ, onerror: ƒ, …}defaultPrevented: falseeventPhase: 0isTrusted: truepath: []reason: ""returnValue: truesrcElement: WebSocket {url: "ws://example.domain/user/taoxu/api/kernels…nnels?session_id=a5c692b26e6a418c8dd6a10042a5ffb1", readyState: 3, bufferedAmount: 0, onopen: ƒ, onerror: ƒ, …}target: WebSocket {url: "ws://example.domain/user/taoxu/api/kernels…nnels?session_id=a5c692b26e6a418c8dd6a10042a5ffb1", readyState: 3, bufferedAmount: 0, onopen: ƒ, onerror: ƒ, …}timeStamp: 6077.57500000298type: "close"wasClean: false__proto__: CloseEvent
kernel.js:106 Kernel: kernel_disconnected (8eb032df-4ad3-4f4c-97df-a0849b92a949)
kernel.js:564 Connection lost, reconnecting in 1 seconds.
kernel.js:106 Kernel: kernel_reconnecting (8eb032df-4ad3-4f4c-97df-a0849b92a949)
kernel.js:463 Starting WebSockets: ws://example.domain/user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949
kernel.js:106 Kernel: kernel_connected (8eb032df-4ad3-4f4c-97df-a0849b92a949)
kernel.js:106 Kernel: kernel_ready (8eb032df-4ad3-4f4c-97df-a0849b92a949)
kernel.js:474 

The browsers continues sending api/kernels/KERNEL_ID and api/kernels/KERNEL_ID/channels websocket requests.

api/kernels/KERNEL_ID returns

{"id": "8eb032df-4ad3-4f4c-97df-a0849b92a949", 
"name": "python3", 
"last_activity": "2021-02-22T10:01:57.304252Z", 
"execution_state": "idle", 
"connections": 0
}

Expected behaviour

Websocket connection to kernel should be ok

Actual behaviour

See bug description

How to reproduce

  1. Spawn a server
  2. Open a notebook server
  3. Open console in browser’s develop tools
  4. See error

Your personal set up

  • OS: Ubuntu 20.04
  • Version(s): jupyterhub 1.3.0, python 3.8.5
  • Full environment
bsl-py==0.11.0
alembic==1.5.5
argon2-cffi==20.1.0
async-generator==1.10
attrs==20.3.0
backcall==0.2.0
bleach==3.3.0
certifi==2020.12.5
certipy==0.1.3
cffi==1.14.5
chardet==4.0.0
cryptography==3.4.6
decorator==4.4.2
defusedxml==0.6.0
entrypoints==0.3
future==0.18.2
idna==2.10
ipykernel==5.5.0
ipython==7.20.0
ipython-genutils==0.2.0
jedi==0.18.0
Jinja2==2.11.3
jsonschema==3.2.0
jupyter-client==6.1.11
jupyter-core==4.7.1
jupyter-telemetry==0.1.0
# Editable install with no version control (jupyterhub==1.3.0)
-e /data/app/jupyter
jupyterlab-pygments==0.1.2
Mako==1.1.4
MarkupSafe==1.1.1
mistune==0.8.4
mysqlclient==1.4.4
nbclient==0.5.2
nbconvert==6.0.7
nbformat==5.1.2
nest-asyncio==1.5.1
notebook==6.2.0
oauthlib==3.1.0
packaging==20.9
pamela==1.0.0
pandocfilters==1.4.3
parso==0.8.1
pexpect==4.8.0
pickleshare==0.7.5
prometheus-client==0.9.0
prompt-toolkit==3.0.16
ptyprocess==0.7.0
pycparser==2.20
pycurl==7.43.0.2
Pygments==2.8.0
pyOpenSSL==20.0.1
pyparsing==2.4.7
pyrsistent==0.17.3
python-dateutil==2.8.1
python-discovery-client==0.5
python-editor==1.0.4
python-json-logger==2.0.1
pyzmq==22.0.3
requests==2.25.1
ruamel.yaml==0.16.12
ruamel.yaml.clib==0.2.2
Send2Trash==1.5.0
six==1.15.0
SQLAlchemy==1.3.23
terminado==0.9.2
testpath==0.4.4
tornado==6.1
traitlets==5.0.5
urllib3==1.26.3
wcwidth==0.2.5
webencodings==0.5.1
  • Configuration
# jupyterhub_config.py
import os
import shlex
import socket
from jupyterhub.auth import DummyAuthenticator

from jupyterhub.spawner import SimpleLocalProcessSpawner


sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.connect(('8.8.8.8', 53))
LOCAL_IP = sock.getsockname()[0]
sock.close()

c = get_config() # type: ignore


c.JupyterHub.spawner_class = SimpleLocalProcessSpawner
c.JupyterHub.hub_ip = '0.0.0.0'
c.JupyterHub.hub_connect_ip = LOCAL_IP

c.JupyterHub.authenticator_class = DummyAuthenticator

# Proxy
c.JupyterHub.cleanup_servers = False
c.ConfigurableHTTPProxy.should_start = False
c.ConfigurableHTTPProxy.auth_token = "xxx"
c.ConfigurableHTTPProxy.api_url = 'http://10.70.26.117:8001'

# shutdown the server after no activity for 20 minutes
c.NotebookApp.shutdown_no_activity_timeout = 20 * 60
# shutdown kernels after no activity for 10 minutes
c.MappingKernelManager.cull_idle_timeout = 10 * 60
# cull connected (e.g. Browser tab open)
c.MappingKernelManager.cull_connected=True


  • Logs
I 2021-02-22 18:14:47.228 SingleUserNotebookApp log:181] 200 GET /user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949?_=1613986889864 (taoxu@172.22.21.11) 1.12ms
[I 2021-02-22 18:14:48.287 SingleUserNotebookApp log:181] 101 GET /user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949/channels?session_id=a5c692b26e6a418c8dd6a10042a5ffb1 (taoxu@172.22.21.11) 2.23ms
[I 2021-02-22 18:14:48.287 SingleUserNotebookApp handlers:378] Restoring connection for 8eb032df-4ad3-4f4c-97df-a0849b92a949:a5c692b26e6a418c8dd6a10042a5ffb1
[I 2021-02-22 18:14:49.085 SingleUserNotebookApp kernelmanager:222] Starting buffering for 8eb032df-4ad3-4f4c-97df-a0849b92a949:a5c692b26e6a418c8dd6a10042a5ffb1
[I 2021-02-22 18:14:49.171 SingleUserNotebookApp log:181] 200 GET /user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949?_=1613986889865 (taoxu@172.22.21.3) 1.16ms
[I 2021-02-22 18:14:50.243 SingleUserNotebookApp log:181] 101 GET /user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949/channels?session_id=a5c692b26e6a418c8dd6a10042a5ffb1 (taoxu@172.22.21.103) 2.10ms
[I 2021-02-22 18:14:50.243 SingleUserNotebookApp handlers:378] Restoring connection for 8eb032df-4ad3-4f4c-97df-a0849b92a949:a5c692b26e6a418c8dd6a10042a5ffb1
[I 2021-02-22 18:14:50.337 JupyterHub log:181] 302 GET /metrics -> /hub/metrics (@172.23.20.99) 0.70ms
[I 2021-02-22 18:14:50.350 JupyterHub log:181] 200 GET /hub/metrics (@172.23.20.99) 6.84ms
[I 2021-02-22 18:14:51.039 SingleUserNotebookApp kernelmanager:222] Starting buffering for 8eb032df-4ad3-4f4c-97df-a0849b92a949:a5c692b26e6a418c8dd6a10042a5ffb1
[I 2021-02-22 18:14:51.130 SingleUserNotebookApp log:181] 200 GET /user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949?_=1613986889866 (taoxu@172.22.21.12) 1.25ms
[I 2021-02-22 18:14:52.489 SingleUserNotebookApp log:181] 101 GET /user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949/channels?session_id=a5c692b26e6a418c8dd6a10042a5ffb1 (taoxu@172.22.21.28) 2.03ms
[I 2021-02-22 18:14:52.489 SingleUserNotebookApp handlers:378] Restoring connection for 8eb032df-4ad3-4f4c-97df-a0849b92a949:a5c692b26e6a418c8dd6a10042a5ffb1
[I 2021-02-22 18:14:53.174 JupyterHub log:181] 200 GET /hub/api (@10.70.26.25) 0.82ms
[I 2021-02-22 18:14:53.286 SingleUserNotebookApp kernelmanager:222] Starting buffering for 8eb032df-4ad3-4f4c-97df-a0849b92a949:a5c692b26e6a418c8dd6a10042a5ffb1
[I 2021-02-22 18:14:53.312 JupyterHub log:181] 200 GET /hub/api (@10.70.26.25) 0.66ms
[I 2021-02-22 18:14:53.321 SingleUserNotebookApp log:181] 200 GET /user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949?_=1613986889867 (taoxu@172.22.21.3) 1.04ms
[I 2021-02-22 18:14:54.514 SingleUserNotebookApp log:181] 101 GET /user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949/channels?session_id=a5c692b26e6a418c8dd6a10042a5ffb1 (taoxu@172.22.21.12) 2.17ms
[I 2021-02-22 18:14:54.515 SingleUserNotebookApp handlers:378] Restoring connection for 8eb032df-4ad3-4f4c-97df-a0849b92a949:a5c692b26e6a418c8dd6a10042a5ffb1
[I 2021-02-22 18:14:55.312 SingleUserNotebookApp kernelmanager:222] Starting buffering for 8eb032df-4ad3-4f4c-97df-a0849b92a949:a5c692b26e6a418c8dd6a10042a5ffb1
[I 2021-02-22 18:14:55.424 SingleUserNotebookApp log:181] 200 GET /user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949?_=1613986889868 (taoxu@172.22.21.12) 1.14ms
[I 2021-02-22 18:14:56.551 SingleUserNotebookApp log:181] 101 GET /user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949/channels?session_id=a5c692b26e6a418c8dd6a10042a5ffb1 (taoxu@172.22.21.12) 2.34ms
[I 2021-02-22 18:14:56.551 SingleUserNotebookApp handlers:378] Restoring connection for 8eb032df-4ad3-4f4c-97df-a0849b92a949:a5c692b26e6a418c8dd6a10042a5ffb1
[I 2021-02-22 18:14:57.346 SingleUserNotebookApp kernelmanager:222] Starting buffering for 8eb032df-4ad3-4f4c-97df-a0849b92a949:a5c692b26e6a418c8dd6a10042a5ffb1
[I 2021-02-22 18:14:57.465 SingleUserNotebookApp log:181] 200 GET /user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949?_=1613986889869 (taoxu@172.22.21.28) 1.23ms
[I 2021-02-22 18:14:58.550 SingleUserNotebookApp log:181] 101 GET /user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949/channels?session_id=a5c692b26e6a418c8dd6a10042a5ffb1 (taoxu@172.22.21.3) 2.24ms
[I 2021-02-22 18:14:58.550 SingleUserNotebookApp handlers:378] Restoring connection for 8eb032df-4ad3-4f4c-97df-a0849b92a949:a5c692b26e6a418c8dd6a10042a5ffb1
[I 2021-02-22 18:14:59.347 SingleUserNotebookApp kernelmanager:222] Starting buffering for 8eb032df-4ad3-4f4c-97df-a0849b92a949:a5c692b26e6a418c8dd6a10042a5ffb1
[I 2021-02-22 18:14:59.704 SingleUserNotebookApp log:181] 200 GET /user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949?_=1613986889870 (taoxu@172.22.21.28) 1.09ms
[I 2021-02-22 18:15:01.428 SingleUserNotebookApp log:181] 101 GET /user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949/channels?session_id=a5c692b26e6a418c8dd6a10042a5ffb1 (taoxu@172.22.21.12) 2.46ms
[I 2021-02-22 18:15:01.428 SingleUserNotebookApp handlers:378] Restoring connection for 8eb032df-4ad3-4f4c-97df-a0849b92a949:a5c692b26e6a418c8dd6a10042a5ffb1
[I 2021-02-22 18:15:02.223 SingleUserNotebookApp kernelmanager:222] Starting buffering for 8eb032df-4ad3-4f4c-97df-a0849b92a949:a5c692b26e6a418c8dd6a10042a5ffb1
[I 2021-02-22 18:15:02.255 SingleUserNotebookApp log:181] 200 GET /user/taoxu/api/kernels/8eb032df-4ad3-4f4c-97df-a0849b92a949?_=1613986889871 (taoxu@172.22.21.3) 1.19ms

Access from IP:PORT is correct.
My nginx config:

 location  /
    {
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $server_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_connect_timeout 100ms;
        proxy_read_timeout 1600ms;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_pass http://10.70.26.117:81;
    }

 location ~ /api/kernels
    {
        proxy_http_version    1.1;
        proxy_buffering off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $server_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_connect_timeout 100ms;
        proxy_read_timeout 1600ms;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_pass http://10.70.26.117:81;    
    }


Check out our example nginx configuration. In particular, you should not specify which paths might have websockets (no need for separate /api/kernels location config), which is probably not exactly matching all (and only) the websocket endpoints.

Thanks. I used / before, but the same problem occurs. In my situation, configurable-http-proxy runs on a second host. Does that make a difference?

It shouldn’t. The key bit of config is this:

map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}
...
        # websocket headers
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_set_header X-Scheme $scheme;

Without the map configuration, it will get the wrong headers for websocket endpoints.

1 Like

/etc/nginx/nginx.conf

user  nginx;
worker_processes  1;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {  
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;
        
    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                      '$status $body_bytes_sent "$http_referer" '
                      '"$http_user_agent" "$http_x_forwarded_for"';
    
    access_log  /var/log/nginx/access.log  main;
        
    sendfile        on;
    #tcp_nopush     on;
        
    keepalive_timeout  65;
        
    #gzip  on;
    # my config
    map $http_upgrade $connection_upgrade {
        default upgrade; 
        '' close;
    }

    include /etc/nginx/conf.d/*.conf;
}

/etc/nginx/conf.d/jupyter.conf

server {                                                                                                                                                                     
    server_name 10.70.68.11;
    #server_name jupyter.bilibili.co;
    listen 80;


    location  /
    {
        access_log /tmp/ng_access.log  main;
        error_log /tmp/ng_error_log debug;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $server_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_connect_timeout 100ms;
        proxy_read_timeout 1600ms;
        proxy_http_version 1.1;
        proxy_pass http://10.70.26.117:81;
    }

    location ~ /api/kernels
    {
        access_log /tmp/ng_access.log  main;
        error_log /tmp/ng_error_log debug;
        proxy_http_version    1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $server_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Scheme $scheme;
        proxy_connect_timeout 200ms;
        proxy_read_timeout 1600ms;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
        proxy_pass http://10.70.26.117:81;
    }

/tmp/ng_error_log

2021/02/23 20:06:11 [error] 259192#0: *901 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 172.22.123.13, server: 10.70.68.11, request: "GET /user/taoxu/api/kernels/05ea25f3-9df6-495f-b8e7-5b2a85194759/channels?session_id=ddd5392161a946e78ab6f26a588989b1 HTTP/1.1", upstream: "http://10.70.26.117:81/user/taoxu/api/kernels/05ea25f3-9df6-495f-b8e7-5b2a85194759/channels?session_id=ddd5392161a946e78ab6f26a588989b1", host: "10.70.68.11"
2021/02/23 20:06:11 [info] 259192#0: *901 client 172.22.123.13 closed keepalive connection

I can curl the upstream on nginx host without timeout

I changed configurable-http-proxy log level to debug. But there is no timeout. Instead, there are many unnecessary WS proxy.

I0224 14:12:03.716489 140065249699648 proxy_main.py:93] 14:12:03.716 [ConfigProxy] debug: PROXY WEB /user/taoxu/nbextensions/widgets/notebook/js/extension.js?v=20210224120705 to http://10.70.26.25:37841
I0224 14:12:03.763930 140065249699648 proxy_main.py:93] 14:12:03.763 [ConfigProxy] debug: PROXY WS /user/taoxu/api/kernels/a0d25b21-4fcc-4f3a-b4b1-418f86c2850b/channels?session_id=197125455dcc42c789f7ec5df311f94d to http://10.70.26.25:37841
I0224 14:12:06.425019 140065249699648 proxy_main.py:93] 14:12:06.424 [ConfigProxy] debug: PROXY WS /user/taoxu/api/kernels/a0d25b21-4fcc-4f3a-b4b1-418f86c2850b/channels?session_id=197125455dcc42c789f7ec5df311f94d to http://10.70.26.25:37841
I0224 14:12:09.086625 140065249699648 proxy_main.py:93] 14:12:09.086 [ConfigProxy] debug: PROXY WS /user/taoxu/api/kernels/a0d25b21-4fcc-4f3a-b4b1-418f86c2850b/channels?session_id=197125455dcc42c789f7ec5df311f94d to http://10.70.26.25:37841
I0224 14:12:10.439031 140065249699648 proxy_main.py:93] 14:12:10.438 [ConfigProxy] debug: PROXY WEB /hub/api to http://10.70.26.25:8081

proxy_read_timeout changed to 60s. No reconnecting issue. But I do not know the root cause

Remember to make sure that you remove the location ~ /api/kernels section. That is not a correct way to handle websocket requests.

Yes. That’s for test only. I have changed it to the following.

location ~* /(api/kernels/[^/]+/(channels|iopub|shell|stdin)|terminals/websocket)/?
{
        access_log /tmp/ng_access.log  main;
        error_log /tmp/ng_error_log debug;
        proxy_http_version    1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $server_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Scheme $scheme;
        proxy_connect_timeout 200ms;
        proxy_read_timeout 60;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection 'upgrade';
        #keepalive_timeout  120s 120s;
        proxy_pass http://jupyter_proxy;
 }

Glad it’s working! I’ll just reiterate that it should not need to have any pattern matching at all. A single location block for / is sufficient for all websocket and non-websocket destinations, as seen in the linked example above.

Yes. A single / works. Just want to have fine grained timeout control on different URL :smiley: