Random failure to connect when initializing or restarting a notebook kernel

A few days ago, I started having major issues with notebooks. About half the time, I can open a notebook just fine: the kernel initializes, connects, and then displays “idle”. The other half, it says “initializing”, then “connecting”, and either stays there forever or moves on to “unknown”. Either way, I cannot run cells. Meanwhile, the server log fills with “Replacing stale connection” and “Nudge: attempt xx” messages. No matter how long I wait in these situations, it never connects, and yet manually shutting down the kernel and launching a new one finishes in a few seconds (if it works). Whether a given attempt succeeds seems to be pure chance. I also never have problems reconnecting to an already running notebook after disconnecting, which suggests my internet connection is not the real culprit.

[I 2021-09-03 00:30:27.475 ServerApp] Kernel started: afd46c7f-1ac1-4e23-adb6-6b6be9747f01
/fast/jamesn8/anaconda3/envs/torch3/lib/python3.9/json/encoder.py:257: UserWarning: date_default is deprecated since jupyter_client 7.0.0. Use jupyter_client.jsonutil.json_default.
  return _iterencode(o, 0)
[W 2021-09-03 00:32:47.882 ServerApp] Notebook Mechanical/mechanical/TestNetwork.ipynb is not trusted
[I 2021-09-03 00:32:55.766 ServerApp] Kernel started: a7131b3d-54cf-4da7-acf7-44aa93807db2
[W 2021-09-03 00:33:16.269 ServerApp] Replacing stale connection: a7131b3d-54cf-4da7-acf7-44aa93807db2:15eaca3b-b97c-4214-957f-79652f5b84a1
[W 2021-09-03 00:33:37.276 ServerApp] Replacing stale connection: a7131b3d-54cf-4da7-acf7-44aa93807db2:15eaca3b-b97c-4214-957f-79652f5b84a1
[W 2021-09-03 00:33:56.043 ServerApp] Timeout waiting for kernel_info reply from a7131b3d-54cf-4da7-acf7-44aa93807db2
[I 2021-09-03 00:33:56.546 ServerApp] Starting buffering for a7131b3d-54cf-4da7-acf7-44aa93807db2:15eaca3b-b97c-4214-957f-79652f5b84a1
[I 2021-09-03 00:33:56.547 ServerApp] Restoring connection for a7131b3d-54cf-4da7-acf7-44aa93807db2:15eaca3b-b97c-4214-957f-79652f5b84a1
[W 2021-09-03 00:34:01.058 ServerApp] Nudge: attempt 10 on kernel a7131b3d-54cf-4da7-acf7-44aa93807db2
[W 2021-09-03 00:34:06.069 ServerApp] Nudge: attempt 20 on kernel a7131b3d-54cf-4da7-acf7-44aa93807db2
[W 2021-09-03 00:34:11.082 ServerApp] Nudge: attempt 30 on kernel a7131b3d-54cf-4da7-acf7-44aa93807db2
[W 2021-09-03 00:34:16.094 ServerApp] Nudge: attempt 40 on kernel a7131b3d-54cf-4da7-acf7-44aa93807db2
[W 2021-09-03 00:34:21.105 ServerApp] Nudge: attempt 50 on kernel a7131b3d-54cf-4da7-acf7-44aa93807db2
[W 2021-09-03 00:34:26.117 ServerApp] Nudge: attempt 60 on kernel a7131b3d-54cf-4da7-acf7-44aa93807db2
[W 2021-09-03 00:34:31.126 ServerApp] Nudge: attempt 70 on kernel a7131b3d-54cf-4da7-acf7-44aa93807db2
[W 2021-09-03 00:34:36.137 ServerApp] Nudge: attempt 80 on kernel a7131b3d-54cf-4da7-acf7-44aa93807db2
[W 2021-09-03 00:34:41.147 ServerApp] Nudge: attempt 90 on kernel a7131b3d-54cf-4da7-acf7-44aa93807db2
[W 2021-09-03 00:34:46.158 ServerApp] Nudge: attempt 100 on kernel a7131b3d-54cf-4da7-acf7-44aa93807db2
[W 2021-09-03 00:34:51.169 ServerApp] Nudge: attempt 110 on kernel a7131b3d-54cf-4da7-acf7-44aa93807db2
[W 2021-09-03 00:34:56.183 ServerApp] Nudge: attempt 120 on kernel a7131b3d-54cf-4da7-acf7-44aa93807db2
[E 2021-09-03 00:34:56.549 ServerApp] Uncaught exception GET /api/kernels/a7131b3d-54cf-4da7-acf7-44aa93807db2/channels?session_id=15eaca3b-b97c-4214-957f-79652f5b84a1 (::1)
    HTTPServerRequest(protocol='http', host='localhost:8089', method='GET', uri='/api/kernels/a7131b3d-54cf-4da7-acf7-44aa93807db2/channels?session_id=15eaca3b-b97c-4214-957f-79652f5b84a1', version='HTTP/1.1', remote_ip='::1')
    Traceback (most recent call last):
      File "/fast/jamesn8/anaconda3/envs/torch3/lib/python3.9/site-packages/tornado/websocket.py", line 956, in _accept_connection
        await open_result
    tornado.util.TimeoutError: Timeout
[W 2021-09-03 00:34:57.736 ServerApp] Replacing stale connection: a7131b3d-54cf-4da7-acf7-44aa93807db2:15eaca3b-b97c-4214-957f-79652f5b84a1
[W 2021-09-03 00:35:18.816 ServerApp] Replacing stale connection: a7131b3d-54cf-4da7-acf7-44aa93807db2:15eaca3b-b97c-4214-957f-79652f5b84a1

This repeats until I am finally forced to give up. Sometimes shutting down the kernel and starting it again manually works. But this is incredibly frustrating, and I want to know what I can change to get Jupyter notebooks working again.
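
In case it helps anyone compare a good start with a bad one, here is a rough sketch of how the server log can be tallied per kernel. It only assumes the line format shown in the log above; the function name `summarize`, the event labels, and the `SAMPLE` excerpt are my own choices for illustration, not anything provided by Jupyter.

```python
import re
from collections import defaultdict

# Matches server log lines of the form seen above, e.g.
# [W 2021-09-03 00:33:16.269 ServerApp] Replacing stale connection: <kernel>:<session>
LINE_RE = re.compile(r"^\[(?P<level>[A-Z]) (?P<ts>\S+ \S+) ServerApp\] (?P<msg>.*)$")
UUID = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Event labels are hypothetical names; the message texts are copied from the log.
EVENTS = {
    "started": re.compile(rf"Kernel started: (?P<kernel>{UUID})"),
    "stale": re.compile(rf"Replacing stale connection: (?P<kernel>{UUID}):"),
    "info_timeout": re.compile(
        rf"Timeout waiting for kernel_info reply from (?P<kernel>{UUID})"
    ),
    "nudge": re.compile(rf"Nudge: attempt (?P<n>\d+) on kernel (?P<kernel>{UUID})"),
}


def summarize(log_text):
    """Return {kernel_id: counts} for the events listed in EVENTS."""
    summary = defaultdict(
        lambda: {"started": 0, "stale": 0, "info_timeout": 0, "max_nudge": 0}
    )
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # traceback lines, Python warnings, blank lines
        msg = m.group("msg")
        for name, pattern in EVENTS.items():
            hit = pattern.match(msg)
            if not hit:
                continue
            entry = summary[hit.group("kernel")]
            if name == "nudge":
                # Keep only the highest attempt number that was logged.
                entry["max_nudge"] = max(entry["max_nudge"], int(hit.group("n")))
            else:
                entry[name] += 1
            break
    return dict(summary)


# A short excerpt of the log above, used only to demonstrate the function.
SAMPLE = """\
[I 2021-09-03 00:32:55.766 ServerApp] Kernel started: a7131b3d-54cf-4da7-acf7-44aa93807db2
[W 2021-09-03 00:33:16.269 ServerApp] Replacing stale connection: a7131b3d-54cf-4da7-acf7-44aa93807db2:15eaca3b-b97c-4214-957f-79652f5b84a1
[W 2021-09-03 00:33:56.043 ServerApp] Timeout waiting for kernel_info reply from a7131b3d-54cf-4da7-acf7-44aa93807db2
[W 2021-09-03 00:34:01.058 ServerApp] Nudge: attempt 10 on kernel a7131b3d-54cf-4da7-acf7-44aa93807db2
"""

for kernel_id, counts in summarize(SAMPLE).items():
    print(kernel_id, counts)
```

A kernel that connects cleanly should show only a “Kernel started” entry, like afd46c7f above, while a7131b3d accumulates stale-connection warnings, a kernel_info timeout, and nudges up to attempt 120.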

The versions of software I am using are:

jupyter core     : 4.7.1
jupyter-notebook : 6.4.3
qtconsole        : not installed
ipython          : 7.26.0
ipykernel        : 6.2.0
jupyter client   : 7.0.1
jupyter lab      : 3.1.9
nbconvert        : 6.1.0
ipywidgets       : 7.6.3
nbformat         : 5.1.3
traitlets        : 5.0.5
tornado          : 6.1