Bug description
We are running JupyterHub with a custom spawner that launches notebook servers on an AWS ECS cluster. We are seeing multiple notebook instances become unusable (files cannot be saved, performance degrades) or return 504 errors when the notebook is accessed.
I have already increased the proxy timeout, but the issue still occurs.
# Proxy config
c.ConfigurableHTTPProxy.command = ['configurable-http-proxy',
                                   '--log-level', 'debug',
                                   '--timeout', '300000',  # 5 minutes
                                   '--proxy-timeout', '300000',  # 5 minutes
                                   '--statsd-host', 'statsd',
                                   '--statsd-port', '9125',
                                   '--statsd-prefix', 'chp']
Actual behaviour
Notebook debug logs
[D 2021-12-10 11:20:51.423 SingleUserNotebookApp filemanager:469] Saving /home/jovyan/jfs/keras_test.ipynb
[I 2021-12-10 11:22:19.932 SingleUserNotebookApp log:174] 200 PUT /user/ayush.chauhan/api/contents/keras_test.ipynb?1639135251107 (ayush.chauhan@42.104.69.45) 88510.67ms
[W 2021-12-10 11:22:19.935 SingleUserNotebookApp zmqhandlers:253] zmq message arrived on closed channel
[W 2021-12-10 11:22:19.936 SingleUserNotebookApp zmqhandlers:253] zmq message arrived on closed channel
[D 2021-12-10 11:22:19.936 SingleUserNotebookApp handlers:555] Websocket closed cfb6ea5c-6ed3-4553-a0b2-55902e13865d:1863d368-dbaa-4063-aabd-1fc4df8d8cb5
[D 2021-12-10 11:22:19.936 SingleUserNotebookApp handlers:555] Websocket closed cfb6ea5c-6ed3-4553-a0b2-55902e13865d:eced1597-c8f3-409c-a93c-2200fd7c4c01
[I 2021-12-10 11:22:19.936 SingleUserNotebookApp kernelmanager:222] Starting buffering for cfb6ea5c-6ed3-4553-a0b2-55902e13865d:eced1597-c8f3-409c-a93c-2200fd7c4c01
[D 2021-12-10 11:22:19.936 SingleUserNotebookApp kernelmanager:272] Clearing buffer for cfb6ea5c-6ed3-4553-a0b2-55902e13865d
[D 2021-12-10 11:22:19.937 SingleUserNotebookApp auth:310] HubAuth cache miss: token:2b1fc732711749e896b5c7a99af9a8b0:f6afbe1da270476f8731f3d18400874c
[D 2021-12-10 11:22:19.975 SingleUserNotebookApp auth:316] Received request from Hub user {'kind': 'user', 'name': 'ayush.chauhan', 'admin': True, 'groups': ['ml-ds', 'development'], 'server': '/user/ayush.chauhan/', 'pending': None, 'created': '2020-07-01T10:35:26Z', 'last_activity': '2021-12-10T11:22:19.953055Z', 'servers': None}
[D 2021-12-10 11:22:19.975 SingleUserNotebookApp auth:857] Allowing Hub admin ayush.chauhan
[D 2021-12-10 11:22:19.975 SingleUserNotebookApp auth:744] Setting oauth cookie for 42.104.69.45: jupyterhub-user-ayush.chauhan, {'path': '/user/ayush.chauhan/', 'httponly': True}
[D 2021-12-10 11:22:19.976 SingleUserNotebookApp zmqhandlers:293] Initializing websocket connection /user/ayush.chauhan/api/kernels/cfb6ea5c-6ed3-4553-a0b2-55902e13865d/channels
[D 2021-12-10 11:22:19.977 SingleUserNotebookApp auth:857] Allowing Hub admin ayush.chauhan
[D 2021-12-10 11:22:19.977 SingleUserNotebookApp auth:744] Setting oauth cookie for 42.104.69.45: jupyterhub-user-ayush.chauhan, {'path': '/user/ayush.chauhan/', 'httponly': True}
[W 2021-12-10 11:22:19.978 SingleUserNotebookApp zmqstream:442] Got events for closed stream <zmq.eventloop.zmqstream.ZMQStream object at 0x7f20235dafd0>
[D 2021-12-10 11:22:19.978 SingleUserNotebookApp kernelmanager:235] Buffering msg on cfb6ea5c-6ed3-4553-a0b2-55902e13865d:iopub
[D 2021-12-10 11:22:19.978 SingleUserNotebookApp kernelmanager:235] Buffering msg on cfb6ea5c-6ed3-4553-a0b2-55902e13865d:iopub
[I 2021-12-10 11:22:19.978 SingleUserNotebookApp handlers:164] Saving file at /keras_test.ipynb
[D 2021-12-10 11:22:19.978 SingleUserNotebookApp filemanager:469] Saving /home/jovyan/jfs/keras_test.ipynb
[I 2021-12-10 11:22:20.071 SingleUserNotebookApp log:174] 200 PUT /user/ayush.chauhan/api/contents/keras_test.ipynb?1639135251107 (ayush.chauhan@42.104.69.45) 133.96ms
[D 2021-12-10 11:22:20.071 SingleUserNotebookApp auth:857] Allowing Hub admin ayush.chauhan
[I 2021-12-10 11:22:20.072 SingleUserNotebookApp log:174] 200 GET /user/ayush.chauhan/api/terminals?1639135315307 (ayush.chauhan@42.104.69.45) 95.35ms
[D 2021-12-10 11:22:20.072 SingleUserNotebookApp singleuser:503] Notifying Hub of activity 2021-12-10T11:22:19.934928Z
[D 2021-12-10 11:22:20.073 SingleUserNotebookApp kernelmanager:235] Buffering msg on cfb6ea5c-6ed3-4553-a0b2-55902e13865d:iopub
[D 2021-12-10 11:22:20.073 SingleUserNotebookApp kernelmanager:235] Buffering msg on cfb6ea5c-6ed3-4553-a0b2-55902e13865d:iopub
[D 2021-12-10 11:22:20.074 SingleUserNotebookApp kernelmanager:235] Buffering msg on cfb6ea5c-6ed3-4553-a0b2-55902e13865d:iopub
[D 2021-12-10 11:22:20.074 SingleUserNotebookApp kernelmanager:235] Buffering msg on cfb6ea5c-6ed3-4553-a0b2-55902e13865d:iopub
[D 2021-12-10 11:22:20.074 SingleUserNotebookApp kernelmanager:235] Buffering msg on cfb6ea5c-6ed3-4553-a0b2-55902e13865d:iopub
[D 2021-12-10 11:22:20.075 SingleUserNotebookApp kernelmanager:235] Buffering msg on cfb6ea5c-6ed3-4553-a0b2-55902e13865d:iopub
[D 2021-12-10 11:22:20.075 SingleUserNotebookApp kernelmanager:235] Buffering msg on cfb6ea5c-6ed3-4553-a0b2-55902e13865d:iopub
[D 2021-12-10 11:22:20.075 SingleUserNotebookApp kernelmanager:235] Buffering msg on cfb6ea5c-6ed3-4553-a0b2-55902e13865d:iopub
[I 2021-12-10 11:22:20.076 SingleUserNotebookApp log:174] 101 GET /user/ayush.chauhan/api/kernels/cfb6ea5c-6ed3-4553-a0b2-55902e13865d/channels?session_id=1863d368-dbaa-4063-aabd-1fc4df8d8cb5 (ayush.chauhan@125.19.104.6) 100.30ms
[D 2021-12-10 11:22:20.076 SingleUserNotebookApp zmqhandlers:154] Opening websocket /user/ayush.chauhan/api/kernels/cfb6ea5c-6ed3-4553-a0b2-55902e13865d/channels
[D 2021-12-10 11:22:20.076 SingleUserNotebookApp kernelmanager:252] Getting buffer for cfb6ea5c-6ed3-4553-a0b2-55902e13865d
[D 2021-12-10 11:22:20.076 SingleUserNotebookApp kernelmanager:272] Clearing buffer for cfb6ea5c-6ed3-4553-a0b2-55902e13865d
[I 2021-12-10 11:22:20.076 SingleUserNotebookApp kernelmanager:287] Discarding 10 buffered messages for cfb6ea5c-6ed3-4553-a0b2-55902e13865d:eced1597-c8f3-409c-a93c-2200fd7c4c01
[D 2021-12-10 11:22:20.076 SingleUserNotebookApp connect:547] Connecting to: tcp://127.0.0.1:34887
[D 2021-12-10 11:22:20.076 SingleUserNotebookApp connect:547] Connecting to: tcp://127.0.0.1:42949
[D 2021-12-10 11:22:20.076 SingleUserNotebookApp connect:547] Connecting to: tcp://127.0.0.1:53383
[D 2021-12-10 11:22:20.077 SingleUserNotebookApp connect:547] Connecting to: tcp://127.0.0.1:40155
[D 2021-12-10 11:22:20.077 SingleUserNotebookApp handlers:151] Nudge: not nudging busy kernel cfb6ea5c-6ed3-4553-a0b2-55902e13865d
[W 2021-12-10 11:22:20.077 SingleUserNotebookApp zmqstream:442] Got events for closed stream <zmq.eventloop.zmqstream.ZMQStream object at 0x7f20234c2710>
[W 2021-12-10 11:22:20.077 SingleUserNotebookApp zmqstream:442] Got events for closed stream <zmq.eventloop.zmqstream.ZMQStream object at 0x7f20234c2710>
[D 2021-12-10 11:22:20.416 SingleUserNotebookApp zmqhandlers:293] Initializing websocket connection /user/ayush.chauhan/api/kernels/cfb6ea5c-6ed3-4553-a0b2-55902e13865d/channels
[D 2021-12-10 11:22:20.417 SingleUserNotebookApp auth:857] Allowing Hub admin ayush.chauhan
[I 2021-12-10 11:22:20.418 SingleUserNotebookApp log:174] 101 GET /user/ayush.chauhan/api/kernels/cfb6ea5c-6ed3-4553-a0b2-55902e13865d/channels?session_id=eced1597-c8f3-409c-a93c-2200fd7c4c01 (ayush.chauhan@42.104.69.45) 1.80ms
[D 2021-12-10 11:22:20.418 SingleUserNotebookApp zmqhandlers:154] Opening websocket /user/ayush.chauhan/api/kernels/cfb6ea5c-6ed3-4553-a0b2-55902e13865d/channels
[D 2021-12-10 11:22:20.418 SingleUserNotebookApp kernelmanager:252] Getting buffer for cfb6ea5c-6ed3-4553-a0b2-55902e13865d
[D 2021-12-10 11:22:20.418 SingleUserNotebookApp connect:547] Connecting to: tcp://127.0.0.1:34887
[D 2021-12-10 11:22:20.418 SingleUserNotebookApp connect:547] Connecting to: tcp://127.0.0.1:42949
[D 2021-12-10 11:22:20.418 SingleUserNotebookApp connect:547] Connecting to: tcp://127.0.0.1:53383
[D 2021-12-10 11:22:20.418 SingleUserNotebookApp connect:547] Connecting to: tcp://127.0.0.1:40155
[D 2021-12-10 11:22:20.419 SingleUserNotebookApp handlers:151] Nudge: not nudging busy kernel cfb6ea5c-6ed3-4553-a0b2-55902e13865d
[D 2021-12-10 11:22:20.077 SingleUserNotebookApp kernelmanager:425] activity on cfb6ea5c-6ed3-4553-a0b2-55902e13865d: update_display_data
[W 2021-12-10 11:22:20.077 SingleUserNotebookApp zmqstream:442] Got events for closed stream <zmq.eventloop.zmqstream.ZMQStream object at 0x7f20234c2710>
[W 2021-12-10 11:22:20.077 SingleUserNotebookApp zmqstream:442] Got events for closed stream <zmq.eventloop.zmqstream.ZMQStream object at 0x7f20234c2710>
Exception in callback ZMQChannelsHandler.open.<locals>.subscribe(<Future finished result=None>) at /opt/conda/lib/python3.7/site-packages/notebook/services/kernels/handlers.py:409
handle: <Handle ZMQChannelsHandler.open.<locals>.subscribe(<Future finished result=None>) at /opt/conda/lib/python3.7/site-packages/notebook/services/kernels/handlers.py:409>
Traceback (most recent call last):
File "/opt/conda/lib/python3.7/asyncio/events.py", line 88, in _run
self._context.run(self._callback, *self._args)
File "/opt/conda/lib/python3.7/site-packages/notebook/services/kernels/handlers.py", line 411, in subscribe
stream.on_recv_stream(self._on_zmq_reply)
File "/opt/conda/lib/python3.7/site-packages/zmq/eventloop/zmqstream.py", line 189, in on_recv_stream
self.on_recv(lambda msg: callback(self, msg), copy=copy)
File "/opt/conda/lib/python3.7/site-packages/zmq/eventloop/zmqstream.py", line 168, in on_recv
self._check_closed()
File "/opt/conda/lib/python3.7/site-packages/zmq/eventloop/zmqstream.py", line 503, in _check_closed
raise IOError("Stream is closed")
OSError: Stream is closed
Hub Logs
11:27:32.928 [ConfigProxy] error: 503 GET /user/ayush.chauhan/api/contents/keras_test.ipynb socket hang up
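To pick out slow requests such as the 88-second PUT in the notebook log above, a small filter along these lines could be run over the log file. This is only a minimal sketch, assuming the access-log line format shown above; `ACCESS_RE` and `slow_requests` are hypothetical names and not part of Jupyter.

```python
import re

# Matches access-log lines of the form shown in the notebook log, e.g.
# [I 2021-12-10 11:22:19.932 SingleUserNotebookApp log:174] 200 PUT /path (user@ip) 88510.67ms
ACCESS_RE = re.compile(
    r"^\[\w (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \S+ log:\d+\] "
    r"(?P<status>\d{3}) (?P<method>[A-Z]+) (?P<path>\S+) "
    r"\(.*\) (?P<ms>[\d.]+)ms$"
)


def slow_requests(lines, threshold_ms=10_000.0):
    """Return (timestamp, status, method, path, ms) for requests at or above threshold_ms."""
    hits = []
    for line in lines:
        m = ACCESS_RE.match(line.strip())
        if m and float(m["ms"]) >= threshold_ms:
            hits.append(
                (m["ts"], int(m["status"]), m["method"], m["path"], float(m["ms"]))
            )
    return hits


# Two lines copied from the log above: one slow save, one normal save.
sample = [
    "[I 2021-12-10 11:22:19.932 SingleUserNotebookApp log:174] 200 PUT "
    "/user/ayush.chauhan/api/contents/keras_test.ipynb?1639135251107 "
    "(ayush.chauhan@42.104.69.45) 88510.67ms",
    "[I 2021-12-10 11:22:20.071 SingleUserNotebookApp log:174] 200 PUT "
    "/user/ayush.chauhan/api/contents/keras_test.ipynb?1639135251107 "
    "(ayush.chauhan@42.104.69.45) 133.96ms",
]

for hit in slow_requests(sample):
    print(hit)
```

With the default 10-second threshold, only the first save request is reported; the threshold is an arbitrary choice for illustration.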
How to reproduce
Sample script used to reproduce the issue. After running it 2-3 times, the notebook becomes unusable.
import joblib
import numpy as np
import pandas as pd


def get_parallel_func(df, i):
    df = df.reset_index(drop=True)
    temp_arr = np.zeros((df.shape[0], 10, 17), np.float32)
    for j in range(df.shape[0]):
        temp_arr[j, :, :] = np.full((10, 17), 4.556789)*np.full((10, 17), 14.556792)
    np.save(f'test/item_{i}.npy', temp_arr)


def get_item_level_embeddings(df):
    m = (df.shape[0]//1000000)+1
    print('df shape: ', df.shape[0], ' and m is: ', m)
    print('Applying a function in df in a parallel manner..')
    joblib.Parallel(n_jobs=16)(joblib.delayed(get_parallel_func)(df.loc[i*1000000:((i+1)*1000000)-1], i) for i in range(m))
    for i in range(m):
        if i==0:
            arr = np.load(f'test/item_{i}.npy')
        else:
            temp_arr = np.load(f'test/item_{i}.npy')
            arr = np.vstack([arr, temp_arr])
    print(f'array shape: {arr.shape}')
    np.save('arr.npy', arr)


df = pd.DataFrame({'col_a': range(20000000)})
df['col_b'] = list(range(20000000))
get_item_level_embeddings(df)
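For a sense of scale, the array sizes the script allocates can be estimated from its shapes alone. The sketch below is a back-of-the-envelope calculation, not a measurement: it ignores the DataFrame itself, joblib's serialization overhead, and the Python interpreter, and I have not confirmed that memory pressure is what makes the notebook unusable. The constant and function names are made up for illustration.

```python
# Rough memory estimate derived only from the shapes used in the script above.
ROWS_TOTAL = 20_000_000      # rows in the DataFrame
ROWS_PER_CHUNK = 1_000_000   # rows handed to each get_parallel_func call
INNER_SHAPE = (10, 17)       # per-row block written into temp_arr
BYTES_PER_FLOAT32 = 4
N_JOBS = 16                  # n_jobs passed to joblib.Parallel


def array_bytes(rows):
    """Size in bytes of a float32 array of shape (rows, 10, 17)."""
    return rows * INNER_SHAPE[0] * INNER_SHAPE[1] * BYTES_PER_FLOAT32


chunk_bytes = array_bytes(ROWS_PER_CHUNK)   # one temp_arr in a worker
parallel_bytes = N_JOBS * chunk_bytes       # if all 16 workers hold a chunk at once
final_bytes = array_bytes(ROWS_TOTAL)       # the fully stacked arr
# np.vstack builds a new array while the old arr and the loaded chunk are
# still referenced, so the last stacking steps need roughly twice final_bytes.
vstack_peak_bytes = 2 * final_bytes

GB = 1e9
print(f"one chunk:          {chunk_bytes / GB:.2f} GB")
print(f"16 chunks at once:  {parallel_bytes / GB:.2f} GB")
print(f"final array:        {final_bytes / GB:.2f} GB")
print(f"approx vstack peak: {vstack_peak_bytes / GB:.2f} GB")
```

By this estimate each chunk is about 0.68 GB, sixteen concurrent workers hold about 10.88 GB, and the final array is about 13.6 GB, with the repeated `np.vstack` calls briefly needing about twice that.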
Your personal set up
- OS: Ubuntu 20.04
- JupyterHub parent Docker image: jupyterhub/jupyterhub:1.4.1
- Jupyter Notebook parent Docker image: jupyter/base-notebook:hub-1.4.1
- Version(s): 1.4.1
Jupyter Configuration
import requests
import sys
import os
from jupyterhub.auth import DummyAuthenticator
from oauthenticator.google import GoogleOAuthenticator
def get_host_ip():
    url = "http://169.254.169.254/latest/meta-data/local-ipv4"
    payload = {}
    headers = {}
    try:
        response = requests.request("GET", url, headers=headers, data=payload, timeout=1)
        host_ip = response.text
    except requests.exceptions.ConnectTimeout:
        host_ip = "127.0.0.1"
    return host_ip
c = get_config()
# c.JupyterHub.authenticator_class = DummyAuthenticator
c.JupyterHub.authenticator_class = GoogleOAuthenticator
c.GoogleOAuthenticator.hosted_domain = ['domain.com']
c.Authenticator.admin_users = {"ayush.chauhan"}
# Custom spawner config
c.JupyterHub.spawner_class = "ecsspawner.EcsTaskSpawner"
c.Spawner.environment = {'JUPYTER_ENABLE_LAB': 'yes'}
c.Spawner.available_kernels = {}
c.Spawner.available_notebook = {}
c.Spawner.available_notebook_task_roles = {}
# Timeout (in seconds) before giving up on a spawned HTTP server
c.Spawner.http_timeout = 600
c.Spawner.start_timeout = 600
c.Spawner.default_url = '/lab'
# The ip address for the Hub process to *bind* to.
c.JupyterHub.hub_ip = "0.0.0.0"
# IP address that other services(spawn notebooks) should use to connect to the Hub
c.JupyterHub.hub_connect_ip = get_host_ip()
# Disabling prometheus authentication
c.JupyterHub.authenticate_prometheus = False
# Shuts down all user servers on logout
c.JupyterHub.shutdown_on_logout = True
# Proxy config
c.ConfigurableHTTPProxy.command = ['configurable-http-proxy',
                                   '--log-level', 'debug',
                                   '--timeout', '300000',  # 5 minutes
                                   '--proxy-timeout', '300000',  # 5 minutes
                                   '--statsd-host', 'statsd',
                                   '--statsd-port', '9125',
                                   '--statsd-prefix', 'chp']
# cull_idle service to handle inactive notebooks
c.JupyterHub.services = [
    {
        "name": "cull-idle",
        "admin": True,
        "command": [sys.executable,
                    "/etc/jupyterhub/cull_idle_servers.py",
                    f"--timeout={os.getenv('JUPYTERHUB_CULL_IDLE_TIMEOUT')}",
                    f"--max_age={os.getenv('JUPYTERHUB_CULL_IDLE_MAX_AGE')}",
                    "--url=http://127.0.0.1:8081/hub/api"]
    }
]
c.JupyterHub.admin_access = True
# Path to SSL certificate file for the public facing interface of the proxy
# When setting this, you should also set ssl_key
c.JupyterHub.ssl_cert = ''
# Path to SSL key file for the public facing interface of the proxy
# When setting this, you should also set ssl_cert
c.JupyterHub.ssl_key = ''
# Allow named single-user servers per user
c.JupyterHub.allow_named_servers = False
# Will retrieve from ECS env through SSM
mysql_pass = os.getenv('JUPYTERHUB_MYSQL_PASSWORD')
mysql_host = os.getenv('JUPYTERHUB_MYSQL_HOST')
mysql_user = os.getenv('JUPYTERHUB_MYSQL_USER')
mysql_db = os.getenv('JUPYTERHUB_MYSQL_DB')
c.JupyterHub.db_url = f"mysql+mysqlconnector://{mysql_user}:{mysql_pass}@{mysql_host}:3306/{mysql_db}"