JupyterHub: seemingly random ExitCode=134 disconnections

We have a problem with our JupyterHub server: some of our users (not all) report that once in a while they are greeted with the “Server not running” screen. It seems to be a random occurrence that happens even when they aren’t really doing anything in their Lab instance. The only relevant thing I can see in the log is ExitCode=134, which shows up right after the route check.

Our JupyterHub is version 1.1.0 with DockerSpawner.

JH config:
c.JupyterHub.authenticator_class = 'nativeauthenticator.NativeAuthenticator'
c.JupyterHub.bind_url = 'https://:443'
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.DockerSpawner.debug = True
c.DockerSpawner.use_internal_ip = True
c.DockerSpawner.network_name = 'jupyter'

c.DockerSpawner.extra_host_config = { "network_mode": "jupyter" }

c.JupyterHub.hub_ip = '0.0.0.0'  # listen on all interfaces
c.Authenticator.admin_users = {'admins'}

c.Authenticator.whitelist = set()
notebook_dir = '/home/jovyan'
transfer_dir = '/home/jovyan/transfer'
team_dir = '/home/jovyan/team-data'
project_dir = '/home/jovyan/project-data'
c.DockerSpawner.notebook_dir = notebook_dir
c.DockerSpawner.volumes = {
    '/opt/datascience/transfer': transfer_dir,
    '/opt/datascience/teams': team_dir,
    '/opt/datascience/projects': project_dir
    }
c.DockerSpawner.image = 't1_spawner'  # custom image based on datascience-notebook
c.Spawner.cmd = ['start-notebook.sh']
c.JupyterHub.ssl_key = "/srv/jupyterhub/ssl/key.pem"
c.JupyterHub.ssl_cert = "/srv/jupyterhub/ssl/cert.pem"

Log from JH:

09:56:20.686 [ConfigProxy] info: 200 GET /api/routes
[I 2021-02-14 09:56:20.750 JupyterHub proxy:319] Checking routes
[W 2021-02-14 09:56:22.683 JupyterHub base:962] User some_user server stopped, with exit code: ExitCode=134, Error='', FinishedAt=2021-02-14T08:56:14.219826903Z
[I 2021-02-14 09:56:22.683 JupyterHub proxy:281] Removing user some_user from proxy (/user/some_user/)
09:56:22.696 [ConfigProxy] info: Removing route /user/some_user
09:56:22.697 [ConfigProxy] info: 204 DELETE /api/routes/user/some_user
[I 2021-02-14 09:56:35.114 JupyterHub log:174] 302 GET /user/some_user/api/kernelspecs?1611651394775 -> /hub/user/some_user/api/kernelspecs?1611651394775 (@::ffff:10.29.195.101) 3.65ms
[I 2021-02-14 09:56:35.378 JupyterHub log:174] 302 GET /hub/user/some_user/api/kernelspecs?1611651394775 -> /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fapi%2Fkernelspecs%3F1611651394775 (@::ffff:10.29.195.101) 2.32ms
[I 2021-02-14 09:56:35.497 JupyterHub log:174] 302 GET /user/some_user/api/kernels?1611651395276 -> /hub/user/some_user/api/kernels?1611651395276 (@::ffff:10.29.195.101) 2.23ms
[I 2021-02-14 09:56:35.645 JupyterHub log:174] 200 GET /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fapi%2Fkernelspecs%3F1611651394775 (@::ffff:10.29.195.101) 3.41ms
[I 2021-02-14 09:56:35.754 JupyterHub log:174] 302 GET /hub/user/some_user/api/kernels?1611651395276 -> /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fapi%2Fkernels%3F1611651395276 (@::ffff:10.29.195.101) 2.36ms
[I 2021-02-14 09:56:36.019 JupyterHub log:174] 200 GET /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fapi%2Fkernels%3F1611651395276 (@::ffff:10.29.195.101) 2.83ms
[I 2021-02-14 09:56:36.539 JupyterHub log:174] 302 GET /user/some_user/api/contents/transfer2/00_data/05_NG_ST_ConsForecast/test?content=1&1611651396275 -> /hub/user/some_user/api/contents/transfer2/00_data/05_NG_ST_ConsForecast/test?content=1&1611651396275 (@::ffff:10.29.195.101) 2.39ms
[I 2021-02-14 09:56:36.970 JupyterHub log:174] 302 GET /hub/user/some_user/api/contents/transfer2/00_data/05_NG_ST_ConsForecast/test?content=1&1611651396275 -> /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fapi%2Fcontents%2Ftransfer2%2F00_data%2F05_NG_ST_ConsForecast%2Ftest%3Fcontent%3D1%261611651396275 (@::ffff:10.29.195.101) 2.41ms
[I 2021-02-14 09:56:37.258 JupyterHub log:174] 200 GET /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fapi%2Fcontents%2Ftransfer2%2F00_data%2F05_NG_ST_ConsForecast%2Ftest%3Fcontent%3D1%261611651396275 (@::ffff:10.29.195.101) 2.79ms
[W 2021-02-14 09:56:37.553 JupyterHub log:174] 405 PUT /user/some_user/lab/api/workspaces/lab?1611651397270 (@::ffff:10.29.195.101) 8.38ms
[I 2021-02-14 09:56:38.898 JupyterHub log:174] 200 POST /hub/api/users/another_user/activity (another_user@162.11.0.5) 31.81ms
[I 2021-02-14 09:56:39.530 JupyterHub log:174] 302 GET /user/some_user/metrics?1611651399278 -> /hub/user/some_user/metrics?1611651399278 (@::ffff:10.29.195.101) 3.20ms
[I 2021-02-14 09:56:41.261 JupyterHub log:174] 302 GET /hub/user/some_user/metrics?1611651399278 -> /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fmetrics%3F1611651399278 (@::ffff:10.29.195.101) 2.35ms
[I 2021-02-14 09:56:41.528 JupyterHub log:174] 200 GET /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fmetrics%3F1611651399278 (@::ffff:10.29.195.101) 2.79ms
[W 2021-02-14 09:56:42.852 JupyterHub log:174] 405 PUT /user/some_user/lab/api/workspaces/lab?1611651401594 (@::ffff:10.29.195.101) 3.52ms
[I 2021-02-14 09:56:43.527 JupyterHub log:174] 302 GET /user/some_user/api/sessions?1611651403278 -> /hub/user/some_user/api/sessions?1611651403278 (@::ffff:10.29.195.101) 2.34ms
[I 2021-02-14 09:56:43.541 JupyterHub log:174] 302 GET /user/some_user/api/terminals?1611651403280 -> /hub/user/some_user/api/terminals?1611651403280 (@::ffff:10.29.195.101) 1.99ms
[I 2021-02-14 09:56:43.799 JupyterHub log:174] 302 GET /hub/user/some_user/api/sessions?1611651403278 -> /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fapi%2Fsessions%3F1611651403278 (@::ffff:10.29.195.101) 2.28ms
[I 2021-02-14 09:56:43.809 JupyterHub log:174] 302 GET /hub/user/some_user/api/terminals?1611651403280 -> /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fapi%2Fterminals%3F1611651403280 (@::ffff:10.29.195.101) 1.91ms
[I 2021-02-14 09:56:44.074 JupyterHub log:174] 200 GET /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fapi%2Fsessions%3F1611651403278 (@::ffff:10.29.195.101) 2.81ms
[I 2021-02-14 09:56:44.085 JupyterHub log:174] 200 GET /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fapi%2Fterminals%3F1611651403280 (@::ffff:10.29.195.101) 2.39ms
[I 2021-02-14 09:56:47.528 JupyterHub log:174] 302 GET /user/some_user/api/contents/transfer2/00_data/05_NG_ST_ConsForecast/test?content=1&1611651407289 -> /hub/user/some_user/api/contents/transfer2/00_data/05_NG_ST_ConsForecast/test?content=1&1611651407289 (@::ffff:10.29.195.101) 3.03ms
[I 2021-02-14 09:56:47.803 JupyterHub log:174] 302 GET /hub/user/some_user/api/contents/transfer2/00_data/05_NG_ST_ConsForecast/test?content=1&1611651407289 -> /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fapi%2Fcontents%2Ftransfer2%2F00_data%2F05_NG_ST_ConsForecast%2Ftest%3Fcontent%3D1%261611651407289 (@::ffff:10.29.195.101) 2.34ms
[I 2021-02-14 09:56:48.094 JupyterHub log:174] 200 GET /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fapi%2Fcontents%2Ftransfer2%2F00_data%2F05_NG_ST_ConsForecast%2Ftest%3Fcontent%3D1%261611651407289 (@::ffff:10.29.195.101) 2.85ms
[I 2021-02-14 09:56:49.070 JupyterHub log:174] 302 GET /user/some_user/metrics?1611651408844 -> /hub/user/some_user/metrics?1611651408844 (@::ffff:10.29.195.101) 2.44ms
[I 2021-02-14 09:56:49.485 JupyterHub log:174] 302 GET /user/some_user/api/kernels?1611651409233 -> /hub/user/some_user/api/kernels?1611651409233 (@::ffff:10.29.195.101) 2.36ms
[I 2021-02-14 09:56:49.754 JupyterHub log:174] 302 GET /hub/user/some_user/api/kernels?1611651409233 -> /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fapi%2Fkernels%3F1611651409233 (@::ffff:10.29.195.101) 2.34ms
[I 2021-02-14 09:56:50.029 JupyterHub log:174] 200 GET /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fapi%2Fkernels%3F1611651409233 (@::ffff:10.29.195.101) 2.81ms
[I 2021-02-14 09:56:50.104 JupyterHub log:174] 302 GET /hub/user/some_user/metrics?1611651408844 -> /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fmetrics%3F1611651408844 (@::ffff:10.29.195.101) 2.55ms
[I 2021-02-14 09:56:50.445 JupyterHub log:174] 200 GET /hub/login?next=%2Fhub%2Fuser%2Fsome_user%2Fmetrics%3F1611651408844 (@::ffff:10.29.195.101) 2.79ms
[I 2021-02-14 09:57:17.752 JupyterHub log:174] 200 POST /hub/api/users/admin/activity (admin@172.20.0.4) 29.93ms
09:57:20.629 [ConfigProxy] error: 503 GET /user/some_user/api/kernels/c602709c-fb11-45e4-b14a-1b4ef921a197/channels?session_id=fad57b1e-31a8-4f6f-8eb9-f88ee83e47a0 Error: connect EHOSTUNREACH 172.18.0.4:8888
09:57:20.630 [ConfigProxy] error: 503 GET /user/some_user/lab? Error: connect EHOSTUNREACH 172.18.0.4:8888
09:57:20.648 [ConfigProxy] error: Uncaught Exception: write after end
[I 2021-02-14 09:57:20.648 JupyterHub log:174] 200 GET /hub/error/503?url=%2Fuser%2Fsome_user%2Fapi%2Fkernels%2Fc602709c-fb11-45e4-b14a-1b4ef921a197%2Fchannels%3Fsession_id%3Dfad57b1e-31a8-4f6f-8eb9-f88ee83e47a0 (@10.233.194.160) 1.86ms
[I 2021-02-14 09:57:20.650 JupyterHub log:174] 200 GET /hub/error/503?url=%2Fuser%2Fsome_user%2Flab%3F (@10.233.194.160) 1.07ms
09:57:20.654 [ConfigProxy] error: Error [ERR_STREAM_WRITE_AFTER_END]: write after end
    at writeAfterEnd (_stream_writable.js:248:12)
    at TLSSocket.Writable.write (_stream_writable.js:296:5)
    at IncomingMessage.upstream.on.data (/opt/rh/rh-nodejs10/root/usr/lib/node_modules/configurable-http-proxy/lib/configproxy.js:461:30)
    at IncomingMessage.emit (events.js:198:13)
    at IncomingMessage.Readable.read (_stream_readable.js:505:10)
    at flow (_stream_readable.js:974:34)
    at resume_ (_stream_readable.js:955:3)
    at process._tickCallback (internal/process/next_tick.js:63:19)

Exit code 134 means “aborted”: by the usual convention it is 128 + 6, i.e. the process was terminated by SIGABRT, which points to a low-level crash of the process. It could mean out-of-memory if folks are running resource-intensive code. Since your image is based on datascience-notebook, is it possible your users are running ML models or other things that might be exhausting memory?
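For reference, here is a small sketch of how an exit code such as 134 can be decoded. It assumes the common shell/Docker convention that a code above 128 is 128 plus the number of the signal that killed the process; `describe_exit_code` is just an illustrative helper name, not part of JupyterHub or DockerSpawner.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a process exit code into a human-readable description.

    Assumes the convention "exit code = 128 + signal number" for
    processes that were terminated by a signal.
    """
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"unknown signal {signum}"
        return f"killed by {name} (signal {signum})"
    return f"exited with status {code}"


print(describe_exit_code(134))  # on Linux: killed by SIGABRT (signal 6)
```

A kill by the kernel OOM killer would normally show up as 137 (SIGKILL) instead, so 134 suggests the process aborted itself, for example from a failed allocation or an assertion in native code.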

Can you see the logs of the crashed container itself? The Hub logs are unlikely to be useful, because all they show is that the user’s server suddenly went away, not what happened in the container to cause it.
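If it helps, below is a rough sketch of pulling the state and last log lines of the stopped container with the Docker SDK for Python (`pip install docker`). It assumes the stopped container still exists (i.e. the spawner is not configured to remove containers on stop) and that it is named something like `jupyter-some_user`; that name is a guess based on the default naming, so check `docker ps -a` for the real one. `summarize_state` and `show_crashed_container` are hypothetical helper names.

```python
def summarize_state(state: dict) -> str:
    """Condense the `State` section of a container's inspect data into one line."""
    parts = [
        f"status={state.get('Status')}",
        f"exit_code={state.get('ExitCode')}",
        f"oom_killed={state.get('OOMKilled')}",
        f"finished_at={state.get('FinishedAt')}",
    ]
    return " ".join(parts)


def show_crashed_container(name: str, tail: int = 200) -> None:
    """Print the exit state and the last `tail` log lines of a container."""
    import docker  # imported lazily so summarize_state works without the SDK

    client = docker.from_env()
    container = client.containers.get(name)  # also finds stopped containers
    print(summarize_state(container.attrs["State"]))
    print(container.logs(tail=tail).decode("utf-8", errors="replace"))


# Example call (the container name is an assumption, see above):
# show_crashed_container("jupyter-some_user")
```

The same information is available from the command line with `docker inspect` and `docker logs` on that container. If `oom_killed` is true, memory is the culprit; if not, the last lines of the container log should show what aborted.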