Deployed binder on EKS stopped working

Hi all!

We've had our deployment running on EKS for a few months without major issues; whenever something did go wrong, deleting the pods and letting them restart was enough to fix it. We've been using this BinderHub image: jupyterhub/k8s-binderhub:0.2.0-n956.h64a6ec4 with EKS 1.21.

A few weeks ago, we started seeing pods get launched, but the actual Jupyter server either never loads or returns a 404 (I suspect these might be two different issues). Nothing changed in the cluster or the repo links when this started.

When I looked at the logs of the launched pod, I saw this:

/srv/conda/envs/notebook/lib/python3.10/subprocess.py:961: RuntimeWarning: line buffering (buffering=1) isn't supported in binary mode, the default buffer size will be used
  self.stdout = io.open(c2pread, 'rb', bufsize)
[I 16:38:42.715 NotebookApp] Writing notebook server cookie secret to /home/jovyan/.local/share/jupyter/runtime/notebook_cookie_secret
[W 2023-08-01 16:38:43.476 LabApp] 'ip' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next rele
[W 2023-08-01 16:38:43.476 LabApp] 'port' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next re
[W 2023-08-01 16:38:43.476 LabApp] 'base_url' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our nex
[W 2023-08-01 16:38:43.476 LabApp] 'token' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next r
[W 2023-08-01 16:38:43.476 LabApp] 'trust_xheaders' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before o
[W 2023-08-01 16:38:43.476 LabApp] 'allow_origin' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our
[W 2023-08-01 16:38:43.477 LabApp] 'allow_origin_pat' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before
[W 2023-08-01 16:38:43.477 LabApp] 'allow_origin_pat' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before
[I 2023-08-01 16:38:43.480 LabApp] JupyterLab extension loaded from /srv/conda/envs/notebook/lib/python3.10/site-packages/jupyterlab
[I 2023-08-01 16:38:43.480 LabApp] JupyterLab application directory is /srv/conda/envs/notebook/share/jupyter/lab
[I 16:38:43.551 NotebookApp] [Jupytext Server Extension] Deriving a JupytextContentsManager from LargeFileManager
nteract extension loaded from /srv/conda/envs/notebook/lib/python3.10/site-packages/nteract_on_jupyter
[I 16:38:44.911 NotebookApp] [Ploomber] setting content manager to PloomberContentsManager
[I 16:38:44.912 NotebookApp] Serving notebooks from local directory: /home/jovyan
[I 16:38:44.912 NotebookApp] Jupyter Notebook 6.3.0 is running at:
[I 16:38:44.913 NotebookApp] http://jupyter-ploomber-2dbinder-2denv-2dc48ir7bi:8888/user/ploomber-binder-env-c48ir7bi/?token=...
[I 16:38:44.913 NotebookApp]  or http://127.0.0.1:8888/user/ploomber-binder-env-c48ir7bi/?token=...
[I 16:38:44.913 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[W 16:38:44.915 NotebookApp] No web browser found: could not locate runnable browser.
[I 16:38:44.917 NotebookApp] 302 GET /user/ploomber-binder-env-c48ir7bi/ (10.0.206.191) 1.040000ms
[E 16:38:45.056 NotebookApp] Could not open static file ''
[I 16:38:45.220 NotebookApp] 301 GET /user/ploomber-binder-env-c48ir7bi/static/logo/logo.png (24.228.233.64) 0.340000ms
[I 16:38:46.086 NotebookApp] $ git fetch
[I 16:38:46.087 NotebookApp] $ git reset --mixed
[I 16:38:46.087 NotebookApp] $ git -c user.email=nbgitpuller@nbgitpuller.link -c user.name=nbgitpuller merge -Xours origin/master
[I 16:38:46.087 NotebookApp] Already up to date.
[I 16:38:46.429 NotebookApp] 301 GET /user/ploomber-binder-env-c48ir7bi/static/favicons/favicon.ico (24.228.233.64) 1.010000ms
/srv/conda/envs/notebook/lib/python3.10/json/encoder.py:257: UserWarning: date_default is deprecated since jupyter_client 7.0.0. Use jupyter_client.jsonutil.json_default.
  return _iterencode(o, 0)
[I 2023-08-01 16:38:49.426 LabApp] Build is up to date
Stream closed EOF for binder/jupyter-ploomber-2dbinder-2denv-2dc48ir7bi (notebook)

I assume these backward-compatibility warnings shouldn't break anything; is that correct? Or could this be related to the new Jupyter client (v7)? I'm a bit lost on where to even start debugging this.

I also looked for clues in the hub's own logs, and this is what I got:

This is the 404:

[I 2023-08-01 17:50:36.261 JupyterHub log:189] 201 POST /hub/api/users/ploomber-jupysql-u6krwjjd (binder@52.71.181.87) 24.89ms
[I 2023-08-01 17:50:36.324 JupyterHub provider:574] Creating oauth client jupyterhub-user-ploomber-jupysql-u6krwjjd
[W 2023-08-01 17:50:36.338 JupyterHub spawner:2861] Ignoring unrecognized KubeSpawner user_options: binder_launch_host, binder_persistent_request, binder_ref_url, binder_re
[I 2023-08-01 17:50:36.341 JupyterHub log:189] 202 POST /hub/api/users/ploomber-jupysql-u6krwjjd/servers/ (binder@52.71.181.87) 75.07ms
[I 2023-08-01 17:50:36.341 JupyterHub spawner:2302] Attempting to create pod jupyter-ploomber-2djupysql-2du6krwjjd, with timeout 3
[W 2023-08-01 17:50:41.702 JupyterHub _version:41] Single-user server has no version header, which means it is likely < 0.8. Expected 1.5.0
[I 2023-08-01 17:50:41.702 JupyterHub base:909] User ploomber-jupysql-u6krwjjd took 5.428 seconds to start
[I 2023-08-01 17:50:41.702 JupyterHub proxy:285] Adding user ploomber-jupysql-u6krwjjd to proxy /user/ploomber-jupysql-u6krwjjd/ => http://10.0.246.204:8888
[I 2023-08-01 17:50:41.704 JupyterHub users:677] Server ploomber-jupysql-u6krwjjd is ready
[I 2023-08-01 17:50:41.704 JupyterHub log:189] 200 GET /hub/api/users/ploomber-jupysql-u6krwjjd/server/progress (binder@52.71.181.87) 5360.43ms
[I 2023-08-01 17:51:05.573 JupyterHub log:189] 302 GET /.env -> /hub/.env (@::ffff:10.0.227.106) 0.63ms
[W 2023-08-01 17:51:06.119 JupyterHub log:189] 405 POST / (@::ffff:10.0.227.106) 1.28ms

And these are the logs for the Jupyter server that never finishes loading:

[I 2023-08-01 17:57:03.315 JupyterHub log:189] 201 POST /hub/api/users/ploomber-binder-env-0rltlp2g (binder@52.71.181.87) 25.03ms
[I 2023-08-01 17:57:03.344 JupyterHub provider:574] Creating oauth client jupyterhub-user-ploomber-binder-env-0rltlp2g
[W 2023-08-01 17:57:03.358 JupyterHub spawner:2861] Ignoring unrecognized KubeSpawner user_options: binder_launch_host, binder_persistent_request, binder_ref_url, binder_re
[I 2023-08-01 17:57:03.361 JupyterHub log:189] 202 POST /hub/api/users/ploomber-binder-env-0rltlp2g/servers/ (binder@52.71.181.87) 40.94ms
[I 2023-08-01 17:57:03.361 JupyterHub spawner:2302] Attempting to create pod jupyter-ploomber-2dbinder-2denv-2d0rltlp2g, with timeout 3
[W 2023-08-01 17:57:10.128 JupyterHub _version:41] Single-user server has no version header, which means it is likely < 0.8. Expected 1.5.0
[I 2023-08-01 17:57:10.128 JupyterHub base:909] User ploomber-binder-env-0rltlp2g took 6.800 seconds to start
[I 2023-08-01 17:57:10.128 JupyterHub proxy:285] Adding user ploomber-binder-env-0rltlp2g to proxy /user/ploomber-binder-env-0rltlp2g/ => http://10.0.248.0:8888
[I 2023-08-01 17:57:10.130 JupyterHub users:677] Server ploomber-binder-env-0rltlp2g is ready
[I 2023-08-01 17:57:10.131 JupyterHub log:189] 200 GET /hub/api/users/ploomber-binder-env-0rltlp2g/server/progress (binder@52.71.181.87) 6766.40ms
[I 2023-08-01 17:57:22.325 JupyterHub proxy:347] Checking routes
[I 2023-08-01 17:57:22.411 JupyterHub log:189] 200 GET /hub/api/users (cull-idle@127.0.0.1) 8.85ms

What changed around the time the failures started? Were you using exactly the same BinderHub version and configuration, or did you bump the version of repo2docker?

works, which implies there are no obvious bugs.

It could be worth removing the pip dependencies from your environment.yml and seeing whether that works. If it doesn't, try pinning the other dependencies (or removing more of them); if it does, add the pip requirements back one at a time until the failure reappears.

Nothing changed in the infrastructure or the pods.
I'll try playing with the dependencies and debug from there.

Will keep you posted.
Thanks!!


You were right: a new Jupytext release broke the whole environment.
Is there a way we can use Binder's API to test for this as part of our CI?
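One possible approach, sketched under assumptions: BinderHub serves a build endpoint, `/build/<provider>/<spec>` (for example `gh/<org>/<repo>/<ref>`), that streams build progress as server-sent events (`data: {...}` lines). A CI job could request it for the repo and ref under test, fail on a `failed` phase, and once a `ready` event arrives, make one request to the launched server so that a 404 also fails the job. The event field names (`phase`, `message`, `url`, `token`), the token-in-header authentication and the `/api/status` check are assumptions to verify against your BinderHub and notebook versions; `parse_event`, `wait_for_launch` and `smoke_test` are hypothetical helper names. Only the standard library is used, so nothing extra needs installing in CI.

```python
"""Hypothetical CI smoke test for a BinderHub deployment (a sketch, not an official client)."""
import json
import urllib.request
from typing import Iterable, Optional


def parse_event(line: str) -> Optional[dict]:
    """Decode one server-sent-events line of the form 'data: {...}'.

    Returns None for blank lines, ':'-prefixed comments/heartbeats and other fields.
    """
    line = line.strip()
    if not line.startswith("data:"):
        return None
    return json.loads(line[len("data:"):].strip())


def wait_for_launch(lines: Iterable[str]) -> dict:
    """Consume build events until the phase is 'ready' (returned) or 'failed' (raised)."""
    for line in lines:
        event = parse_event(line)
        if event is None:
            continue
        phase = event.get("phase")
        if phase == "ready":
            return event
        if phase == "failed":
            raise RuntimeError(f"Binder build failed: {event.get('message', '')}")
    raise RuntimeError("Event stream ended before the server was ready")


def smoke_test(binderhub: str, spec: str, timeout: float = 600) -> dict:
    """Build and launch `spec` on `binderhub`, then check the launched server responds."""
    # Assumed endpoint layout: <binderhub>/build/<provider>/<spec>
    build_url = f"{binderhub.rstrip('/')}/build/{spec}"
    # `timeout` applies to each blocking socket read, not to the whole build.
    with urllib.request.urlopen(build_url, timeout=timeout) as resp:
        ready = wait_for_launch(raw.decode("utf-8") for raw in resp)

    # Assumption: the 'ready' event carries the server 'url' and an access 'token'.
    check = urllib.request.Request(
        ready["url"].rstrip("/") + "/api/status",
        headers={"Authorization": f"token {ready['token']}"},
    )
    # urlopen raises urllib.error.HTTPError on a 404 (or any 4xx/5xx), failing the job.
    with urllib.request.urlopen(check, timeout=60) as resp:
        return json.load(resp)
```

For example, `smoke_test("https://binder.example.org", "gh/<org>/<repo>/<ref>")` (placeholder URL and spec) raises on a failed build, on a stream that ends before `ready`, or on an HTTP error from the launched server, so a CI step that runs it would fail in each of those cases, including the "pod launches but Jupyter 404s" case described above.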