Saving large notebooks fails with "Failed to fetch"

I’m running Zero to JupyterHub (Z2JH) on EKS.
When saving large notebooks (~50 MB), the corresponding HTTP PUT request to /api/contents/Huge.ipynb?content=0&1619793 fails with a connection reset after about 1 minute.
On a better network the request takes 40 seconds and the file is saved properly.
When saving on The Littlest JupyterHub (on the bad network) the request succeeds after 2 minutes, while on Z2JH I get a connection reset after 1 minute.

This led me to think it is some sort of timeout issue, so I increased the timeouts for every component I could think of (the AWS load balancer and the configurable-http-proxy).

I saw there is a known issue with connection resets on Kubernetes caused by INVALID packets that are not dropped
(see: kube-proxy Subtleties: Debugging an Intermittent Connection Reset | Kubernetes). I tried configuring conntrack to be liberal, but that didn’t work either.

Any suggestions?

Thanks a lot!

It might be worth setting up a simple K8s service for testing. For example, you could run Jupyter notebook in your K8s cluster as a standalone service without JupyterHub, though with the AWS load balancer, and see if you have timeout problems with saving. If you don’t, then that suggests the timeout may be occurring in Z2JH. If you do, then it’d be worth setting up an even simpler service (e.g. a basic server that allows uploads) and testing that. If that also doesn’t work, then you know it’s a timeout somewhere in the K8s/EC2 infrastructure.
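
As a rough idea of what such a "basic server that allows uploads" could look like, here is a minimal sketch using only the Python standard library. The names `UploadHandler` and `make_server` and the port 8000 are made up for illustration. It just reads the request body, throws it away, and reports how many bytes arrived, so any failure points at the network path rather than at Jupyter.

```python
import http.server
import threading  # only needed if you run the server in a background thread


class UploadHandler(http.server.BaseHTTPRequestHandler):
    """Accept PUT/POST bodies, discard them, and report the byte count."""

    def do_PUT(self):
        remaining = int(self.headers.get("Content-Length", 0))
        received = 0
        # Read in chunks so a 50 MB body is never held in memory at once.
        while remaining > 0:
            chunk = self.rfile.read(min(65536, remaining))
            if not chunk:  # client closed the connection early
                break
            received += len(chunk)
            remaining -= len(chunk)
        body = f"received {received} bytes\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    # Treat POST the same way as PUT.
    do_POST = do_PUT


def make_server(host="0.0.0.0", port=8000):
    """Create (but do not start) a threaded upload test server."""
    return http.server.ThreadingHTTPServer((host, port), UploadHandler)
```

Running `make_server().serve_forever()` in a pod behind the same load balancer and uploading a ~50 MB file to it (for example with `curl -T`) should show whether the reset also happens without Jupyter involved.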

Thanks! I’ll check that out.

I ran a simple standalone Jupyter notebook on the same cluster with the same kind of load balancer, and the notebook saved just fine. It seems like the problem is in Z2JH…

Afterwards I added a configurable-http-proxy (CHP) in front of the standalone notebook and the saving error reappeared… so it seems like the problem is in CHP.
Any suggestions?

There are two potentially useful CHP options:

  --timeout <n>                      Timeout (in millis) when proxy drops connection for a request.
  --proxy-timeout <n>                Timeout (in millis) when proxy receives no response from target.

Could you try both of those? If they don’t help, then maybe configurable-http-proxy needs another option governing the total time allowed for a request to complete. If so, please open a feature request on GitHub (jupyterhub/configurable-http-proxy: node-http-proxy plus a REST API) and cross-reference this topic. Thanks!
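
Since both options take milliseconds, it is easy to pass a value that is 1000 times too small. A small sketch of building the command line from values in seconds is below. The helper name `chp_command`, the target URL and the 600 second values are only assumptions for a standalone test, and `--port` and `--default-target` are the CHP flags I would expect to need in front of a single notebook server.

```python
import shlex


def chp_command(timeout_s, proxy_timeout_s, default_target, port=8000):
    """Build a configurable-http-proxy command line.

    --timeout and --proxy-timeout expect milliseconds, so the values
    given in seconds are converted here to avoid unit mix-ups.
    """
    return [
        "configurable-http-proxy",
        "--port", str(port),
        "--default-target", default_target,
        "--timeout", str(int(timeout_s * 1000)),
        "--proxy-timeout", str(int(proxy_timeout_s * 1000)),
    ]


# Hypothetical values: 10 minute timeouts, notebook on localhost:8888.
cmd = chp_command(600, 600, "http://127.0.0.1:8888")
print(shlex.join(cmd))
```

Printing the command before running it makes it easy to confirm that the proxy really is started with the timeouts you intended.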

I tried configuring these timeout parameters and it didn’t help…
Does Z2JH have an option to run with a different proxy? That would help me test whether the problem really is in CHP.

Not at the moment. There’s an old PR to add Traefik: "[WIP] use traefik for the proxy" by minrk (Pull Request #1162, jupyterhub/zero-to-jupyterhub-k8s on GitHub).

If you add --log-level debug to the CHP in front of your standalone notebook, do you see anything interesting in the CHP logs?
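
To trigger the slow upload on demand while watching the debug log, something like the following throttled client might help. It is only a sketch. The function names and the URL in the usage note are hypothetical, it speaks plain HTTP only, and it sends filler bytes rather than a real notebook, so it should be pointed at an endpoint behind CHP that accepts an arbitrary PUT body.

```python
import http.client
import time
from urllib.parse import urlsplit


def throttled_body(total_bytes, chunk_size, delay_s):
    """Yield total_bytes of filler in chunks, sleeping after each chunk."""
    sent = 0
    while sent < total_bytes:
        n = min(chunk_size, total_bytes - sent)
        yield b"x" * n
        sent += n
        time.sleep(delay_s)


def timed_put(url, total_bytes, chunk_size=65536, delay_s=0.0):
    """PUT a throttled body; return (HTTP status or error name, seconds)."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=600)
    start = time.monotonic()
    try:
        conn.request(
            "PUT",
            parts.path or "/",
            body=throttled_body(total_bytes, chunk_size, delay_s),
            # An explicit Content-Length avoids chunked transfer encoding.
            headers={"Content-Length": str(total_bytes)},
        )
        outcome = conn.getresponse().status
    except (OSError, http.client.HTTPException) as exc:
        # Connection resets and socket timeouts end up here.
        outcome = type(exc).__name__
    finally:
        conn.close()
    return outcome, time.monotonic() - start
```

For example, `timed_put("http://my-chp-host:8000/upload", 50 * 1024 * 1024, delay_s=0.15)` would stretch a 50 MB upload over roughly two minutes. If the outcome is `ConnectionResetError` after about 60 seconds, the timestamp can be matched against the CHP log.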

I tried that as well and couldn’t find anything interesting…