JupyterHub deployment on k8s 1.27

Hello!

I am trying to deploy the JupyterHub 2.0.0 Helm chart on Kubernetes 1.27.2 inside a minikube VM. When deploying the chart locally, I get the following error in the hub pod (after this, the pod keeps restarting with the same error):

Loading /usr/local/etc/jupyterhub/secret/values.yaml
No config at /usr/local/etc/jupyterhub/existing-secret/values.yaml
Loading extra config: hub_config
[I 2023-05-31 03:35:16.707 JupyterHub app:2775] Running JupyterHub version 3.0.0
[I 2023-05-31 03:35:16.707 JupyterHub app:2805] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-3.0.0
[I 2023-05-31 03:35:16.796 alembic.runtime.migration migration:204] Context impl SQLiteImpl.
[I 2023-05-31 03:35:16.796 alembic.runtime.migration migration:207] Will assume non-transactional DDL.
[I 2023-05-31 03:35:16.810 alembic.runtime.migration migration:618] Running stamp_revision  -> 651f5419b74d
[I 2023-05-31 03:35:16.823 alembic.runtime.migration migration:204] Context impl SQLiteImpl.
[I 2023-05-31 03:35:16.823 alembic.runtime.migration migration:207] Will assume non-transactional DDL.
[I 2023-05-31 03:35:17.228 JupyterHub roles:173] Role jupyterhub-idle-culler added to database
[I 2023-05-31 03:35:17.243 JupyterHub roles:238] Adding role admin for User: zeus
[I 2023-05-31 03:35:17.252 JupyterHub roles:238] Adding role user for User: zeus
[I 2023-05-31 03:35:17.259 JupyterHub app:1934] Not using allowed_users. Any authenticated user will be allowed.
[I 2023-05-31 03:35:17.496 JupyterHub app:2844] Initialized 0 spawners in 0.075 seconds
[I 2023-05-31 03:35:17.502 JupyterHub app:3057] Not starting proxy
[W 2023-05-31 03:35:37.518 JupyterHub proxy:903] api_request to the proxy failed with status code 599, retrying...
[W 2023-05-31 03:35:57.655 JupyterHub proxy:903] api_request to the proxy failed with status code 599, retrying...
[E 2023-05-31 03:35:57.656 JupyterHub app:3297]
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/app.py", line 3295, in launch_instance_async
        await self.start()
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/app.py", line 3061, in start
        await self.proxy.get_all_routes()
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/proxy.py", line 950, in get_all_routes
        resp = await self.api_request('', client=client)
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/proxy.py", line 914, in api_request
        result = await exponential_backoff(
      File "/usr/local/lib/python3.9/site-packages/jupyterhub/utils.py", line 236, in exponential_backoff
        raise asyncio.TimeoutError(fail_message)
    asyncio.exceptions.TimeoutError: Repeated api_request to proxy path "" failed.

Has JupyterHub been tried/tested on k8s 1.27.2? If so, can someone help me with the above error? Any input is appreciated!

I saw this issue: We're using beta APIs removed in k8s 1.26 · Issue #2587 · jupyterhub/mybinder.org-deploy · GitHub. I'm not sure whether deploying JupyterHub on k8s 1.27.x has been tested.

Can you try the latest development version of Z2JH?
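For reference, a sketch of how a dev release can be installed from the chart repository's pre-releases. The release name, namespace, and `config.yaml` file are assumptions for illustration:

```shell
# Add the JupyterHub chart repo (if not already added) and list dev releases
helm repo add jupyterhub https://hub.jupyter.org/helm-chart/
helm repo update
helm search repo jupyterhub/jupyterhub --devel

# Install a specific dev version (replace with one from the list above)
helm upgrade --install jupyterhub jupyterhub/jupyterhub \
  --namespace jupyterhub --create-namespace \
  --devel --version <dev-version> \
  --values config.yaml
```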


Thanks again for the quick response @manics! My local Kubernetes setup was messed up. After properly setting up local k8s at version 1.27.2, the JupyterHub pods come up fine. However, when trying to spawn a singleuser container, the user-scheduler pod fails with:

E0531 18:27:29.478156       1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: the server could not find the requested resource

I believe this is related to: We're using beta APIs removed in k8s 1.26 · Issue #2587 · jupyterhub/mybinder.org-deploy · GitHub.

It would be great if you could confirm whether that is the issue. Using the development version 3.0.0-0.dev.git.6175.hf9af31a3 works on k8s 1.27.2.

That said, is it possible to make the 2.0.0 Z2JH Helm chart work on k8s 1.27.x by disabling any configs/RBAC?

The latest dev version should support 1.27, if it doesn’t please open a bug report with as much information as possible.

There are ~80 changed files between 2.0.0 and the main branch, but most of those are docs or related to CI. If you compare the diff for just the jupyterhub/ subdirectory you’ll hopefully figure out the relevant changes.
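As a sketch, the chart-only diff can be viewed locally, assuming a clone of the zero-to-jupyterhub-k8s repo and that the 2.0.0 chart release corresponds to the `2.0.0` git tag:

```shell
# Compare only the Helm chart sources between the 2.0.0 tag and main
git clone https://github.com/jupyterhub/zero-to-jupyterhub-k8s.git
cd zero-to-jupyterhub-k8s
git diff --stat 2.0.0 main -- jupyterhub/   # summary of changed files
git diff 2.0.0 main -- jupyterhub/          # full diff
```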

@manics would you have a rough idea of when the new (3.0.0) Helm chart version will be released? It would be great if you could give an estimated timeline.

So I looked into the commits. It looks like upgrading to a newer kube-scheduler image (1.26) should fix the problem. PR: Update kube-scheduler version from v1.26.4 to v1.26.5 by jupyterhub-bot · Pull Request #3117 · jupyterhub/zero-to-jupyterhub-k8s · GitHub

However, when I tried overriding the user-scheduler image tag to v1.24.0 in my Helm chart, bringing up JupyterHub failed and the user-scheduler pod showed the following error:

I0602 00:01:37.501594       1 flags.go:64] FLAG: --add-dir-header="false"
I0602 00:01:37.501616       1 flags.go:64] FLAG: --allow-metric-labels="[]"
I0602 00:01:37.501619       1 flags.go:64] FLAG: --alsologtostderr="false"
I0602 00:01:37.501621       1 flags.go:64] FLAG: --authentication-kubeconfig=""
I0602 00:01:37.501622       1 flags.go:64] FLAG: --authentication-skip-lookup="true"
I0602 00:01:37.501624       1 flags.go:64] FLAG: --authentication-token-webhook-cache-ttl="10s"
I0602 00:01:37.501628       1 flags.go:64] FLAG: --authentication-tolerate-lookup-failure="true"
I0602 00:01:37.501629       1 flags.go:64] FLAG: --authorization-always-allow-paths="[/healthz,/readyz,/livez]"
I0602 00:01:37.501632       1 flags.go:64] FLAG: --authorization-kubeconfig=""
I0602 00:01:37.501634       1 flags.go:64] FLAG: --authorization-webhook-cache-authorized-ttl="10s"
I0602 00:01:37.501635       1 flags.go:64] FLAG: --authorization-webhook-cache-unauthorized-ttl="10s"
I0602 00:01:37.501636       1 flags.go:64] FLAG: --bind-address="0.0.0.0"
I0602 00:01:37.501639       1 flags.go:64] FLAG: --cert-dir=""
I0602 00:01:37.501640       1 flags.go:64] FLAG: --client-ca-file=""
I0602 00:01:37.501641       1 flags.go:64] FLAG: --config="/etc/user-scheduler/config.yaml"
I0602 00:01:37.501643       1 flags.go:64] FLAG: --contention-profiling="true"
I0602 00:01:37.501644       1 flags.go:64] FLAG: --disabled-metrics="[]"
I0602 00:01:37.501646       1 flags.go:64] FLAG: --feature-gates=""
I0602 00:01:37.501650       1 flags.go:64] FLAG: --help="false"
I0602 00:01:37.501651       1 flags.go:64] FLAG: --http2-max-streams-per-connection="0"
I0602 00:01:37.501653       1 flags.go:64] FLAG: --kube-api-burst="100"
I0602 00:01:37.501655       1 flags.go:64] FLAG: --kube-api-content-type="application/vnd.kubernetes.protobuf"
I0602 00:01:37.501656       1 flags.go:64] FLAG: --kube-api-qps="50"
I0602 00:01:37.501659       1 flags.go:64] FLAG: --kubeconfig=""
I0602 00:01:37.501660       1 flags.go:64] FLAG: --leader-elect="true"
I0602 00:01:37.501661       1 flags.go:64] FLAG: --leader-elect-lease-duration="15s"
I0602 00:01:37.501663       1 flags.go:64] FLAG: --leader-elect-renew-deadline="10s"
I0602 00:01:37.501664       1 flags.go:64] FLAG: --leader-elect-resource-lock="leases"
I0602 00:01:37.501665       1 flags.go:64] FLAG: --leader-elect-resource-name="kube-scheduler"
I0602 00:01:37.501667       1 flags.go:64] FLAG: --leader-elect-resource-namespace="kube-system"
I0602 00:01:37.501669       1 flags.go:64] FLAG: --leader-elect-retry-period="2s"
I0602 00:01:37.501672       1 flags.go:64] FLAG: --lock-object-name="kube-scheduler"
I0602 00:01:37.501674       1 flags.go:64] FLAG: --lock-object-namespace="kube-system"
I0602 00:01:37.501675       1 flags.go:64] FLAG: --log-backtrace-at=":0"
I0602 00:01:37.501678       1 flags.go:64] FLAG: --log-dir=""
I0602 00:01:37.501680       1 flags.go:64] FLAG: --log-file=""
I0602 00:01:37.501681       1 flags.go:64] FLAG: --log-file-max-size="1800"
I0602 00:01:37.501683       1 flags.go:64] FLAG: --log-flush-frequency="5s"
I0602 00:01:37.501684       1 flags.go:64] FLAG: --log-json-info-buffer-size="0"
I0602 00:01:37.501688       1 flags.go:64] FLAG: --log-json-split-stream="false"
I0602 00:01:37.501689       1 flags.go:64] FLAG: --logging-format="text"
I0602 00:01:37.501690       1 flags.go:64] FLAG: --logtostderr="true"
I0602 00:01:37.501692       1 flags.go:64] FLAG: --master=""
I0602 00:01:37.501693       1 flags.go:64] FLAG: --one-output="false"
I0602 00:01:37.501695       1 flags.go:64] FLAG: --permit-address-sharing="false"
I0602 00:01:37.501696       1 flags.go:64] FLAG: --permit-port-sharing="false"
I0602 00:01:37.501697       1 flags.go:64] FLAG: --pod-max-in-unschedulable-pods-duration="5m0s"
I0602 00:01:37.501698       1 flags.go:64] FLAG: --profiling="true"
I0602 00:01:37.501700       1 flags.go:64] FLAG: --requestheader-allowed-names="[]"
I0602 00:01:37.501702       1 flags.go:64] FLAG: --requestheader-client-ca-file=""
I0602 00:01:37.501703       1 flags.go:64] FLAG: --requestheader-extra-headers-prefix="[x-remote-extra-]"
I0602 00:01:37.501705       1 flags.go:64] FLAG: --requestheader-group-headers="[x-remote-group]"
I0602 00:01:37.501707       1 flags.go:64] FLAG: --requestheader-username-headers="[x-remote-user]"
I0602 00:01:37.501709       1 flags.go:64] FLAG: --secure-port="10259"
I0602 00:01:37.501710       1 flags.go:64] FLAG: --show-hidden-metrics-for-version=""
I0602 00:01:37.501711       1 flags.go:64] FLAG: --skip-headers="false"
I0602 00:01:37.501713       1 flags.go:64] FLAG: --skip-log-headers="false"
I0602 00:01:37.501714       1 flags.go:64] FLAG: --stderrthreshold="2"
I0602 00:01:37.501715       1 flags.go:64] FLAG: --tls-cert-file=""
I0602 00:01:37.501716       1 flags.go:64] FLAG: --tls-cipher-suites="[]"
I0602 00:01:37.501718       1 flags.go:64] FLAG: --tls-min-version=""
I0602 00:01:37.501719       1 flags.go:64] FLAG: --tls-private-key-file=""
I0602 00:01:37.501721       1 flags.go:64] FLAG: --tls-sni-cert-key="[]"
I0602 00:01:37.501724       1 flags.go:64] FLAG: --v="4"
I0602 00:01:37.501728       1 flags.go:64] FLAG: --version="false"
I0602 00:01:37.501732       1 flags.go:64] FLAG: --vmodule=""
I0602 00:01:37.501734       1 flags.go:64] FLAG: --write-config-to=""
I0602 00:01:38.194796       1 serving.go:348] Generated self-signed cert in-memory
W0602 00:01:39.681428       1 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
E0602 00:01:39.692263       1 run.go:74] "command failed" err="couldn't create resource lock: endpoints lock is removed, migrate to endpointsleases"

I believe this is due to an incompatibility, as the default kube-scheduler version in the 2.0.0 Helm chart is 1.23.10. Can you confirm?
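For anyone trying the same workaround, a sketch of overriding the tag at install time via the chart's `scheduling.userScheduler.image.tag` value (the release name and `config.yaml` are assumptions). Note that overriding the tag alone may not be sufficient: judging by the "endpoints lock is removed, migrate to endpointsleases" error above, the scheduler configuration shipped with the 2.0.0 chart may also need updating for newer kube-scheduler images:

```shell
# Override the user-scheduler's kube-scheduler image tag at install time
helm upgrade --install jupyterhub jupyterhub/jupyterhub \
  --version 2.0.0 \
  --set scheduling.userScheduler.image.tag=v1.26.5 \
  --values config.yaml
```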

You’ve done more investigation than I have into this issue, and what you say sounds plausible!

There’s no firm release date for 3.0.0, but if you’ve tested a recent dev release and confirmed everything is working I’d pin that version and run with that. You can also try and modify 2.0.0 to work with k8s 1.27, but since that’s not something we’ve tested you’ll have to test it all yourself too.

As with all production systems ensure you’ve got ongoing backups of any user data regardless of which option you choose.

@manics

Appreciate your response here! If you could provide a rough estimate (I know it is difficult to give exact timelines) for the next major release, that would be really helpful! :sweat_smile:

You can track release progress in Planning release for 3.0.0 · Issue #3091 · jupyterhub/zero-to-jupyterhub-k8s · GitHub. We can make a pre-release, but there is a breaking change in oauthenticator I’d like to bundle with it as well before the final release.

I’d really like to get it released before mid July.


(Hopefully) final question:

@manics After successful user authentication on the hub, I’m getting the following error:

Here are the logs from the hub pod:

Loading /usr/local/etc/jupyterhub/secret/values.yaml
No config at /usr/local/etc/jupyterhub/existing-secret/values.yaml
Loading extra config: hub_config
[I 2023-06-08 22:47:26.146 JupyterHub app:2854] Running JupyterHub version 4.0.0
[I 2023-06-08 22:47:26.147 JupyterHub app:2884] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-4.0.0
[I 2023-06-08 22:47:26.253 alembic.runtime.migration migration:207] Context impl SQLiteImpl.
[I 2023-06-08 22:47:26.253 alembic.runtime.migration migration:210] Will assume non-transactional DDL.
[I 2023-06-08 22:47:26.265 alembic.runtime.migration migration:617] Running stamp_revision  -> 0eee8c825d24
[I 2023-06-08 22:47:26.334 alembic.runtime.migration migration:207] Context impl SQLiteImpl.
[I 2023-06-08 22:47:26.334 alembic.runtime.migration migration:210] Will assume non-transactional DDL.
[I 2023-06-08 22:47:26.667 JupyterHub roles:172] Role jupyterhub-idle-culler added to database
[I 2023-06-08 22:47:26.677 JupyterHub roles:238] Adding role admin for User: zeus
[I 2023-06-08 22:47:26.685 JupyterHub roles:238] Adding role user for User: zeus
[I 2023-06-08 22:47:26.693 JupyterHub app:1983] Not using allowed_users. Any authenticated user will be allowed.
[I 2023-06-08 22:47:26.856 JupyterHub app:2923] Initialized 0 spawners in 0.001 seconds
[I 2023-06-08 22:47:26.858 JupyterHub metrics:278] Found 0 active users in the last ActiveUserPeriods.twenty_four_hours
[I 2023-06-08 22:47:26.859 JupyterHub metrics:278] Found 0 active users in the last ActiveUserPeriods.seven_days
[I 2023-06-08 22:47:26.859 JupyterHub metrics:278] Found 0 active users in the last ActiveUserPeriods.thirty_days
[I 2023-06-08 22:47:26.859 JupyterHub app:3137] Not starting proxy
[I 2023-06-08 22:47:26.862 JupyterHub app:3173] Hub API listening on http://:8081/hub/
[I 2023-06-08 22:47:26.862 JupyterHub app:3175] Private Hub API connect url http://hub:8081/hub/
[I 2023-06-08 22:47:26.862 JupyterHub app:3184] Starting managed service jupyterhub-idle-culler
[I 2023-06-08 22:47:26.863 JupyterHub service:385] Starting service 'jupyterhub-idle-culler': ['python3', '-m', 'jupyterhub_idle_culler', '--url=http://localhost:8081/hub/api', '--timeout=900', '--cull-every=300', '--concurrency=10']
[I 2023-06-08 22:47:26.863 JupyterHub service:133] Spawning python3 -m jupyterhub_idle_culler --url=http://localhost:8081/hub/api --timeout=900 --cull-every=300 --concurrency=10
[I 2023-06-08 22:47:26.865 JupyterHub proxy:477] Adding route for Hub: / => http://hub:8081
[I 2023-06-08 22:47:26.931 JupyterHub app:3242] JupyterHub is now running, internal Hub API at http://hub:8081/hub/
[I 2023-06-08 22:47:27.755 JupyterHub log:191] 200 GET /hub/api/ (jupyterhub-idle-culler@127.0.0.1) 8.72ms
[I 2023-06-08 22:47:27.834 JupyterHub log:191] 200 GET /hub/api/users?state=[secret] (jupyterhub-idle-culler@127.0.0.1) 78.03ms
[I 2023-06-08 22:47:31.688 JupyterHub log:191] 302 GET / -> /hub/ (@10.244.0.1) 0.63ms
[I 2023-06-08 22:47:31.708 JupyterHub log:191] 302 GET /hub/ -> /hub/login?next=%2Fhub%2F (@10.244.0.1) 0.46ms
[I 2023-06-08 22:47:31.748 JupyterHub log:191] 200 GET /hub/login?next=%2Fhub%2F (@10.244.0.1) 15.93ms
[W 2023-06-08 22:47:33.379 JupyterHub web:1869] 403 POST /hub/login?next=%2Fhub%2F (10.244.0.1): '_xsrf' argument missing from POST
[W 2023-06-08 22:47:33.391 JupyterHub log:191] 403 POST /hub/login?next=%2Fhub%2F (@10.244.0.1) 11.87ms

The browser request itself also includes the _xsrf cookie.

Any ideas why this is happening?

Which version of the Z2JH chart are you using? Can you show us your full configuration, including any customisations you’ve made to the chart? How was your K8s cluster set up, and do you have any proxies/firewalls?
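If it helps, the chart version and user-supplied values for a deployed release can usually be retrieved with Helm; a sketch assuming the release and namespace are both named `jupyterhub` (remember to redact secrets before posting):

```shell
# Show chart name and version of the deployed release
helm list --namespace jupyterhub

# Show the user-supplied values for the release
helm get values jupyterhub --namespace jupyterhub
```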

In JupyterHub 4, the _xsrf cookie must have been set in the browser during a visit to a JupyterHub page, and the browser should then pass it back when making a POST request.

What software makes the POST request? That may need to be made compatible with JupyterHub 4.

@consideRatio that’s what surprises me: the browser request does include the cookie, but JupyterHub somehow isn’t reading it. The cookie was added to the browser when accessing the JupyterHub URL.

I think the software making the POST request may need to include the token in the request body as well, not only as a cookie.

What do you mean by software? If you mean the browser, I’m using Firefox. I also tried a private window, as well as Chrome. It failed in all of them.

What authenticator is being used?

There should be log statements on startup like:

[I 2023-06-13 14:04:14.205 JupyterHub app:2889] Using Authenticator: jupyterhub.auth.PAMAuthenticator-4.0.0
[I 2023-06-13 14:04:14.205 JupyterHub app:2889] Using Spawner: jupyterhub.spawner.LocalProcessSpawner-4.0.0

but I only see it for the Proxy.

When submitting forms, the xsrf token must be sent twice (that’s how xsrf tokens work), both in the cookie and in the form. This is how it is used to verify that the form was sent from a page with access to the cookie.

If the Authenticator is using a custom login form (or the default login form is customized by other means), it must include the xsrf token as a hidden input, as seen here.
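As a sketch, a custom login form would need something like the hidden field below, following the default `login.html` template. The exact template variables (e.g. `xsrf`, `authenticator_login_url`) depend on the JupyterHub version, so check the template shipped with your hub:

```html
<form action="{{ authenticator_login_url }}" method="post" role="form">
  <!-- The xsrf token must be posted back alongside the _xsrf cookie -->
  <input type="hidden" name="_xsrf" value="{{ xsrf }}" />
  <input type="text" name="username" placeholder="Username" />
  <input type="password" name="password" placeholder="Password" />
  <button type="submit">Sign in</button>
</form>
```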


@minrk Thanks for pointing that out! We do have a custom login form, and reverting to the default form fixed the issue! Really appreciate everybody’s help!

We are using a custom authenticator; I’m not sure how much of it can be publicly disclosed, so I removed that part from the logs.