403 Forbidden: XSRF cookie does not match POST argument after updating to the latest Helm chart version

I’ve recently inherited our Jupyter infrastructure, which runs on EKS and is managed using Terraform. My background isn’t in Kubernetes or Jupyter, so I’m doing my best to get up to speed. During the last task I worked on with the previous point of contact, we upgraded our Dev JupyterHub environment from version 3.0.2 to 3.3.8 (the latest Helm chart), and since then, we’ve been encountering intermittent login issues. At this point, I’m unsure of the next steps to resolve the problem and would appreciate any guidance on how to move forward.

The error we’re receiving is: 403 Forbidden: XSRF cookie does not match POST argument.

What Happens:

  • This issue occurs sporadically.
  • The error typically surfaces after logging out and attempting to log back in.
  • Clearing the browser’s cookies, cache, and history, sometimes multiple times, resolves the issue.
  • Reproducing the problem consistently is difficult, but the following steps can sometimes lead to it:
    1. Log in and work in JupyterHub as usual.
    2. Log out.
    3. Keep the browser tab and Chrome window open, and after a random period of time, attempt to log back in.
    4. At this point, the XSRF issue/403 error appears.
    5. In my case, I closed the browser completely, cleared cache/history, and retried logging in multiple times before succeeding.

Expected Behavior:

  • Users should be able to log in every time without encountering the 403 Forbidden error.

Actual Behavior:

  • After clicking the “Sign in” button, the login attempt fails, and the error message is returned immediately.

Setup Details:

  • Helm Chart Version: Upgraded to 3.3.8 (from 3.0.2).
  • Kubernetes Environment: Running on AWS EKS version 1.29.
  • Infrastructure as Code: Managed using Terraform.
  • Namespace Pods: The JupyterHub namespace includes the following pods running at all times:
    • 1/1 hub
    • 1/1 proxy
    • 1/1 user-scheduler-ABC
    • 1/1 user-scheduler-XYZ
  • Load Balancer: The proxy-public Kubernetes service is exposed as an AWS Classic Load Balancer with TCP listeners (so there is no option for cookie stickiness), cross-zone load balancing enabled, and desync mitigation mode set to defensive.

Additional thoughts:

I’ve seen that others have resolved similar issues by enabling sticky sessions on the load balancer. However, since it looks like we’re only running a single proxy pod, wouldn’t it be unlikely that the classic load balancer is causing the XSRF token issue? Sticky sessions and balancing between multiple pods shouldn’t be relevant in this case, correct?

Please let me know what additional information I can provide that may help in identifying the issue.

Logs (Hub):

[D 2024-10-07 16:29:07.140 JupyterHub reflector:374] pods watcher timeout
[D 2024-10-07 16:29:07.140 JupyterHub reflector:289] Connecting pods watcher
[D 2024-10-07 16:29:07.150 JupyterHub reflector:374] events watcher timeout
[D 2024-10-07 16:29:07.150 JupyterHub reflector:289] Connecting events watcher
[D 2024-10-07 16:29:07.626 JupyterHub log:192] 200 GET /hub/health (@10.X.X.X) 1.29ms
[D 2024-10-07 16:29:08.706 JupyterHub _xsrf_utils:155] xsrf id mismatch b'None:lk2A3l79ZpmOG9dBDC645F=' != b'None:KP6dA8xgrE5LIn2y4Mtjah2NEt3UQVt1wvzJB6m3Y48='
[I 2024-10-07 16:29:08.706 JupyterHub _xsrf_utils:125] Setting new xsrf cookie for b'None:KP6dA8xgrE5LIn2y4Mtja=' {'path': '/hub/', 'max_age': 3600}
[W 2024-10-07 16:29:08.706 JupyterHub web:1873] 403 POST /hub/login?next= (::ffff:10.X.X.X): XSRF cookie does not match POST argument
[D 2024-10-07 16:29:08.707 JupyterHub base:1471] No template for 403
[W 2024-10-07 16:29:08.709 JupyterHub log:192] 403 POST /hub/login?next= (@::ffff:10.X.X.X) 4.36ms

Logs (Proxy):

16:29:08.697 [ConfigProxy] debug: PROXY WEB /hub/login to http://hub:8081
16:29:08.711 [ConfigProxy] debug: Not recording activity for status 403 on /

Helm values (Terraform excerpt):

  values = [jsonencode({
    debug = {
      enabled = true
    }
    proxy = {
      https = {
        enabled = true
        type    = "secret"
        secret = {
          name = local.cert_secret
        }
      }
      service = {
        annotations = {
          "service.beta.kubernetes.io/aws-load-balancer-internal"                = "true"
          "service.beta.kubernetes.io/aws-load-balancer-backend-protocol"        = "tcp"
          "service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout" = "3600"
          "service.beta.kubernetes.io/aws-load-balancer-access-log-enabled"                = "true"
          "service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name"         = var.lb_access_logs_bucket
          "service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-prefix"       = var.lb_access_logs_bucket_prefix
          "service.beta.kubernetes.io/aws-load-balancer-connection-draining-enabled"       = "true"
          "service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled" = "true"
          "service.beta.kubernetes.io/aws-load-balancer-additional-resource-tags"          = local.lb_tags_string
        }
      }
    }
    hub = {
      extraConfig = {
        "myconfig.py" = <<-EOF
          c.JupyterHub.statsd_host = "${var.statsd_host}"
          c.JupyterHub.statsd_port = ${var.statsd_port}
          c.JupyterHub.statsd_prefix = "${local.statsd_prefix}"

          c.JupyterHub.authenticator_class = 'ldapauthenticator.LDAPAuthenticator'
          c.LDAPAuthenticator.lookup_dn = False
          c.LDAPAuthenticator.enable_auth_state = True
          c.LDAPAuthenticator.escape_userdn = False
          c.LDAPAuthenticator.server_address = "${var.ldap_server_host}"
          c.LDAPAuthenticator.server_port = ${var.ldap_server_port}
          c.LDAPAuthenticator.use_ssl = True
          c.LDAPAuthenticator.bind_dn_template = ${jsonencode(var.ldap_bind_dn_template)}
          c.LDAPAuthenticator.allowed_groups = ${jsonencode(var.ldap_allowed_groups)}
          c.LDAPAuthenticator.user_info_attributes = ${jsonencode(var.ldap_user_info_attrs)}
          c.LDAPAuthenticator.auth_state_attributes = ${jsonencode(var.ldap_auth_state_attrs)}
          c.JupyterHub.spawner_class = 'kubespawner.KubeSpawner'
          c.JupyterHub.log_level = 'INFO'
          c.JupyterHub.shutdown_on_logout = True
          c.KubeSpawner.http_timeout = 600
          c.KubeSpawner.start_timeout = 600
          c.KubeSpawner.debug = True
          c.Spawner.debug = True
          c.KubeSpawner.cmd = ['/usr/local/bin/start-notebook.sh']
        EOF
      }
    }
  })]

Do you have any caching servers or proxies on your network that might be incorrectly caching something?

What authenticator are you using? Do you still have this problem with the Dummy authenticator?
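
For reference, a quick way to test that is to point the Dev release at JupyterHub’s built-in Dummy authenticator, which accepts any username with a single shared password and takes LDAP/AD out of the picture while you try to reproduce the 403. The sketch below is a hypothetical Dev-only values override modeled on the Terraform excerpt above; the hub.extraConfig placement, the dummyauth.py file name, and the throwaway password are illustrative assumptions, not taken from your setup:

  # Hypothetical Dev-only override for testing; do not leave this enabled.
  values = [jsonencode({
    hub = {
      extraConfig = {
        "dummyauth.py" = <<-EOF
          # Built-in DummyAuthenticator: any username, one shared password.
          c.JupyterHub.authenticator_class = 'dummy'
          c.DummyAuthenticator.password = 'throwaway-dev-password'
        EOF
      }
    }
  })]

If the 403 still shows up intermittently with the Dummy authenticator, that would point away from the LDAP/AD path and toward the proxy/load balancer or the XSRF cookie handling itself.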

Thanks for the reply. Unfortunately, I’m still very new to our setup. We use Active Directory for authentication to JupyterHub, and we run sssd for the connection to AD.

We don’t have this issue in Prod, but Prod is still running version 3.0.2. I’m only aware of the Classic Load Balancers provisioned by EKS and our FortiGate NGFWs.

I wasn’t aware of the Dummy authenticator, but I’m looking it up now.

I’ve been receiving sporadic reports of this same error for a few months now, since upgrading to 3.3.8 with the Google OAuthenticator. We run on Google Cloud’s GKE service.

Since it is intermittent and doesn’t cause too many issues, I haven’t been able to deep dive into the infrastructure to see what is causing it. I will report back if I find anything useful.


Thanks for the reply! Any luck on your end?

I’ve been testing out proxy changes with no luck so far, although I haven’t been able to devote much time to it.