JupyterHub starts throwing HTTP 403 errors on all REST API calls after a couple of days

Created a service and an API token (`openssl rand -hex 32`) for an application that makes REST API calls to JupyterHub. Successfully deployed JupyterHub version 1.2.2 a week ago. There have been no subsequent redeployments of JupyterHub.
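For anyone reproducing this setup without `openssl`, the same kind of token can be generated from Python's standard library; this is just an equivalent sketch, not what we actually ran:

```python
# Equivalent of `openssl rand -hex 32`: 32 random bytes rendered as
# 64 lowercase hex characters, suitable for use as an API token.
import secrets

token = secrets.token_hex(32)
print(len(token))  # 64
```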

The application was able to call the JupyterHub REST API successfully for about a week. Then, all of a sudden, it started receiving HTTP 403 errors on every REST API call to JupyterHub.

All of the smoke tests I created in Postman (e.g., `GET http://hub/api/proxy`) using the same API token also started to fail. The same smoke tests with the same API token succeeded last week; now they all fail with a 403 error.

I am totally perplexed. I looked at the JupyterHub and Tornado source code and tried to figure out where the 403 error is being raised. None of the JupyterHub handlers or API handlers seem to be the culprit, and the same goes for the base Tornado RequestHandler. At least, that is my observation; I could be wrong.

Do API tokens used for services have an expiration timestamp? Is it possible that the API token has expired?
The code snippet below shows how we set up the API token for the service.

```python
c.JupyterHub.services = [
    {
        "name": "service-token",
        "admin": True,
        "api_token": os.getenv("JUPYTER_API_TOKEN"),
    },
]
```
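For reference, callers authenticate against the Hub's REST API by sending this token in the `Authorization` header with the `token` scheme. A minimal sketch (the fallback value here is a placeholder, not a real token):

```python
# Build the Authorization header a service caller sends to the Hub's
# REST API. JupyterHub expects the "token" scheme, not "Bearer".
import os

token = os.getenv("JUPYTER_API_TOKEN", "0123abcd" * 8)  # placeholder fallback
headers = {"Authorization": f"token {token}"}
print(headers["Authorization"].startswith("token "))  # True
```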

Using JupyterHub 1.2.2.

Please advise.

Hi! Do you have the JupyterHub logs for the period between the last successful use of the token and the first failure? Could you also show us your full JupyterHub config with secrets redacted, just in case it’s relevant? Thanks!

How do I upload a log file to this site?

Using the out-of-the-box jupyterhub config that ships with version 0.10.6 of Z2JH. No modifications were made to that file, as per the recommendations in the documentation.

All modifications to the Hub’s configuration are made through the values.yaml file.

Here are the hub configuration settings from our values.yaml file.

```yaml
# hub relates to the hub pod, responsible for running JupyterHub, its configured
# Authenticator class KubeSpawner, and its configured Proxy class
# ConfigurableHTTPProxy. KubeSpawner creates the user pods, and
# ConfigurableHTTPProxy speaks with the actual ConfigurableHTTPProxy server in
# the proxy pod.
hub:
  service:
    type: ClusterIP
    annotations: {}
    ports:
      nodePort:
    loadBalancerIP:
  baseUrl: /
  cookieSecret:
  publicURL:
  initContainers:
    - name: fix-permissions
      image: /edai/data-science/jupyter/jupyterhub:v1.2.2-develop-20210601T211138
      command: ["/bin/bash", "-c"]
      args:
        - set -x;
          ls -lart /home/jovyan/database-secrets;
          cat /home/jovyan/postgres-cert/server-ca.pem;
          cp -RvL /home/jovyan/database-secrets/*.pem /home/jovyan/postgres-cert;
          chown -R 1000:1000 /home/jovyan/postgres-cert;
          find /home/jovyan/postgres-cert -type d -exec chmod 700 {} \; ;
          find /home/jovyan/postgres-cert -type f -exec chmod 400 {} \; ;
          ls -lart /home/jovyan/postgres-cert && id;
          cat /home/jovyan/postgres-cert/server-ca.pem;
      volumeMounts:
        - name: postgres-cert
          mountPath: /home/jovyan/postgres-cert
        - name: database-secrets
          mountPath: /home/jovyan/database-secrets
      securityContext:
        runAsUser: 0
        fsGid: 1000
  nodeSelector: {}
  tolerations: []
  concurrentSpawnLimit: 64
  consecutiveFailureLimit: 5
  activeServerLimit:
  deploymentStrategy:
    ## type: Recreate
    ## - sqlite-pvc backed hubs require the Recreate deployment strategy as a
    ##   typical PVC storage can only be bound to one pod at the time.
    ## - JupyterHub isn't designed to support being run in parallel. More work
    ##   needs to be done in JupyterHub itself for a fully highly available (HA)
    ##   deployment of JupyterHub on k8s to be possible.
    type: Recreate
  db:
    type: postgres
    upgrade:
    pvc:
      annotations: {}
      selector: {}
      accessModes:
        - ReadWriteOnce
      storage: 10Gi
      subPath:
      storageClassName:
    url:
    password:
  labels: {}
  annotations: {sidecar.istio.io/inject: "false"}
  command: []
  args: []
  extraConfig:
    JupyterhubConfig.py: |
      import os
      from binascii import a2b_hex
      os.system("cat /home/jovyan/postgres-cert/server-ca.pem")
      c.JupyterHub.db_url = os.environ["PG_DB_URL"]
      cookie_secret_hex = os.environ["JPY_COOKIE_SECRET"]
      if cookie_secret_hex:
          c.JupyterHub.cookie_secret = a2b_hex(cookie_secret_hex)

      def printKwargs(**kwargs):
          for key, value in kwargs.items():
              print("%s == %s" % (key, value))

      db_certs_dir = os.getenv('DB_CERTS_DIR')
      server_ca_pem = db_certs_dir + "/" + "server-ca.pem"
      os.system("cat /home/jovyan/postgres-cert/server-ca.pem")
      client_cert_pem = db_certs_dir + "/" + "client-cert.pem"
      client_key_pem = db_certs_dir + "/" + "client-key.pem"
      sslmode = os.getenv('SSLMODE')
      ssl_args = {'sslmode': sslmode, 'sslrootcert': server_ca_pem, 'sslcert': client_cert_pem, 'sslkey': client_key_pem}

      c.JupyterHub.db_kwargs = {'connect_args': ssl_args}

      c.Spawner.debug = True
      c.ConfigurableHTTPProxy.debug = True

      c.JupyterHub.services = [
          {
              "name": "service-token",
              "admin": True,
              "api_token": os.getenv("JUPYTER_API_TOKEN"),
          },
      ]

      c.JupyterHub.trust_user_provided_tokens = True
      c.JupyterHub.admin_access = True

    JupyterhubAuthConfig.py: |
      import os
      from sso_authenticator import PingAuthenticator
      c.JupyterHub.authenticator_class = PingAuthenticator
      c.PingAuthenticator.enable_auth_state = True
      c.PingAuthenticator.auto_login = True
      c.CryptKeeper.keys = [os.urandom(32)]

    JupyterhubSpawnerConfig.py: |
      from custom_spawner import CustomKubeSpawner
      c.JupyterHub.spawner_class = CustomKubeSpawner
      c.KubeSpawner.http_timeout = 60
      c.CustomKubeSpawner.environment = {
          'METASERVICES_URL': '/compute-metaservice',
          'ADMIN_ROLE':
      }

      def authstate_hook(spawner, auth_state):
          if auth_state:
              spawner.auth_state = auth_state
          else:
              spawner.auth_state = {}
      c.CustomKubeSpawner.auth_state_hook = authstate_hook

  extraConfigMap: {}
  extraEnv:
    OAUTH_CLIENT_ID: <REDACTED>
    OAUTH_CALLBACK_URL: <URL REDACTED>/hub/oauth_callback
    OAUTH2_AUTHORIZE_URL: <URL REDACTED>/as/authorization.oauth2
    OAUTH2_TOKEN_URL: <URL REDACTED>/as/token.oauth2
    OAUTH2_USERDATA_URL: <URL REDACTED>/idp/userinfo.openid
    OAUTH2_LOGIN_SERVICE: <REDACTED>
    OAUTH2_USERNAME_KEY: sub
    OAUTH2_AUTHN_SOURCE: <REDACTED>
    ADMIN_ROLE: <REDACTED>
    SSLMODE: 'verify-ca'
    DB_CERTS_DIR: '/home/jovyan/postgres-cert'
    PGPASSWORD:
      name: PGPASSWORD
      valueFrom:
        secretKeyRef:
          key: hub.db.password
          name: database-secrets
    JPY_COOKIE_SECRET:
      name: JPY_COOKIE_SECRET
      valueFrom:
        secretKeyRef:
          key: hub.cookieSecret
          name: database-secrets
    PG_DB_URL:
      name: PG_DB_URL
      valueFrom:
        secretKeyRef:
          key: hub.db.url
          name: database-secrets
    OAUTH_CLIENT_SECRET:
      name: OAUTH_CLIENT_SECRET
      valueFrom:
        secretKeyRef:
          key: OAUTH_CLIENT_SECRET
          name: ping-federate-secret
    JUPYTER_API_TOKEN:
      name: JUPYTER_API_TOKEN
      valueFrom:
        secretKeyRef:
          key: JUPYTER-API-TOKEN
          name: jupyterhub-api-secrets
  extraContainers: []
  extraVolumes:
    - name: database-secrets
      secret:
        secretName: database-secrets
        defaultMode: 0400
    - name: postgres-cert
      emptyDir: {}
  extraVolumeMounts:
    - name: database-secrets
      mountPath: /home/jovyan/database-secrets
      readOnly: true
    - name: postgres-cert
      mountPath: /home/jovyan/postgres-cert
      readOnly: true
  image:
    name: URL REDACTED/edai/data-science/jupyter/jupyterhub
    tag: v1.2.2-develop-20210601T211138
    pullPolicy: IfNotPresent
    pullSecrets: []
  resources:
    requests:
      cpu: 200m
      memory: 512Mi
  containerSecurityContext:
    runAsUser: 1000
    runAsGroup: 1000
    allowPrivilegeEscalation: false
  services: {}
  pdb:
    enabled: true
    minAvailable: 1
  networkPolicy:
    enabled: true
    ingress: []
    ## egress for JupyterHub already includes Kubernetes internal DNS and
    ## access to the proxy, but can be restricted further, but ensure to allow
    ## access to the Kubernetes API server that couldn't be pinned ahead of
    ## time.
    ##
    ## ref: kubernetes - Whitelist kube-apiserver with Network Policy - Stack Overflow
    egress:
      - to:
          - ipBlock:
              cidr: 0.0.0.0/0
    interNamespaceAccessLabels: ignore
    allowedIngressPorts: []
  allowNamedServers: true
  namedServerLimitPerUser:
  authenticatePrometheus:
  redirectToServer:
  shutdownOnLogout:
  templatePaths: []
  templateVars: {}
  livenessProbe:
    enabled: true
    initialDelaySeconds: 180
    periodSeconds: 10
    failureThreshold: 3
    timeoutSeconds: 1
  readinessProbe:
    enabled: true
    initialDelaySeconds: 180
    periodSeconds: 10
    failureThreshold: 3
    timeoutSeconds: 1
  existingSecret:

rbac:
  enabled: true
```
Lastly, the only way I know of to resolve this problem is to redeploy from scratch, which includes blowing away the JupyterHub database.

Here is some additional info about the Jupyterhub config:

```json
{
  "version": "1.2.2",
  "python": "3.8.5 (default, Jan 27 2021, 15:41:15) \n[GCC 9.3.0]",
  "sys_executable": "/usr/bin/python3",
  "spawner": {
    "class": "custom_spawner.CustomKubeSpawner",
    "version": "unknown"
  },
  "authenticator": {
    "class": "sso_authenticator.PingAuthenticator",
    "version": "unknown"
  }
}
```

Finally, please tell me how I can get the hub's log file to you. There does not appear to be a way for me to upload a file to this site.

If you’ve got a GitHub account you can upload the log as a gist: https://gist.github.com/
Otherwise a pastebin type site would also do.

Would you mind reformatting your configuration as a code block using triple backticks: Creating and highlighting code blocks - GitHub Docs
Unfortunately the indentation has been lost which makes it difficult to read.

Finally, have you tried the latest Z2JH version? The one you're using is quite old.

Thanks!

@manics
Figured out what the problem is but don’t know how to solve it.

The JupyterHub culling service is deleting the service (the record in the services table) and the corresponding API token (the record in the api_tokens table) that we created for applications that invoke the JupyterHub REST API. All subsequent REST API calls using that API token fail with a 403 error because the token record was deleted from the database.

I set `cull.enabled` to `false` in our values.yaml file, but the culling service still appears to be culling. Here are the relevant settings.

```yaml
cull:
  enabled: false
  users: false
  removeNamedServers: false
  timeout: 43200
  every: 600
  concurrency: 10
  maxAge: 0
```

Again, we are using Z2JH 0.10.6 and JupyterHub 1.2.2.

Please advise.

Below is the SQL statement that results in a 403 error when I try to delete a user via the JupyterHub REST API. No records meet the query's criteria, because the token record was deleted by the culling service.

```
INFO:sqlalchemy.engine.base.Engine:SELECT api_tokens.user_id AS api_tokens_user_id, api_tokens.service_id AS api_tokens_service_id, api_tokens.id AS api_tokens_id, api_tokens.hashed AS api_tokens_hashed, api_tokens.prefix AS api_tokens_prefix, api_tokens.created AS api_tokens_created, api_tokens.expires_at AS api_tokens_expires_at, api_tokens.last_activity AS api_tokens_last_activity, api_tokens.note AS api_tokens_note
FROM api_tokens
WHERE (%(prefix)s LIKE api_tokens.prefix || '%%') AND (api_tokens.expires_at IS NULL OR api_tokens.expires_at >= %(expires_at_1)s)
INFO:sqlalchemy.engine.base.Engine:{'prefix': '33f7', 'expires_at_1': datetime.datetime(2021, 8, 10, 0, 41, 39, 591585)}
[W 2021-08-09 20:41:39.597 JupyterHub log:181] 403 DELETE /hub/api/users/AuthZUser23 (@::ffff:127.0.0.1) 7.00ms
```
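The query above reflects how JupyterHub resolves tokens: only a hash and a short prefix are stored, so an incoming token is looked up by prefix and then verified by hash. A simplified sketch of that flow (illustrative only; the real schema and hash format live in JupyterHub's ORM, and the dictionary here is a hypothetical stand-in for the api_tokens table):

```python
# Simplified model of JupyterHub's token lookup: store hash + prefix,
# resolve by prefix, verify by comparing hashes. Deleting the row is
# exactly why every call with the old token now gets a 403.
import hashlib
import secrets

db = {}  # prefix -> hashed token, standing in for the api_tokens table

def store(token):
    db[token[:4]] = hashlib.sha512(token.encode()).hexdigest()

def resolve(token):
    hashed = db.get(token[:4])
    return hashed is not None and hashed == hashlib.sha512(token.encode()).hexdigest()

token = secrets.token_hex(32)
store(token)
print(resolve(token))  # True
del db[token[:4]]      # what the culling did to the service's token record
print(resolve(token))  # False -> the Hub answers 403
```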

@manics

Including images of the services table and the api_tokens table before and after the culling service runs. As you can hopefully see, the service record and API token associated with callers of the JupyterHub REST API are zapped from the database by the culling service. Not sure how to move forward, as I am already disabling culling in values.yaml by setting `cull.enabled: false`. Please advise.

Before Culling


After Culling


@manics Submitted an issue (#2351) to jupyterhub/zero-to-jupyterhub-k8s. That seems like the appropriate thing to do, because clearly I have encountered a bug in Z2JH.

Is this reproducible with the latest Z2JH version? 0.10.6 is very old, and there have been several updates to JupyterHub and the idle-culler since then.

Figured out the problem.

We store the API token that calling applications use to invoke JupyterHub's REST API as a K8s Secret. The Jenkins CI/CD pipeline that we use to deploy secrets to K8s mangled the token's value as it was stored in the K8s Secret object. As a result, the API token value stored in JupyterHub's database became mangled as well.

We fixed the Jenkins CI/CD pipeline so that it no longer mangles secret values.
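For anyone hitting something similar: Kubernetes Secret values are base64-encoded, and a stray trailing newline introduced by a pipeline is one classic way for the stored value to drift from the intended token. A sketch of that failure mode (the token here is a hypothetical placeholder, and our mangling may have differed in detail):

```python
# A trailing newline picked up before base64-encoding changes the secret
# the Hub ultimately compares against, so every API call fails auth.
import base64

token = "33f7" + "0" * 60  # hypothetical 64-char hex token

clean = base64.b64encode(token.encode()).decode()
mangled = base64.b64encode((token + "\n").encode()).decode()

print(clean == mangled)                             # False
print(base64.b64decode(mangled).decode() == token)  # False: decodes to token + "\n"
```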