Postgres on GCE with workload identity

New user, first post, new to JupyterHub.

I am trying to use an instance of postgres on GCE for JupyterHub. At one point I had it working and the db schema was created, but now it doesn’t connect, and I can’t figure out what I did before that made it work.

I have this in the config.yaml for helm:

hub:
  cookieSecret: 6e91bf348949d39d96cd7bb963e7b9f040be28314146516c60566705d773f563
  db:
    # url: postgresql://postgres:<redacted>@34.66.44.11:5432/jupyterhubdev
    # url: postgresql+psycopg2://postgres@127.0.0.1:5432/jupyterhubdev
    # url: 127.0.0.1
    # url: postgresql+psycopg2://<db-username>:<db-password>@<db-hostname>:<db-port>/<db-name>
    upgrade: true
    type: postgres
    pvc:
      accessModes:
        - ReadWriteMany
      storage: 11Gi

and a workload identity yaml of this form:

# 
spec:
      serviceAccountName: jupyter-dev-ksa
      nodeSelector:
        iam.gke.io/gke-metadata-server-enabled: "true"
      containers:
        - name: gke-jupyterhub
          # ... other container configuration
          image: gcr.io/jupyterhub-373622/2sigma:2eb9a54
          # This app listens on port 8080 for web traffic by default.
          ports:
            - containerPort: 8080
          env:
            - name: PORT
              value: "8080"
            # This project uses environment variables to determine
            # how you would like to run your application
            # To use the Go connector (recommended) - use INSTANCE_CONNECTION_NAME (proj:region:instance)
            # To use TCP - Setting INSTANCE_HOST will use TCP (e.g., 127.0.0.1)
            # To use Unix, use INSTANCE_UNIX_SOCKET (e.g., /cloudsql/proj:region:instance)
            - name: INSTANCE_HOST
              value: "127.0.0.1"
            # - name: INSTANCE_CONNECTION_NAME
            #   value: jupyterhub-373622:us-central1:jupyter-dev
            - name: DB_PORT
              value: "5432"
            - name: DB_USER
              valueFrom:
                secretKeyRef:
                  name: db-user-pass
                  key: username
            - name: DB_PASS
              valueFrom:
                secretKeyRef:
                  name: db-user-pass
                  key: password
            - name: DB_NAME
              valueFrom:
                secretKeyRef:
                  name: db-user-pass
                  key: database
        # [END cloud_sql_proxy_k8s_secrets]
        # [START cloud_sql_proxy_k8s_container]
        - name: cloud-sql-proxy
          # It is recommended to use the latest version of the Cloud SQL proxy
          # Make sure to update on a regular schedule!
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.0 # make sure to use the latest version
          args:
            # If connecting from a VPC-native GKE cluster, you can use the
            # following flag to have the proxy connect over private IP
            # - ip_address_types=PRIVATE"

            # Enable structured logging with LogEntry format:
            - "--structured-logs"

            # Replace DB_PORT with the port the proxy should listen on
            # Defaults: MySQL: 3306, Postgres: 5432, SQLServer: 1433
            - "--port=5432"
            # jupyterhub-373622:us-central1:jupyter-dev=tcp:0.0.0.0:5432"
            - "-instances=projects/jupyterhub-373622/global/networks/default=tcp:5432"
            # - "--credentials-file=/secrets/key.json"
            # - "-enable_iam_login"
          securityContext:
            runAsNonRoot: true
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet

I’m having trouble finding documentation to fill in a missing piece. I still don’t fully understand how the hub knows how to use the sidecar, or how to verify that my ports and other configs are OK. Removing the db url from the config.yaml causes the hub to fall back to the default SQLite database. Anything I add fails to connect with:

connection to server at “127.0.0.1”, port 5432 failed: Connection refused

I’ve edited your post to format your code block using triple backticks; for future reference, see this guide: Creating and highlighting code blocks - GitHub Docs

I’m not familiar with Postgres and GCE workload identities, but I’m guessing from your post that you run a special proxy container that connects to postgres using the identity, and JupyterHub connects to that container as if it were postgres? Assuming that’s the case, then running the postgres proxy container inside the same pod as the main hub process (i.e. as a sidecar) means the hub should be able to connect to it using 127.0.0.1. If it’s not, then it sounds like a problem with your postgres proxy container.

Are you using Z2JH or your own Helm chart? How exactly are you adding the postgres proxy container to the deployment/chart?
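If it is Z2JH, the chart’s `hub.extraContainers` setting is the usual way to run a sidecar in the hub pod. Purely as a rough, untested sketch (the image version, args, and instance connection name are copied from your snippet above, so treat them as placeholders):

```yaml
hub:
  extraContainers:
    # Cloud SQL Auth proxy sidecar, listening on 127.0.0.1:5432 inside the hub pod
    - name: cloud-sql-proxy
      image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.0
      args:
        - "--structured-logs"
        - "--port=5432"
        - "jupyterhub-373622:us-central1:jupyter-dev"
      securityContext:
        runAsNonRoot: true
```

Because a sidecar shares the pod’s network namespace, the hub would then reach it on 127.0.0.1:5432.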

Let’s see if this formats better

I have a simple chart deployed with:

helm upgrade --cleanup-on-fail --install helm-test jupyterhub/jupyterhub --namespace 2sigma --version=2.0.0 --values config.yaml

which I see with

helm repo list

The image was created with repo2docker and is just the jupyter/base-notebook with a few additional packages installed for our plan moving forward.

Maybe adding the proxy container to the chart is the issue. I haven’t seen anything about that in the docs I’ve tried to follow.

As I mentioned, somehow this did work earlier. I can connect to the postgres db and see the users configured in the config.yaml, and the rest of the schema.

I know at one point I tried setting tcp 0.0.0.0 and/or ports 5432:5432, but I seem to have lost track of exactly where and how, and I don’t know whether that had anything to do with when it worked.

edit 3/14 - Now that I think about it, I did try other network entries and also a connection to the private IP. Maybe something there worked and I didn’t realize it.

Maybe an alternate question, to anybody familiar with deploying on GCP Kubernetes in general: how can I get better visibility into my logs? I feel like I am working in the dark. Kubernetes and GCP are also completely new to me.

This is the entire config.yaml:

singleuser:
  image:
    name: gcr.io/jupyterhub-373622/2sigma
    # tag: python-3.10
    tag: 8ae33d1
  # `cmd: null` allows the custom CMD of the Jupyter docker-stacks to be used
  # which performs further customization on startup.
  cmd: null

hub:
  cookieSecret: <redacted>
  db:
    url: postgresql://postgres:@127.0.0.1:5432/jupyterhubdev
    upgrade: true
    type: postgres
    pvc:
      accessModes:
        - ReadWriteMany
      storage: 12Gi
  config:
    GoogleOAuthenticator:
      client_id: <redacted>
      client_secret: <redacted>
      oauth_callback_url: http://jupyter-dev.2sigmaschool.org/hub/oauth_callback
      hosted_domain:
        - 2sigmaschool.org
      login_service: 2Sigma School
    JupyterHub:
      authenticator_class: google
      admin_access: true
    Authenticator:
      admin_users:
        - craig
        - vishal
        - student.one

This means the hub will try to connect to the postgresql proxy on localhost, so your proxy container must be in the hub pod. There’s no sign of this in your config.yaml.

The Z2JH documentation has a debugging guide with examples for inspecting Z2JH K8s pods.
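In the meantime, plain kubectl gives you most of the visibility you need (the namespace and pod name below are taken from your earlier output; substitute your own):

```bash
# list the pods in your namespace and find the hub pod
kubectl get pods -n 2sigma

# events for the hub pod: scheduling, image pulls, probe failures, restarts
kubectl describe pod hub-5fbc86c44f-4gckq -n 2sigma

# logs for a specific container in the hub pod, e.g. the hub itself,
# or a cloud-sql-proxy sidecar once one is running there
kubectl logs hub-5fbc86c44f-4gckq -n 2sigma -c hub
kubectl logs hub-5fbc86c44f-4gckq -n 2sigma -c cloud-sql-proxy
```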

I added the following to my config.yaml, which seems to do what you describe

extraContainers:
    [
      {
        "name": cloud-sql-proxy,
        "image": "gcr.io/jupyterhub-373622/2sigma:8ae33d1",
      },
      {
        "name": gke-jupyterhub,
        "image": "gcr.io/jupyterhub-373622/2sigma:8ae33d1",
      },
    ]

After doing helm upgrade I see this in the pod events:

  Normal   Scheduled  4m57s                   default-scheduler  Successfully assigned 2sigma/hub-5fbc86c44f-4gckq to gke-jupytercluster-default-pool-a6b95700-5kqv
  Normal   Pulled     4m56s                   kubelet            Container image "gcr.io/jupyterhub-373622/2sigma:8ae33d1" already present on machine
  Normal   Created    4m56s                   kubelet            Created container cloud-sql-proxy
  Normal   Started    4m56s                   kubelet            Started container cloud-sql-proxy
  Normal   Pulled     4m56s                   kubelet            Container image "gcr.io/jupyterhub-373622/2sigma:8ae33d1" already present on machine
  Normal   Created    4m56s                   kubelet            Created container gke-jupyterhub
  Normal   Started    4m56s                   kubelet            Started container gke-jupyterhub
  Normal   Pulled     4m56s                   kubelet            Container image "jupyterhub/k8s-hub:2.0.0" already present on machine
  Normal   Created    4m56s                   kubelet            Created container hub
  Normal   Started    4m56s                   kubelet            Started container hub
  Warning  Unhealthy  4m31s (x16 over 4m55s)  kubelet            Readiness probe failed: Get "http://10.0.0.66:8081/hub/health": dial tcp 10.0.0.66:8081: connect: connection refused

Now trying to connect to postgresql+psycopg2://172.0.0.1:5432 is timing out rather than getting rejected. Changing to any random IP or port also times out, though.

Entering something without the postgresql protocol reverts back to connecting to SQLite:

[D 2023-03-16 02:34:21.522 JupyterHub application:858] Loaded config file: /usr/local/etc/jupyterhub/jupyterhub_config.py
[I 2023-03-16 02:34:21.829 JupyterHub app:2775] Running JupyterHub version 3.0.0
[I 2023-03-16 02:34:21.831 JupyterHub app:2805] Using Authenticator: oauthenticator.google.GoogleOAuthenticator-15.1.0
[I 2023-03-16 02:34:21.832 JupyterHub app:2805] Using Spawner: kubespawner.spawner.KubeSpawner-4.2.0
[I 2023-03-16 02:34:21.832 JupyterHub app:2805] Using Proxy: jupyterhub.proxy.ConfigurableHTTPProxy-3.0.0
[D 2023-03-16 02:34:21.834 JupyterHub app:1783] Connecting to db: postgresql+psycopg2://172.0.0.1:5432

Does any of this sound like I’m heading the right way?

What’s the second container for? It looks like it’s the same image as the first.

If you’re running this as a sidecar in the hub pod, 127.0.0.1 should work.
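One thing that stands out: the log above shows the hub connecting to 172.0.0.1, which is not the loopback address. With the proxy running as a sidecar in the hub pod, I’d expect the db settings to look roughly like this (a sketch only, using the db name from your earlier config and a placeholder password):

```yaml
hub:
  db:
    type: postgres
    url: postgresql+psycopg2://postgres:<db-password>@127.0.0.1:5432/jupyterhubdev
```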

This seems like progress. I put the workload-identity info directly in the config.yaml, and now I get an error that looks like my service account is either not being used or is missing a permission. As far as I can see, the service account I think I’m using seems OK.

The cloud-sql-proxy is at the bottom of the included code snippet. I’m not sure how to dig deeper into the logs to see what auth is being attempted.

{"severity":"INFO","timestamp":"2023-03-21T03:58:03.467Z","message":"Authorizing with Application Default Credentials"}
{"severity":"ERROR","timestamp":"2023-03-21T03:58:03.829Z","message":"The proxy has encountered a terminal error: unable to start: failed to get instance: Refresh error: failed to get instance metadata (connection name = \"jupyterhub-373622:us-central1:jupyter-dev\"): googleapi: Error 403: The client is not authorized to make this request., notAuthorized"}

The extraContainers section of my config.yaml is now:

extraContainers:
    - name: gke-jupyterhub
      image: "docker:1.12.6"
      env:
        # - name: INSTANCE_HOST
        #   value: "127.0.0.1"
        - name: INSTANCE_CONNECTION_NAME
          value: jupyterhub-373622:us-central1:jupyter-dev
        - name: DB_PORT
          value: "5432"
        - name: DB_USER
          valueFrom:
            secretKeyRef:
              name: db-user-pass
              key: username
        - name: DB_PASS
          valueFrom:
            secretKeyRef:
              name: db-user-pass
              key: password
        - name: DB_NAME
          valueFrom:
            secretKeyRef:
              name: db-user-pass
              key: database

    - name: cloud-sql-proxy
      image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.1
      args:
        - "--structured-logs"
        - "--port=5432"
        - "jupyterhub-373622:us-central1:jupyter-dev"
      securityContext:
        runAsNonRoot: true
      resources:
        requests:
          memory: "2Gi"
          cpu: "1"

How have you specified the service account that the container or pod runs as?
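I haven’t used Cloud SQL myself, but Z2JH can point the hub pod at an existing Kubernetes service account, which should carry your workload identity. Something like (untested):

```yaml
hub:
  serviceAccount:
    create: false
    name: jupyter-dev-ksa
```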

I don’t know how to specify the service account now that I’ve put the proxy in extraContainers. Originally, following the document I found at Google, I had the service account in a yaml I deployed as a Deployment with an “app” container and a proxy, like below. The jupyter-dev-ksa was supposed to be an identity that represents the application in the GKE cluster.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupyter-deploy
spec:
  selector:
    matchLabels:
      app: gke-jupyterhub
  template:
    metadata:
      labels:
        app: gke-jupyterhub
    spec:
      serviceAccountName: jupyter-dev-ksa
      nodeSelector:
        iam.gke.io/gke-metadata-server-enabled: "true"
      containers:
        - name: cloud-sql-proxy
          image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.1
          args:
            - "--structured-logs"
            - "--port=5432"
            - "jupyterhub-373622:us-central1:jupyter-dev"
...
...
...

This deployment does seem to connect to postgres:

{"severity":"INFO","timestamp":"2023-03-22T03:18:29.335Z","message":"Authorizing with Application Default Credentials"}
{"severity":"INFO","timestamp":"2023-03-22T03:18:30.266Z","message":"[jupyterhub-373622:us-central1:jupyter-dev] Listening on 127.0.0.1:5432"}
{"severity":"INFO","timestamp":"2023-03-22T03:18:30.267Z","message":"The proxy has started successfully and is ready for new connections!"}

But I don’t know how to have the hub use it.

It seems to me that all I really need is my hub, which was running just fine with SQLite, plus the extraContainers proxy that can use the jupyter-dev-ksa I created by following the Google document to connect to Postgres. Once the proxy connects, the hub just uses it at 172.0.0.1:5432.

I either need the proxy running as a sidecar of the hub, or need the hub to use the separate proxy deployment, and I don’t know how to get both to work at the same time.

Thanks for all the help. I was able to assign a service account, put the cloud-sql-proxy in extraContainers, and get connected:

hub:
  serviceAccount:
    name: jupyter-dev-ksa
    create: false
...
extraContainers:
    - name: cloud-sql-proxy
      image: gcr.io/cloud-sql-connectors/cloud-sql-proxy:2.1.1
      args:
        - "--structured-logs"
        # --address Address to bind Cloud SQL instance listeners. (default "127.0.0.1")
        - "--address=127.0.0.1"
        - "--port=5432"
...
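For completeness, the jupyter-dev-ksa only works because of the Workload Identity binding set up earlier from the Google guide. Roughly (my project and namespace; `<gsa-name>` is a placeholder for the Google service account that has the Cloud SQL Client role):

```bash
# allow the Kubernetes service account to impersonate the Google service account
gcloud iam service-accounts add-iam-policy-binding \
  <gsa-name>@jupyterhub-373622.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:jupyterhub-373622.svc.id.goog[2sigma/jupyter-dev-ksa]"

# annotate the KSA with the Google service account it maps to
kubectl annotate serviceaccount jupyter-dev-ksa -n 2sigma \
  iam.gke.io/gcp-service-account=<gsa-name>@jupyterhub-373622.iam.gserviceaccount.com
```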