Unable to provision JupyterHub Helm Chart into multiple GKE namespaces with Terraform

rk92 · April 29, 2021, 2:42pm

I have been having issues provisioning the JupyterHub 0.11.1 Helm chart into multiple namespaces at once with Terraform.

Currently I have a Terraform module that loops through a map of namespaces to external IP addresses, so that each namespace gets one IP to use for its external load balancer. If I attempt to provision the chart in 3 namespaces, only 1 succeeds. If I re-run my Terraform pipeline, the other 2 namespaces provision without any issues, so I am not sure why this doesn’t work on the first pass. I am keeping the proxy.secretToken value constant across the deployments.

I am not sure where to start troubleshooting this, and I am also unsure where to find any logs explaining why the chart didn’t provision into the other namespaces.

manics · April 29, 2021, 3:49pm

Since it’s working with one deployment it sounds like your problem is with Terraform rather than the JupyterHub Helm chart, though it’s possible your Z2JH configuration has an effect, especially if you’ve enabled cross-namespace features. Please could you:

Show us your full Z2JH configs with secrets redacted
Provide as much information as you can on how your K8s cluster is set up
Provide a link to your Terraform files. This is a Jupyter forum, but some people may have enough experience with Terraform to spot some issues

rk92 · April 29, 2021, 4:40pm

Z2JH Helm Config.yaml:

proxy:
  secretToken: <TOKEN>
  service:
    loadBalancerIP: <IP>
hub:
  config:
    Authenticator:
      admin_users:
      allowed_users:
    DummyAuthenticator:
    JupyterHub:
      authenticator_class: dummy
singleuser:
  profileList:
    - display_name: "Minimal environment"
      description: "To avoid too much bells and whistles: Python."
      default: true
    - display_name: "Datascience environment"
      description: "If you want the additional bells and whistles: Python, R, and Julia."
      kubespawner_override:
        image: jupyter/datascience-notebook:latest
    - display_name: "Spark environment"
      description: "The Jupyter Stacks spark image!"
      kubespawner_override:
        image: jupyter/all-spark-notebook:latest  
  memory:
    limit: 1G
    guarantee: 1G
  cpu:
    limit: .5
    guarantee: .5
  image:
    # You should replace the "latest" tag with a fixed version from:
    # https://hub.docker.com/r/jupyter/datascience-notebook/tags/
    # Inspect the Dockerfile at:
    # https://github.com/jupyter/docker-stacks/tree/master/datascience-notebook/Dockerfile
    name: jupyter/datascience-notebook
    pullPolicy: Always
    tag: latest
  defaultUrl: "/lab"

scheduling:
#   userScheduler:
#     enabled: true
#   podPriority:
#     enabled: true
#   userPlaceholder:
#     enabled: true
#     replicas: 2
  userPods:
    nodeAffinity:
      matchNodePurpose: require
  corePods:
    nodeAffinity:
      matchNodePurpose: require

cull:
  enabled: true
  timeout: 3600
  every: 3600

K8s Cluster Setup:

cluster_autoscaling = {
  enabled             = true
  autoscaling_profile = "BALANCED"
  min_cpu_cores       = 2
  max_cpu_cores       = 8
  min_memory_gb       = 8
  max_memory_gb       = 32
}

node_pools = [
  {
    "name" : "core-pool",
    "auto_repair" : true,
    "auto_upgrade" : true,
    "autoscaling" : true,
    "disk_size_gb" : "50",
    "disk_type" : "pd-standard",
    "enable_secure_boot" : true,
    "image_type" : "cos_containerd",
    "initial_node_count" : 1,
    "local_ssd_count" : 0,
    "machine_type" : "n2-standard-4",
    "max_count" : 3,
    "min_count" : 1,
    "node_locations" : "us-central1-a",
    "preemptible" : true
  },
  {
    "name" : "user-pool",
    "auto_repair" : true,
    "auto_upgrade" : true,
    "autoscaling" : true,
    "disk_size_gb" : "50",
    "disk_type" : "pd-standard",
    "enable_secure_boot" : true,
    "image_type" : "cos_containerd",
    "initial_node_count" : 1,
    "local_ssd_count" : 0,
    "machine_type" : "n2-standard-4",
    "max_count" : 3,
    "min_count" : 1,
    "node_locations" : "us-central1-a",
    "preemptible" : true
  }
]

node_pools_labels = {
  user-pool = {
    "hub.jupyter.org/node-purpose" = "user"
  }
  core-pool = {
    "hub.jupyter.org/node-purpose" = "core"
  }
}

node_pools_taints = {
  user-pool = [
    {
      key    = "hub.jupyter.org/dedicated"
      value  = "user"
      effect = "NO_SCHEDULE"
    },
  ]
}

Helm Resource Provisioning:

resource "helm_release" "jupyterhub" {
  for_each = var.namespaces

  name       = var.release_name
  repository = var.repository_url
  chart      = var.helm_chart
  version    = var.helm_version
  namespace  = each.key
  timeout    = var.timeout
  values     = var.values

  set {
    name  = "proxy.service.loadBalancerIP"
    value = each.value
  }

  set {
    name  = "proxy.secretToken"
    value = random_id.secret_token.id
  }

  // There is a bug in the helm provider v2.0.3 and this is the work around
  // https://github.com/hashicorp/terraform-provider-helm/issues/701
  // https://github.com/jupyterhub/zero-to-jupyterhub-k8s/issues/1998

  set {
    name  = "custom.whatever"
    value = "doesnotmatter"
  }
}

I am unable to provide links to any GitHub file due to it being a private repository.

If I run kubectl get events on a namespace which failed this is what I see.

85s         Normal    Scheduled                pod/hook-image-awaiter-22v7v           Successfully assigned phys202/hook-image-awaiter-22v7v to gke-tf-jh-cluster-core-pool-0d97fecb-sxd8
85s         Normal    Scheduled                pod/hook-image-puller-sr45s            Successfully assigned phys202/hook-image-puller-sr45s to gke-tf-jh-cluster-user-pool-bd2cec1e-jr3v
85s         Normal    SuccessfulCreate         daemonset/hook-image-puller            Created pod: hook-image-puller-sr45s
85s         Normal    SuccessfulCreate         job/hook-image-awaiter                 Created pod: hook-image-awaiter-22v7v
84s         Normal    Pulled                   pod/hook-image-awaiter-22v7v           Container image "jupyterhub/k8s-image-awaiter:0.11.1" already present on machine
84s         Normal    Started                  pod/hook-image-puller-sr45s            Started container image-pull-metadata-block
84s         Normal    Started                  pod/hook-image-awaiter-22v7v           Started container hook-image-awaiter
84s         Normal    Created                  pod/hook-image-awaiter-22v7v           Created container hook-image-awaiter
84s         Normal    Pulled                   pod/hook-image-puller-sr45s            Container image "jupyterhub/k8s-network-tools:0.11.1" already present on machine
84s         Normal    Created                  pod/hook-image-puller-sr45s            Created container image-pull-metadata-block
83s         Normal    Pulling                  pod/hook-image-puller-sr45s            Pulling image "jupyter/datascience-notebook:latest"
81s         Normal    Pulled                   pod/hook-image-puller-sr45s            Successfully pulled image "jupyter/datascience-notebook:latest"
81s         Normal    Started                  pod/hook-image-puller-sr45s            Started container image-pull-singleuser
81s         Normal    Created                  pod/hook-image-puller-sr45s            Created container image-pull-singleuser
80s         Normal    Pulling                  pod/hook-image-puller-sr45s            Pulling image "jupyter/datascience-notebook:latest"
79s         Normal    Pulled                   pod/hook-image-puller-sr45s            Successfully pulled image "jupyter/datascience-notebook:latest"
79s         Normal    Created                  pod/hook-image-puller-sr45s            Created container image-pull-singleuser-profilelist-1
79s         Normal    Started                  pod/hook-image-puller-sr45s            Started container image-pull-singleuser-profilelist-1
78s         Normal    Pulling                  pod/hook-image-puller-sr45s            Pulling image "jupyter/all-spark-notebook:latest"
77s         Normal    Pulled                   pod/hook-image-puller-sr45s            Successfully pulled image "jupyter/all-spark-notebook:latest"
77s         Normal    Created                  pod/hook-image-puller-sr45s            Created container image-pull-singleuser-profilelist-2
77s         Normal    Started                  pod/hook-image-puller-sr45s            Started container image-pull-singleuser-profilelist-2
76s         Normal    Pulled                   pod/hook-image-puller-sr45s            Container image "k8s.gcr.io/pause:3.2" already present on machine
76s         Normal    Created                  pod/hook-image-puller-sr45s            Created container pause
76s         Normal    Started                  pod/hook-image-puller-sr45s            Started container pause
4s          Normal    NoPods                   poddisruptionbudget/user-placeholder   No matching pods found
4s          Normal    NoPods                   poddisruptionbudget/user-scheduler     No matching pods found
73s         Normal    Killing                  pod/hook-image-puller-sr45s            Stopping container pause
70s         Normal    ProvisioningSucceeded    persistentvolumeclaim/hub-db-dir       Successfully provisioned volume pvc-5d26cf3e-9717-4640-bc3a-39e8a781ebeb using kubernetes.io

rk92 · April 29, 2021, 4:51pm

Just to ask, do I need a separate secret token for each namespace that the Helm chart is provisioned into? The secret token is used to authenticate between the hub and proxy pods, correct?

manics · May 1, 2021, 12:04pm

This is a guess, but maybe having multiple image pullers for the same image across different deployments is a problem? You could try disabling it.

proxy.secretToken secures traffic between the hub and proxy. It’s best practice to use different tokens, but you’re obviously free to make the security trade-off. The latest dev version of Z2JH uses some newish Helm features to autogenerate the secret tokens for new deployments.
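
As a rough illustration of the "different tokens" suggestion, one option is to generate one token per namespace in Terraform instead of sharing a single `random_id`. This is an untested sketch: it reuses the variable names from the `helm_release` resource posted above, while the per-namespace `random_id` with `for_each`, the 32-byte length, and the use of `set_sensitive` are assumptions rather than part of the original configuration. The `custom.whatever` workaround block is left out for brevity.

```hcl
# One random token per namespace. Assumption: var.namespaces is the same
# map of namespace => load balancer IP used by the helm_release resource.
resource "random_id" "secret_token" {
  for_each    = var.namespaces
  byte_length = 32
}

resource "helm_release" "jupyterhub" {
  for_each = var.namespaces

  name       = var.release_name
  repository = var.repository_url
  chart      = var.helm_chart
  version    = var.helm_version
  namespace  = each.key
  timeout    = var.timeout
  values     = var.values

  set {
    name  = "proxy.service.loadBalancerIP"
    value = each.value
  }

  # Look up the token that belongs to this namespace. The .hex attribute
  # is a 64-character hex string for a 32-byte random_id; set_sensitive
  # keeps the value out of the plan output.
  set_sensitive {
    name  = "proxy.secretToken"
    value = random_id.secret_token[each.key].hex
  }
}
```

With this layout each hub/proxy pair gets its own token, so a token leaked from one namespace could not be used against the proxy in another.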

rk92 · May 3, 2021, 10:27pm

Thanks for the reply. Just to confirm, the image puller you reference is the hook image puller? I’ll give the below config a try.

prePuller:
  hook:
    enabled: false

manics · May 4, 2021, 8:28pm

https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/ca859aafb41d4081388ea5790c26e69b106e91e3/jupyterhub/values.yaml#L457-L490


# prePuller relates to the hook|continuous-image-puller DaemonsSets
prePuller:
  annotations: {}
  resources:
    requests:
      cpu: 0
      memory: 0
  containerSecurityContext:
    runAsUser: 65534  # nobody user
    runAsGroup: 65534 # nobody group
    allowPrivilegeEscalation: false
  extraTolerations: []
  # hook relates to the hook-image-awaiter Job and hook-image-puller DaemonSet
  hook:
    enabled: true
    # image and the configuration below relates to the hook-image-awaiter Job
    image:
      name: jupyterhub/k8s-image-awaiter
      tag: 'set-by-chartpress'
      pullPolicy: ''

It’s probably worth disabling the continuous prePuller too:

prePuller:
  hook:
    enabled: false
  continuous:
    enabled: false

If that doesn’t work, you could try disabling the user scheduler and userPlaceholder. Those are the only other cluster-wide resources I can think of:

https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/ca859aafb41d4081388ea5790c26e69b106e91e3/jupyterhub/values.yaml#L385-L443


scheduling:
  userScheduler:
    enabled: true
    replicas: 2
    logLevel: 4
    # plugins ref: https://kubernetes.io/docs/reference/scheduling/config/#scheduling-plugins-1
    plugins:
      score:
        disabled:
          - name: SelectorSpread
          - name: TaintToleration
          - name: PodTopologySpread
          - name: NodeResourcesBalancedAllocation
          - name: NodeResourcesLeastAllocated
          # Disable plugins to be allowed to enable them again with a different
          # weight and avoid an error.
          - name: NodePreferAvoidPods
          - name: NodeAffinity
          - name: InterPodAffinity
          - name: ImageLocality

rk92 · May 4, 2021, 8:47pm

Thanks for the response, I’ll try that out and see.

My workaround at the moment is to run two Terraform apply steps within a single pipeline YAML file, so that the remaining namespaces get the Helm chart provisioned properly.

rk92 · May 19, 2021, 2:28pm

@manics This was fixed after setting enabled to false for both userScheduler and podPriority. Thanks for the help.

My alternative workaround was to run two terraform apply steps one after the other to ensure that all namespaces were provisioned properly.
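
For reference, the fix described above could be expressed directly in the `helm_release` resource as two extra `set` blocks, as sketched below. This is an assumption about how the values were applied; the same keys could equally be set under `scheduling` in the YAML passed through `var.values`. The key names follow the `scheduling` section of the chart's values.yaml quoted earlier in the thread, and the rest of the resource is copied from the original post.

```hcl
resource "helm_release" "jupyterhub" {
  for_each = var.namespaces

  name       = var.release_name
  repository = var.repository_url
  chart      = var.helm_chart
  version    = var.helm_version
  namespace  = each.key
  timeout    = var.timeout
  values     = var.values

  set {
    name  = "proxy.service.loadBalancerIP"
    value = each.value
  }

  set {
    name  = "proxy.secretToken"
    value = random_id.secret_token.id
  }

  # Disable the two scheduling features reported as the fix in this thread.
  # manics suggested the user scheduler because it involves cluster-wide
  # resources, which may clash when several releases install in parallel.
  set {
    name  = "scheduling.userScheduler.enabled"
    value = "false"
  }

  set {
    name  = "scheduling.podPriority.enabled"
    value = "false"
  }
}
```

Note that turning off the user scheduler means user pods fall back to the default Kubernetes scheduler, so this trades the chart's pod-packing behaviour for a clean first-pass install.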
