Thank you for the prompt response, I really appreciate it.
I am able to curl the metadata IP from within the cluster and fetch the default service account's email after disabling the iptables block (blockWithIptables), so it is not entirely concealed:
curl -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/instance/service-accounts/default/email
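To be explicit about what changed relative to the chart defaults: in plain Helm-values terms, the jsonnet override further down amounts to roughly the following for the metadata-related singleuser keys (only those keys shown):

singleuser:
  cloudMetadata:
    # the iptables rule that would otherwise drop traffic to 169.254.169.254 is turned off
    blockWithIptables: false
  networkPolicy:
    enabled: true
    egressAllowRules:
      # egress from user pods to the cloud metadata server stays explicitly allowed
      cloudMetadataServer: true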
I am using chart version 2.0.0, which appears to be the latest one, and my config is as follows:
# fullnameOverride and nameOverride distinguish between blank strings, null values,
# and non-blank strings. For more details, see the configuration reference.
fullnameOverride: "***-a"
nameOverride:
# custom can contain anything you want to pass to the hub pod, as all passed
# Helm template values will be made available there.
custom: {}
# imagePullSecret is configuration to create a k8s Secret that the Helm chart's pods
# can get credentials from to pull their images.
imagePullSecret:
create: false
automaticReferenceInjection: true
registry:
username:
password:
email:
# imagePullSecrets is configuration to reference the k8s Secret resources the
# Helm chart's pods can get credentials from to pull their images.
imagePullSecrets: []
# hub relates to the hub pod, responsible for running JupyterHub, its configured
# Authenticator class, its configured Spawner class KubeSpawner, and its
# configured Proxy class ConfigurableHTTPProxy. KubeSpawner creates the user
# pods, and ConfigurableHTTPProxy speaks with the actual ConfigurableHTTPProxy
# server in the proxy pod.
#
# https://stackoverflow.com/questions/52868644/jupyterhub-create-user-and-home-at-login
hub:
revisionHistoryLimit:
config:
JupyterHub:
authenticator_class: ldapauthenticator.LDAPAuthenticator
LDAPAuthenticator:
allowed_groups:
- cn=etsians,ou=Group,dc=**,dc=com
bind_dn_template:
- uid={username},ou=People,dc=**,dc=com
server_address: ldap://ldap.***cloud.com
service:
type: ClusterIP
annotations: {}
ports:
nodePort:
extraPorts: []
loadBalancerIP:
baseUrl: /
cookieSecret:
initContainers: []
nodeSelector: {}
tolerations: []
concurrentSpawnLimit: 64
consecutiveFailureLimit: 5
activeServerLimit:
deploymentStrategy:
## type: Recreate
## - sqlite-pvc backed hubs require the Recreate deployment strategy, as
## typical PVC storage can only be bound to one pod at a time.
## - JupyterHub isn't designed to support being run in parallel. More work
## needs to be done in JupyterHub itself before a fully highly available (HA)
## deployment of JupyterHub on k8s is possible.
type: Recreate
db:
type: sqlite-pvc
upgrade:
pvc:
annotations: {}
selector: {}
accessModes:
- ReadWriteOnce
storage: 1Gi
subPath:
storageClassName:
url:
password:
labels: {}
annotations: {}
command: []
args: []
extraConfig: {}
extraFiles: {}
extraEnv: {}
extraContainers: []
extraVolumes: []
extraVolumeMounts: []
image:
name: jupyterhub/k8s-hub
tag: "2.0.0"
pullPolicy:
pullSecrets: []
resources: {}
podSecurityContext:
fsGroup: 1000
containerSecurityContext:
runAsUser: 1000
runAsGroup: 1000
allowPrivilegeEscalation: false
lifecycle: {}
loadRoles: {}
services: {}
pdb:
enabled: false
maxUnavailable:
minAvailable: 1
networkPolicy:
enabled: true
ingress: []
egress: []
egressAllowRules:
cloudMetadataServer: true
dnsPortsPrivateIPs: true
nonPrivateIPs: true
privateIPs: true
interNamespaceAccessLabels: ignore
allowedIngressPorts: []
allowNamedServers: false
namedServerLimitPerUser:
authenticatePrometheus:
redirectToServer:
shutdownOnLogout:
templatePaths: []
templateVars: {}
livenessProbe:
# The livenessProbe's aim is to give JupyterHub sufficient time to start up,
# but to restart it if it becomes unresponsive for ~5 min.
enabled: true
initialDelaySeconds: 300
periodSeconds: 10
failureThreshold: 30
timeoutSeconds: 3
readinessProbe:
# The readinessProbe's aim is to provide a successful startup indication,
# but after that never to become unready before its livenessProbe fails and
# restarts it if needed. Becoming unready after startup serves no purpose,
# as there is no other pod to fall back to in our non-HA deployment.
enabled: true
initialDelaySeconds: 0
periodSeconds: 2
failureThreshold: 1000
timeoutSeconds: 1
existingSecret:
serviceAccount:
create: true
name:
annotations: {}
extraPodSpec:
rbac:
create: true
# proxy relates to the proxy pod, the proxy-public service, and the autohttps
# pod and proxy-http service.
proxy:
https:
enabled: true
hosts:
- ***.com
letsencrypt:
contactEmail: ***@***.com
annotations: {}
deploymentStrategy:
## type: Recreate
## - JupyterHub's interaction with the CHP proxy becomes a lot more robust
## with this configuration. To understand this, consider that JupyterHub
## during startup will interact a lot with the k8s service to reach a
## ready proxy pod. If the hub pod during a helm upgrade is restarting
## directly while the proxy pod is making a rolling upgrade, the hub pod
## could end up running a sequence of interactions with the old proxy pod
## and finishing up the sequence of interactions with the new proxy pod.
## As CHP proxy pods carry individual state this is very error prone. One
## outcome when not using Recreate as a strategy has been that user pods
## have been deleted by the hub pod because it considered them unreachable
## as it only configured the old proxy pod but not the new before trying
## to reach them.
type: Recreate
## rollingUpdate:
## - WARNING:
## This is required to be set explicitly blank! Without it being
## explicitly blank, k8s will let any old values under rollingUpdate
## remain, the Deployment becomes invalid, and a helm upgrade would
## fail with an error like this:
##
## UPGRADE FAILED
## Error: Deployment.apps "proxy" is invalid: spec.strategy.rollingUpdate: Forbidden: may not be specified when strategy `type` is 'Recreate'
## Error: UPGRADE FAILED: Deployment.apps "proxy" is invalid: spec.strategy.rollingUpdate: Forbidden: may not be specified when strategy `type` is 'Recreate'
rollingUpdate:
# service relates to the proxy-public service
service:
type: LoadBalancer
labels: {}
annotations:
networking.gke.io/load-balancer-type: "Internal"
nodePorts:
http:
https:
disableHttpPort: false
extraPorts: []
loadBalancerIP:
loadBalancerSourceRanges: []
# chp relates to the proxy pod, which is responsible for routing traffic based
# on dynamic configuration sent from JupyterHub to CHP's REST API.
chp:
revisionHistoryLimit:
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
image:
name: jupyterhub/configurable-http-proxy
# tag is automatically bumped to new patch versions by the
# watch-dependencies.yaml workflow.
#
tag: "4.5.3" # https://github.com/jupyterhub/configurable-http-proxy/releases
pullPolicy:
pullSecrets: []
extraCommandLineFlags: []
livenessProbe:
enabled: true
initialDelaySeconds: 60
periodSeconds: 10
failureThreshold: 30
timeoutSeconds: 3
readinessProbe:
enabled: true
initialDelaySeconds: 0
periodSeconds: 2
failureThreshold: 1000
timeoutSeconds: 1
resources: {}
defaultTarget:
errorTarget:
extraEnv: {}
nodeSelector: {}
tolerations: []
networkPolicy:
enabled: true
ingress: []
egress: []
egressAllowRules:
cloudMetadataServer: true
dnsPortsPrivateIPs: true
nonPrivateIPs: true
privateIPs: true
interNamespaceAccessLabels: ignore
allowedIngressPorts: [http, https]
pdb:
enabled: false
maxUnavailable:
minAvailable: 1
extraPodSpec:
# traefik relates to the autohttps pod, which is responsible for TLS
# termination when proxy.https.type=letsencrypt.
traefik:
revisionHistoryLimit:
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
image:
name: traefik
# tag is automatically bumped to new patch versions by the
# watch-dependencies.yaml workflow.
#
tag: "v2.8.4" # ref: https://hub.docker.com/_/traefik?tab=tags
pullPolicy:
pullSecrets: []
hsts:
includeSubdomains: false
preload: false
maxAge: 15724800 # About 6 months
resources: {}
labels: {}
extraInitContainers: []
extraEnv: {}
extraVolumes: []
extraVolumeMounts: []
extraStaticConfig: {}
extraDynamicConfig: {}
nodeSelector: {}
tolerations: []
extraPorts: []
networkPolicy:
enabled: true
ingress: []
egress: []
egressAllowRules:
cloudMetadataServer: true
dnsPortsPrivateIPs: true
nonPrivateIPs: true
privateIPs: true
interNamespaceAccessLabels: ignore
allowedIngressPorts: [http, https]
pdb:
enabled: false
maxUnavailable:
minAvailable: 1
serviceAccount:
create: true
name:
annotations: {}
extraPodSpec: {}
secretSync:
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
image:
name: jupyterhub/k8s-secret-sync
tag: "2.0.0"
pullPolicy:
pullSecrets: []
resources: {}
labels: {}
https:
enabled: false
type: letsencrypt
#type: letsencrypt, manual, offload, secret
letsencrypt:
contactEmail:
# Specify custom server here (https://acme-staging-v02.api.letsencrypt.org/directory) to hit staging LE
acmeServer: https://acme-v02.api.letsencrypt.org/directory
manual:
key:
cert:
secret:
name:
key: tls.key
crt: tls.crt
hosts: []
# singleuser relates to the configuration of KubeSpawner which runs in the hub
# pod, and its spawning of user pods such as jupyter-myusername.
# Ref: https://stackoverflow.com/a/47965773 (need to set tolerations and a nodeSelector)
singleuser:
podNameTemplate:
extraTolerations: [
{
key: nvidia.com/gpu,
operator: Equal,
value: present,
effect: NoSchedule
}
]
extraNodeAffinity:
required: []
preferred: []
extraPodAffinity:
required: []
preferred: []
extraPodAntiAffinity:
required: []
preferred: []
networkTools:
image:
name: jupyterhub/k8s-network-tools
tag: "2.0.0"
pullPolicy:
pullSecrets: []
resources: {}
cloudMetadata:
enabled: true
networkPolicy:
enabled: false
ingress: []
egress: []
egressAllowRules:
cloudMetadataServer: true
dnsPortsPrivateIPs: true
nonPrivateIPs: true
privateIPs: false
interNamespaceAccessLabels: ignore
allowedIngressPorts: []
events: true
extraAnnotations: {}
extraLabels: {}
extraFiles: {}
extraEnv: {}
lifecycleHooks: {}
initContainers: []
extraContainers: []
allowPrivilegeEscalation: false
uid: 1000
fsGid: 100
serviceAccountName:
storage:
type: dynamic
extraLabels: {}
extraVolumes: []
extraVolumeMounts: []
static:
pvcName:
subPath: "{username}"
capacity: 10Gi
homeMountPath: /home/jovyan
dynamic:
storageClass:
pvcNameTemplate: claim-{username}{servername}
volumeNameTemplate: volume-{username}{servername}
storageAccessModes: [ReadWriteOnce]
image:
name: jupyterhub/k8s-singleuser-sample
tag: "2.0.0"
pullPolicy:
pullSecrets: []
startTimeout: 300
cpu:
limit:
guarantee:
memory:
limit:
guarantee:
extraResource:
limits: {}
guarantees: {}
cmd: jupyterhub-singleuser
defaultUrl:
extraPodConfig: {}
profileList: []
# scheduling relates to the user-scheduler pods and user-placeholder pods.
scheduling:
userScheduler:
enabled: true
revisionHistoryLimit:
replicas: 2
logLevel: 4
# plugins are configured on the user-scheduler to score nodes so that user
# pods are packed onto the busiest node. By doing this, we help scale down
# more effectively. It isn't obvious how to enable/disable the scoring
# plugins, and how to configure them, to accomplish this.
#
# plugins ref: https://kubernetes.io/docs/reference/scheduling/config/#scheduling-plugins-1
# migration ref: https://kubernetes.io/docs/reference/scheduling/config/#scheduler-configuration-migrations
#
plugins:
score:
# These scoring plugins are enabled by default according to
# https://kubernetes.io/docs/reference/scheduling/config/#scheduling-plugins
# 2022-02-22.
#
# Enabled with high priority:
# - NodeAffinity
# - InterPodAffinity
# - NodeResourcesFit
# - ImageLocality
# Remains enabled with low default priority:
# - TaintToleration
# - PodTopologySpread
# - VolumeBinding
# Disabled for scoring:
# - NodeResourcesBalancedAllocation
#
disabled:
# We disable these plugins (with regard to scoring) so they don't interfere
# with or complicate our use of NodeResourcesFit.
- name: NodeResourcesBalancedAllocation
# Disable plugins so that they can be enabled again with a different
# weight without causing an error.
- name: NodeAffinity
- name: InterPodAffinity
- name: NodeResourcesFit
- name: ImageLocality
enabled:
- name: NodeAffinity
weight: 14631
- name: InterPodAffinity
weight: 1331
- name: NodeResourcesFit
weight: 121
- name: ImageLocality
weight: 11
pluginConfig:
# Here we declare that pods should be packed onto nodes based on a
# MostAllocated strategy instead of the default LeastAllocated.
- name: NodeResourcesFit
args:
scoringStrategy:
resources:
- name: cpu
weight: 1
- name: memory
weight: 1
type: MostAllocated
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
image:
# IMPORTANT: Bumping the minor version of this binary should go hand in
# hand with an inspection of the user-scheduler's RBAC resources
# that we have forked in
# templates/scheduling/user-scheduler/rbac.yaml.
#
# Debugging advice:
#
# - Is configuration of kube-scheduler broken in
# templates/scheduling/user-scheduler/configmap.yaml?
#
# - Is the kube-scheduler binary incompatible with the k8s api-server
#   because the api-server is too new or too old?
#
# - You can update the GitHub workflow that runs tests to
# include "deploy/user-scheduler" in the k8s namespace report
# and reduce the user-scheduler deployments replicas to 1 in
# dev-config.yaml to get relevant logs from the user-scheduler
# pods. Inspect the "Kubernetes namespace report" action!
#
# - Typical failures are that kube-scheduler fails to look up
#   resources via its "informers" and won't start trying to
#   schedule pods before those lookups succeed, which may require
#   additional RBAC permissions or a k8s api-server that is
#   aware of the resources.
#
# - If "successfully acquired lease" can be seen in the logs, it
# is a good sign kube-scheduler is ready to schedule pods.
#
name: k8s.gcr.io/kube-scheduler
# tag is automatically bumped to new patch versions by the
# watch-dependencies.yaml workflow. The minor version is pinned in the
# workflow, and should be updated there if a minor version bump is done
# here.
#
tag: "v1.23.10" # ref: https://github.com/kubernetes/website/blob/ef84f694dc2b7fae58cf2da631a8dacecf6d5a94/content/en/releases/patch-releases.md
pullPolicy:
pullSecrets: []
nodeSelector: {}
tolerations: []
labels: {}
annotations: {}
pdb:
enabled: true
maxUnavailable: 1
minAvailable:
resources: {}
serviceAccount:
create: false
name:
annotations: {}
extraPodSpec: {}
podPriority:
enabled: false
globalDefault: false
defaultPriority: 0
imagePullerPriority: -5
userPlaceholderPriority: -10
userPlaceholder:
enabled: true
image:
name: k8s.gcr.io/pause
# tag is automatically bumped to new patch versions by the
# watch-dependencies.yaml workflow.
#
# If you update this, also update prePuller.pause.image.tag
#
tag: "3.8"
pullPolicy:
pullSecrets: []
revisionHistoryLimit:
replicas: 0
labels: {}
annotations: {}
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
resources: {}
corePods:
tolerations: []
nodeAffinity:
matchNodePurpose: prefer
userPods:
tolerations: []
nodeAffinity:
matchNodePurpose: ignore
# prePuller relates to the hook|continuous-image-puller DaemonSets
prePuller:
revisionHistoryLimit:
labels: {}
annotations: {}
resources: {}
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
extraTolerations: []
# hook relates to the hook-image-awaiter Job and hook-image-puller DaemonSet
hook:
enabled: true
pullOnlyOnChanges: true
# image and the configuration below relate to the hook-image-awaiter Job
image:
name: jupyterhub/k8s-image-awaiter
tag: "2.0.0"
pullPolicy:
pullSecrets: []
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
podSchedulingWaitDuration: 10
nodeSelector: {}
tolerations: []
resources: {}
serviceAccount:
create: true
name:
annotations: {}
continuous:
enabled: true
pullProfileListImages: true
extraImages: {}
pause:
containerSecurityContext:
runAsUser: 65534 # nobody user
runAsGroup: 65534 # nobody group
allowPrivilegeEscalation: false
image:
name: k8s.gcr.io/pause
# tag is automatically bumped to new patch versions by the
# watch-dependencies.yaml workflow.
#
# If you update this, also update scheduling.userPlaceholder.image.tag
#
tag: "3.8"
pullPolicy:
pullSecrets: []
ingress:
enabled: false
annotations: {}
ingressClassName:
hosts: []
pathSuffix:
pathType: Prefix
tls: []
# cull relates to the jupyterhub-idle-culler service, responsible for evicting
# inactive singleuser pods.
#
# The configuration below, except for enabled, corresponds to command-line flags
# for jupyterhub-idle-culler as documented here:
# https://github.com/jupyterhub/jupyterhub-idle-culler#as-a-standalone-script
#
cull:
enabled: true
users: false # --cull-users
adminUsers: true # --cull-admin-users
removeNamedServers: false # --remove-named-servers
timeout: 3600 # --timeout
every: 600 # --cull-every
concurrency: 10 # --concurrency
maxAge: 0 # --max-age
debug:
enabled: false
global:
safeToShowValues: false
- overriding the singleuser settings as follows:
// import application configuration
(import 'main.libsonnet') + {
// override default configuration
_config+:: {
cluster: 'dev',
env: 'dev',
namespace: std.extVar('DEFAULT_NAMESPACE'),
//jupyterhub specific changes
jupyterhub+: {
values+: {
singleuser+: {
cloudMetadata: {blockWithIptables: false},
networkPolicy: {enabled: true, egressAllowRules: {cloudMetadataServer: true}},
extraLabels: {"hub.jupyter.org/network-access-hub": "true"},
profileList: [{
display_name: 'tensorflow-2.8.1-gpu-t4-1',
description: 'Spawns a notebook server with the tensorflow-2.8.1-gpu image.',
kubespawner_override: {
image: 'gcr.io/***/gpu-jupyter:v2',
cpu_guarantee: 2,
extra_resource_limits: { "nvidia.com/gpu": '1' },
mem_limit: '10G',
service_account: 'sa-dev-**-wi',
//extra_pod_config: {"dns_policy": "ClusterFirstWithHostNet"},
}
},{
display_name: "jupyter-datascience",
description: "Spawns a notebook server with the base jupyter/datascience-notebook image.",
kubespawner_override: {
image: 'jupyter/datascience-notebook:2343e33dec46',
cpu_guarantee: 2,
mem_limit: '10G',
node_selector: {"service": "canvas", 'blueGreen': 'green', 'cluster_name': 'ml-infra'},
service_account: 'sa-dev-canvas-wi',
},
},
// Note: Add more profiles here
// {}
],
},
proxy+: {
service+: {
annotations+: {
"meta.helm.sh/release-name": 'jupyterhub',
"meta.helm.sh/release-namespace": std.extVar('DEFAULT_NAMESPACE'),
"networking.gke.io/load-balancer-type": 'Internal',
},
},
}
},
},
},
}
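If I am reading the chart options right (please correct me if not), concealing the metadata server again should just be a matter of flipping those two knobs back in the same singleuser override, i.e. in plain values terms:

singleuser:
  cloudMetadata:
    # assumption: re-enables the iptables rule so requests to 169.254.169.254 are dropped again
    blockWithIptables: true
  networkPolicy:
    enabled: true
    egressAllowRules:
      # assumption: stop explicitly allowing egress to the cloud metadata server
      cloudMetadataServer: false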