How do I re-link GCP VM disks allocated via dynamic `pvc-HEXHEXHEX` and `claim-username` objects after irrevocably losing all the Kubernetes YAML?

TL;DR: I destroyed the `claim-username` PVCs on a prod cluster deployed with current z2jh, and have a mob of nervous scientists scared they've lost their data… how do I go about rebinding a hundred `disk-HEXHEX` GCP VM disks to the appropriate JupyterHub users?

  1. In the process of finally deploying/converting our last cluster (prod, naturally, almost half a decade old and full of custom hacks) to be fully Terraform-created, I accidentally destroyed our claim-username and pvc-HEXHEX YAML objects, and their snapshots.

  2. I still have the data in GCP VM disks, each user having a separate GCP disk for their home directory.

  3. As far as I can tell, kubespawner creates the claim-username PVC objects, and the pvc-HEXHEXHEX volumes are then dynamically created by GKE (or whichever k8s storage provisioner), binding each claim to a GCP VM disk.

  4. I can modify the claim-username objects with a fork of kubespawner or some z2jh hackery, but of course they aren't where the disk reference itself is stored; the only ref to the disk is in the pvc-HEXHEXHEX objects created in fulfilling the claim.

  5. Of course, the disks have their metadata labels, including the org.jupyterhub/username field, so it's not hard to find the right disk for each user (see the sketch just after this list)… I just can't figure out how to recreate dynamic claims that point at existing disks :face_with_raised_eyebrow:
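
In case it's useful to anyone else in the same hole, here's a rough sketch of how I'd dump that disk-to-username mapping out of GCP with the gcloud CLI. The label key below is a placeholder (GCP label keys can't actually contain `/` or `.`), so check what one of your disks really carries with `gcloud compute disks describe <disk-name>` and adjust:

```python
import json
import subprocess

# Which disk label carries the JupyterHub username is an assumption here --
# inspect one disk and set this to whatever key your disks actually have.
USERNAME_LABEL = "hub-jupyter-org-username"  # hypothetical label key

# Dump every disk in the current project as JSON.
raw = subprocess.check_output(
    ["gcloud", "compute", "disks", "list", "--format=json"]
)

user_to_disk = {}
for disk in json.loads(raw):
    username = disk.get("labels", {}).get(USERNAME_LABEL)
    if username:
        user_to_disk[username] = {
            "pdName": disk["name"],
            "sizeGb": disk["sizeGb"],
            # zone comes back as a full URL; keep just the last segment
            "zone": disk["zone"].rsplit("/", 1)[-1],
        }

print(json.dumps(user_to_disk, indent=2))
```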

Any help, or even a "this is where I might start looking", is appreciated even if this isn't your expertise. I haven't found as much time to sleep as I'd like; I'm so happy the prod cluster is back, but I'm too dopey right now to figure out new, complex k8s things… This has been a rough weekend, with 3 days of downtime on our prod juphub cluster (our longest in a few years :heart: kubespawner + z2jh) at a pretty bad moment.

Because this is a prod emergency, I've decided to be annoying and dual-post here and on Gitter. Appreciate your tolerance :bowing_man: If you have any ideas, or are just k8s-knowledgeable and would be willing to spend 30 minutes live with me on Gitter, I would really appreciate another pair of eyes at this moment.

Aloha,
-Seth

seth@ceresimaging.net, Principal Engineer @ Ceres Imaging (http://ceresimaging.net), 808-212-3349


Hey! First, hugops! This sounds like a stressful situation, and I hope you are able to get past this soon. Terraform destroying things it shouldn’t is No. 1 in my nightmare scenarios.

Here’s how you can go about recreating the correct PVCs:

  1. Create a PV for each of the disks. These should look something like:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-<some-unique-name>
  labels:
    failure-domain.beta.kubernetes.io/region: us-central1
    failure-domain.beta.kubernetes.io/zone: us-central1-b
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: <capacity>
  gcePersistentDisk:
    fsType: ext4
    pdName: <name-of-the-gcp-pd-disk>
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone
          operator: In
          values:
          - <zone>
        - key: failure-domain.beta.kubernetes.io/region
          operator: In
          values:
          - <region>
  persistentVolumeReclaimPolicy: Delete
  storageClassName: standard
  volumeMode: Filesystem
  2. A PVC object, of the form:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-<escaped-username>
  namespace: <your-namespace>
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: <size>
  storageClassName: standard
  volumeMode: Filesystem
  volumeName: <name-of-pv-object-you-created-earlier>

I think this should do it. The Persistent Volumes | Kubernetes docs should give you some more information on this process. By default, z2jh uses Kubernetes' dynamic provisioning feature to auto-create the PVs; in this case, you're creating them manually.
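
If you have to do this for a hundred disks, something like the sketch below can stamp out the PV/PVC pairs above mechanically. All the concrete values (the `jhub` namespace, the `standard` storage class, the PV name prefix, the example disk at the bottom) are placeholders; swap in whatever your cluster actually uses, write the output to files, and `kubectl apply -f` them:

```python
import yaml  # pip install pyyaml


def make_pv_and_pvc(escaped_username, pd_name, size_gb, zone, region,
                    namespace="jhub"):
    """Return multi-document YAML: one PV + one PVC for an existing PD disk."""
    pv_name = f"pv-restored-{escaped_username}"  # any unique name is fine
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {
            "name": pv_name,
            "labels": {
                "failure-domain.beta.kubernetes.io/region": region,
                "failure-domain.beta.kubernetes.io/zone": zone,
            },
        },
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "capacity": {"storage": f"{size_gb}Gi"},
            "gcePersistentDisk": {"fsType": "ext4", "pdName": pd_name},
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [
                        {
                            "matchExpressions": [
                                {
                                    "key": "failure-domain.beta.kubernetes.io/zone",
                                    "operator": "In",
                                    "values": [zone],
                                },
                                {
                                    "key": "failure-domain.beta.kubernetes.io/region",
                                    "operator": "In",
                                    "values": [region],
                                },
                            ]
                        }
                    ]
                }
            },
            "persistentVolumeReclaimPolicy": "Delete",
            "storageClassName": "standard",
            "volumeMode": "Filesystem",
        },
    }
    # The claim name has to be exactly what kubespawner expects:
    # claim-<escaped-username> (naming is covered in the next paragraph).
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "name": f"claim-{escaped_username}",
            "namespace": namespace,
        },
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": f"{size_gb}Gi"}},
            "storageClassName": "standard",
            "volumeMode": "Filesystem",
            "volumeName": pv_name,
        },
    }
    return yaml.dump_all([pv, pvc], explicit_start=True,
                         default_flow_style=False)


# Example with made-up values; write one file per user and `kubectl apply` it.
print(make_pv_and_pvc("kaylee-40serenity-2eorg", "gke-prod-pd-abc123",
                      10, "us-central1-b", "us-central1"))
```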

The PVCs need to be named claim-<escaped-username>. I don't know if the label on the disk already contains the escaped username; if not, you can generate it with this logic: kubespawner/spawner.py at 781a9c21d04d845582f0b595b06d14913de58f35 · jupyterhub/kubespawner · GitHub
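
In case the disk labels only carry the raw usernames, here is a rough standalone version of that escaping. This is a sketch of my understanding, not a copy of kubespawner's code: kubespawner uses the escapism library with `-` as the escape character over a lowercase-alphanumeric safe set and then lowercases the result, so e.g. `user@example.com` becomes `claim-user-40example-2ecom`. Double-check it against the linked spawner.py before trusting it for all hundred users:

```python
import string

# Characters kubespawner's default escaping treats as safe; everything
# else gets hex-escaped.
SAFE_CHARS = set(string.ascii_lowercase + string.digits)


def escape_username(name: str) -> str:
    """Approximate kubespawner's default username escaping.

    Each character outside [a-z0-9] is replaced by '-' followed by the
    lowercase hex of its UTF-8 bytes, e.g. '@' -> '-40', '.' -> '-2e'.
    """
    out = []
    for ch in name:
        if ch in SAFE_CHARS:
            out.append(ch)
        else:
            out.append("".join(f"-{b:02x}" for b in ch.encode("utf-8")))
    return "".join(out)


if __name__ == "__main__":
    for user in ["kaylee", "user@example.com", "Dr.Strange"]:
        print(f"claim-{escape_username(user)}")
```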


This also almost happened to me last week. I've since put some safeguards in place to prevent it from happening again: Additional safeguards against accidental cluster deletion by yuvipanda · Pull Request #1069 · 2i2c-org/infrastructure · GitHub

I really appreciate your response! I'm doing a hu-man dinner hour with my wife, but I will read closely after.

Our last serious prod-down was unrelated to z2jh but very much related to Terraform: our single-S3-bucket-backed, high-bandwidth, scales-to-many-TB "hot dir sync" service for z2jh (aka hotflights; btw, I think it would be fine to share on GH if other folks might benefit from it) was taken out by another terraform apply --make-me-bleed :wink: The risk with cannons is very real in my hide's experience… but they sure are nice to have in place when you get sneak-attacked by some other part of your stack :bomb:

What are your thoughts on using a shared file server for <20 high-IO-bandwidth users (models, training) and <100 relatively low-IO-bandwidth users? I was thinking this might be the moment to switch to my preferred architecture, an NFS export from a high-speed GCP Filestore, and just get out of pre-guessing my users' disk-size needs, etc. We don't particularly need permissions, being able to copy things between users' homedirs would aid us in debugging, and many of our users would use it while collaborating with one another.

This experience almost has me feeling like tiny-fixed-size-dynamic-disk-mounts is an antipattern at moderate scale, and only makes sense at very low scale (no setup required) and very high scale (no shared IO bottlenecks, greater HA; though I believe GCP Filestore has a reliable HA option too).

I wonder if there's a future where it's a single bit flip in z2jh to allocate an NFS server as part of the cluster for the cases between "getting started" and "HA at scale", which might typify a lot of the heavily adminned installs on here… Would be very curious for your thoughts @yuvipanda. I've wanted to make a significant feature contribution back to z2jh to thank y'all for all the lifting you've done, and this is an area I'm relatively experienced in.

-Seth

This is exactly right, and I pretty much switched to using a shared home directory space for all my clusters a few years ago. I do recommend that pattern, and as you said, maybe now is the time to switch.

Any chance you could share one of your setup's configs, or link me to a tutorial or example on GitHub you'd recommend following? I kind of have a moment here, while the memory of this event is fresh, where I can make storage changes that would otherwise be particularly unwelcome in the coming year… I'd like to get a KISS-but-future-smart storage layer in place as a consolation prize from this process.

Check out GitHub - 2i2c-org/infrastructure: Infrastructure for configuring and deploying our community JupyterHubs.. Terraform for that is here: infrastructure/storage.tf at 1679b331862defb59d64322c86da986dceed2df3 · 2i2c-org/infrastructure · GitHub. Then for each hub, you have to create a single PVC that references your server IP: infrastructure/nfs.yaml at 1679b331862defb59d64322c86da986dceed2df3 · 2i2c-org/infrastructure · GitHub. Then you can configure singleuser like this: infrastructure/values.yaml at 1679b331862defb59d64322c86da986dceed2df3 · 2i2c-org/infrastructure · GitHub.
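
For anyone reading later who doesn't want to chase the links, the shape of that last step is: stop creating per-user dynamic PVCs, and mount one shared NFS-backed PVC into every user pod with a per-user subPath. Expressed as raw kubespawner config rather than z2jh's values.yaml (the PVC name and mount path below are hypothetical; use whatever you actually created), it looks roughly like:

```python
# In jupyterhub_config.py (or z2jh's hub.extraConfig), `c` is the usual
# JupyterHub config object.

# Don't create a per-user dynamic PVC any more.
c.KubeSpawner.storage_pvc_ensure = False

# Mount the single shared PVC. "home-nfs" is a placeholder name for the
# manually created PVC that points at the NFS server.
c.KubeSpawner.volumes = [
    {
        "name": "home",
        "persistentVolumeClaim": {"claimName": "home-nfs"},
    }
]
c.KubeSpawner.volume_mounts = [
    {
        "name": "home",
        "mountPath": "/home/jovyan",
        # kubespawner expands {username}, so each user gets their own
        # directory on the shared filesystem
        "subPath": "{username}",
    }
]
```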

There’s also more information in Add information about using NFS with z2jh · Issue #421 · jupyterhub/zero-to-jupyterhub-k8s · GitHub


Thanks so much for your help @yuvipanda. Following your pattern above, I was able to switch us, amidst the crisis feeling, to a fully terraformed shared-homedir setup backed by an NFS-exporting GCP Filestore, and with my teammates' help (:bowing_man:) we got everyone their data back :call_me_hand:t4:

Still getting a few auxiliary services back in operation, but at least we have core juphub + GPUs going again on all clusters!


Awesome, @Seth_Nickell! Would it be possible to post any scripts or other commands that were helpful to you in the process? You’d definitely not be the last person to accidentally destroy their cluster…