Migrating user data between Hubs

When I first created my JupyterHub (following the z2jh tutorial), I put it in its own project on Google Cloud. I did this because I didn’t know what I was doing and didn’t want to break other systems while experimenting.

I would now like to migrate the original Hub (i.e. all user information and data) to a new Hub in a different Google Cloud project in order to allow better integration with other systems.

What are my options for this, please? On the original Hub, when users logged in for the first time, they were allocated 10 GB of “personal” storage (as a PVC) and then connected to it whenever they logged in again. I can create a new Hub in the new project on GCP, but when my users log in they’ll be assigned new, empty PVCs.

I suppose I could ask each user to log in to both Hubs, zip their files and transfer their data manually, but I’m hoping there’s a better way. For example, is there a way to transfer the old user database to the new Hub, and create PVCs etc. with the correct metadata so that I can transfer their data for them (and still have them correctly identified when they try to log in)?

I guess others must have tackled this, so any advice regarding the workflow or things to watch out for would be appreciated.

Thanks!

In theory you can manually create a PVC for each user, which should lead to a PV being dynamically provisioned for it. You can then copy the data across. As long as each PVC matches what Z2JH expects, it should work.

Coincidentally someone recently posted about a data recovery situation:

It’s not the same situation, since there the underlying volumes already existed and they were trying to recreate the metadata, but the principles are similar.

There are also tools such as

though I don’t have any experience with it.

Thanks @manics, that’s very helpful!

So, if I’ve understood @yuvipanda’s post on the linked thread correctly, I can create PVCs on the new Hub for each existing user using something like:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-<escaped-username>
  namespace: <your-namespace>
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: <size>
  storageClassName: standard
  volumeMode: Filesystem

and rely on Kubernetes’ “dynamic provisioning” to create the associated PVs automatically. Then, after I’ve transferred the data, existing users should be able to log in and be correctly assigned to their new PVCs?

Have I understood correctly that, when a user logs in, the Hub “just” looks for a PVC in the same namespace called claim-<escaped-username>, i.e. that the name is the only metadata I need to preserve when I create the new PVCs? And I don’t need to worry about copying the old user database etc.? If so, that seems much easier than I was expecting, which would be great!
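
To make the naming question concrete, here is a minimal Python sketch of how the per-user manifests could be generated. It rests on an assumption that must be checked against a real Hub: that the claim name is `claim-` plus the username escaped the way older KubeSpawner releases did it (lowercase letters and digits kept, every other character replaced by `-` and the hex value of each UTF-8 byte). Newer KubeSpawner releases may build names differently, so compare the output with a PVC the new Hub has generated itself. The function names, the usernames and the `jhub` namespace are hypothetical.

```python
import json
import string

# Assumption: legacy KubeSpawner-style escaping. Only lowercase ASCII
# letters and digits pass through unchanged.
SAFE_CHARS = set(string.ascii_lowercase + string.digits)


def escape_username(username: str) -> str:
    """Escape a username for use in a PVC name (assumed legacy scheme)."""
    out = []
    for ch in username:
        if ch in SAFE_CHARS:
            out.append(ch)
        else:
            # Each UTF-8 byte becomes "-" followed by its lowercase hex value.
            out.extend(f"-{byte:x}" for byte in ch.encode("utf-8"))
    return "".join(out)


def pvc_manifest(username: str, namespace: str, size: str,
                 storage_class: str = "standard") -> dict:
    """Build a PVC manifest mirroring the template in this thread."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {
            "name": f"claim-{escape_username(username)}",
            "namespace": namespace,
        },
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
            "storageClassName": storage_class,
            "volumeMode": "Filesystem",
        },
    }


if __name__ == "__main__":
    # Hypothetical usernames; kubectl accepts JSON manifests as well as YAML.
    for user in ["jane", "user@example.com"]:
        print(json.dumps(pvc_manifest(user, "jhub", "10Gi"), indent=2))
```

Each printed document could be saved to a file and applied with `kubectl apply -f <file>`, once the names have been confirmed against a PVC the Hub created on its own.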

Thanks again for the reply :slight_smile:

Something like that!

For minimal risk I’d probably try it this way:

  1. Deploy your new Z2JH
  2. Log in as yourself; this will cause Z2JH to generate a new PVC/PV in the usual way
  3. Compare the generated PVC YAML with your above template
  4. Create a PVC using the template for a trusted user who can test things for you
  5. Copy that user’s data across
  6. Ask that user to log in
  7. Check they can see their data!
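
Step 3 can be done by eye, but a small script makes the comparison repeatable. The sketch below is only an illustration: it assumes both PVCs have been saved as JSON (for example with `kubectl get pvc <name> -n <your-namespace> -o json`), and it checks only the `spec` fields that appear in the template above. The helper names are hypothetical, and labels and annotations are deliberately not compared.

```python
import json

# Spec fields from the template in this thread that should match the
# PVC the Hub generated on its own.
FIELDS = (
    ("spec", "accessModes"),
    ("spec", "storageClassName"),
    ("spec", "volumeMode"),
    ("spec", "resources", "requests", "storage"),
)


def get_path(obj, path):
    """Follow a sequence of keys into nested dicts; None if any is missing."""
    for key in path:
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def pvc_differences(generated: dict, template: dict) -> dict:
    """Return {dotted.path: (generated_value, template_value)} for mismatches."""
    diffs = {}
    for path in FIELDS:
        gen_value = get_path(generated, path)
        tpl_value = get_path(template, path)
        if gen_value != tpl_value:
            diffs[".".join(path)] = (gen_value, tpl_value)
    return diffs


def load_pvc(filename: str) -> dict:
    """Load a PVC that was saved as JSON."""
    with open(filename, encoding="utf-8") as handle:
        return json.load(handle)
```

An empty result from `pvc_differences(load_pvc("generated.json"), load_pvc("template.json"))` means the compared fields agree; any entry shows the Hub-generated value first and the template value second.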

That sounds like an excellent suggestion - thanks @manics!