When I first created my JupyterHub (following the z2jh tutorial), I put it in its own project on Google Cloud. I did this because I didn’t know what I was doing and didn’t want to break other systems while experimenting.
I would now like to migrate the original Hub (i.e. all user information and data) to a new Hub in a different Google Cloud project in order to allow better integration with other systems.
What are my options for this, please? On the original Hub, when users logged in for the first time, they were allocated 10 GB of “personal” storage (as a PVC) and then connected to it whenever they logged in again. I can create a new Hub in the new project on GCP, but when my users log in they’ll be assigned new, empty PVCs.
I suppose I could ask each user to log in to both Hubs, zip their files and transfer their data manually, but I’m hoping there’s a better way. For example, is there a way to transfer the old user database to the new Hub, and to create PVCs etc. with the correct metadata, so that I can transfer their data for them (and still have them correctly identified when they try to log in)?
I guess others must have tackled this, so any advice regarding the workflow or things to watch out for would be appreciated.
In theory you can manually create a PVC for each user, which should lead to a PV being dynamically provisioned for each one. You can then copy the data across. As long as each PVC matches what Z2JH expects, it should work.
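As a sketch of that approach, a small script could generate one manifest per user for `kubectl apply`. Everything here is an assumption to adapt, not z2jh’s actual code: the `jhub` namespace, the `standard` storage class, and the access mode should all be matched to what the old Hub used (check with `kubectl get pvc -n <namespace> -o yaml` on the old cluster).

```python
# Sketch: generate PVC manifests matching z2jh's default naming
# ("claim-<escaped-username>"). Storage class, size, and access mode
# below are assumptions -- match them to your old Hub's PVCs.

PVC_TEMPLATE = """\
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: claim-{username}
  namespace: {namespace}
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # assumption: match your old Hub's class
  resources:
    requests:
      storage: 10Gi            # the 10 GB allocation described above
"""

def pvc_manifest(username: str, namespace: str = "jhub") -> str:
    """Return a PVC manifest for one (already-escaped) username."""
    return PVC_TEMPLATE.format(username=username, namespace=namespace)

if __name__ == "__main__":
    # Apply with: python make_pvcs.py | kubectl apply -f -
    for user in ["alice", "bob"]:
        print(pvc_manifest(user))
        print("---")
```

Once the PVCs are bound and the PVs exist, you can copy the data in with a temporary helper pod that mounts the new volume.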
Coincidentally, someone recently posted about a data recovery situation:
It’s not quite the same, since in that case the underlying volumes already existed and they were trying to recreate the metadata, but the principles are similar.
So I can manually create the PVCs and rely on Kubernetes’ “dynamic provisioning” to create the associated PVs automatically? Then, after I’ve transferred the data, existing users should be able to log in and be correctly assigned to their new PVCs?
Have I understood correctly that, when a user logs in, the Hub “just” looks for a PVC in the same namespace called claim-<escaped-username>, i.e. that’s the only metadata I need to preserve when I create the new PVCs? And I don’t need to worry about copying the old user database etc.? If so, that seems much easier than I was expecting, which would be great!
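For working out what claim name to expect for each user, KubeSpawner escapes usernames (via the escapism library) before templating them into `claim-{username}`. Below is a rough reimplementation of that escaping scheme as I understand it — the safe-character set and escape rule are assumptions, not KubeSpawner’s actual code, so verify the results against `kubectl get pvc` on the old cluster before creating anything:

```python
import string

# Approximation of the escapism-style escaping KubeSpawner applies to
# usernames before templating them into "claim-{username}": characters
# in a safe set pass through; everything else becomes the escape char
# plus the character's codepoint in hex.
SAFE_CHARS = set(string.ascii_lowercase + string.digits)  # assumption

def escape_username(name: str, escape_char: str = "-") -> str:
    out = []
    for ch in name:
        if ch in SAFE_CHARS:
            out.append(ch)
        else:
            # unsafe character -> escape_char + hex codepoint
            # (note: non-ASCII characters yield more than two hex digits)
            out.append(escape_char + format(ord(ch), "02x"))
    return "".join(out)

def claim_name(username: str) -> str:
    return "claim-" + escape_username(username)

# e.g. claim_name("user@example.com") -> "claim-user-40example-2ecom"
```

Note that the escape character itself gets escaped (`-` becomes `-2d`), which is one of the “gotchas” to watch for with unusual usernames.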
The original question doesn’t mention the hub.db data. I’m wondering: if I use sqlite as my database and want to move the Hub from one cluster to another in a different environment (for example, from a local k8s cluster to GKE), how should I migrate the database to the new cluster?
By default, a new PVC and PV are created dynamically when the application is installed. Can I set the hub.db parameters to bind the database to an existing PVC and PV, or can I only move the data into the new PV after it has been created?
We followed the steps outlined by @manics above and everything went smoothly. In other words, we didn’t migrate the old Hub database at all - just transferred the persistent user data to the new cluster and allowed the new Hub to build itself a new database as users logged in.
The most fiddly part for us was creating new PVCs with the correct names. This is because our old Hub used GitHub OAuth for authentication, whereas the new one uses Azure AD. We therefore needed to figure out a mapping between the “sanitised” GitHub user names from the old Hub and the “sanitised” Azure user names on the new one (because otherwise users would be assigned a new, empty PVC at first sign-in, rather than being linked to their old data). This was actually pretty easy - it just took a bit of experimentation, and there were a few “gotchas” where users with unusual names/e-mail addresses were not correctly assigned the first time.
We only have ~100 users on our Hub, so I just created a CSV mapping old PVC names to new ones and we wrote a script to migrate the user data.
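The thread doesn’t include the actual script, but a hypothetical sketch of the CSV-driven approach might look like this. The `copy-pvc.sh` helper is an invented placeholder for however you actually move data between volumes (e.g. a pod mounting both PVCs, or an intermediate bucket):

```python
import csv
import io
import shlex

def copy_commands(mapping_csv: str) -> list:
    """Read rows of "old-claim,new-claim" and emit one copy command per
    user. copy-pvc.sh is a hypothetical helper that moves data between
    the two claims; substitute your own transfer mechanism."""
    cmds = []
    for row in csv.reader(io.StringIO(mapping_csv)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        old_claim, new_claim = (c.strip() for c in row)
        cmds.append(
            f"./copy-pvc.sh {shlex.quote(old_claim)} {shlex.quote(new_claim)}"
        )
    return cmds
```

With ~100 users this stays very manageable: generate the commands, eyeball the list for the “gotcha” names, then run them.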
@JES Thanks for the quick reply.
Have you run into authentication-management issues for collaborative work? In our case, we would like to use the collaboration features mentioned here, so I think we need to keep the data in the groups, roles, and a few other mapping tables.
Sorry, I don’t have any experience with the real-time collaboration features, although they look interesting! But, yes, it looks like that may make things a bit more complicated in terms of user groups, roles, etc.