JupyterHub Database change

We need to switch the PostgreSQL server that JupyterHub is configured to use. If we don’t care about user state, do we still need to back up the database from the old server and restore it on the new one, or can we simply create a new db on the new server and point JupyterHub to it?

Does the database store critical state that would require us to back up from the old server and restore on the new one? If we don’t restore the db on the new server, will the existing JupyterHub instance stop functioning?

All of the state in the JupyterHub db relates to authentication and running servers. If you don’t have any running servers when you upgrade, and you don’t mind all the users being deleted, there should be no issue.

The kind of information that is lost:

  • information about running servers (none if none are running)
  • Some spawners may persist data in state while they are not running, such as persistent volume information. This is more often derived deterministically from the username, though.
  • tokens not issued via static config, e.g. those created via the API or the /hub/tokens page, or via OAuth to services
  • users created or modified via the API or admin panel
  • users who have successfully logged in, if you are not using allowed_users
  • activity data

The rest is generally created from config.

I don’t know enough about the relationship of the old to the new in your situation. If they don’t share a database, there isn’t really any state to share so they should both work concurrently (of course, they can’t both be accessible at the same URL at the same time). Depending on overlap of your configuration, there may be problems if they are both running at the same time.

Thanks! We are deploying on Azure Kubernetes Service. I believe the persistent volume is determined from the username, so I am guessing it is not persisted in the db. Please confirm.
We don’t issue any tokens and don’t create or modify users via the API.
Since running servers, successfully logged-in users, and activity data are runtime information, I guess losing them won’t be an issue.

The plan is to bring up a new PostgreSQL server and db and point JupyterHub to it.

If there are any issues and we need to roll back, we redeploy JupyterHub pointing to the old PostgreSQL db.
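
A minimal sketch of what the switch and the rollback could look like, assuming the Hub is configured through `jupyterhub_config.py` and the connection details come from environment variables. The helper `build_db_url`, the variable names (`HUB_DB_HOST` and so on), and the default values are hypothetical; `c.JupyterHub.db_url` is the JupyterHub setting that takes the resulting URL. On a Helm-based deployment the URL would normally be set through the chart values instead.

```python
import os
from urllib.parse import quote


def build_db_url(host, dbname, user, password, port=5432):
    """Return a SQLAlchemy-style PostgreSQL URL for JupyterHub's db_url.

    User and password are percent-encoded so that characters such as
    '@' or '/' do not break URL parsing.
    """
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""), quote(password, safe=""), host, port, dbname
    )


# Hypothetical environment variable names and defaults. Switching servers
# (or rolling back) only changes these values, not the rest of the config.
db_url = build_db_url(
    host=os.environ.get("HUB_DB_HOST", "new-postgres.example.internal"),
    dbname=os.environ.get("HUB_DB_NAME", "jupyterhub"),
    user=os.environ.get("HUB_DB_USER", "jupyterhub"),
    password=os.environ.get("HUB_DB_PASSWORD", "change-me"),
)

# In jupyterhub_config.py, this value would be assigned as:
#   c.JupyterHub.db_url = db_url
```

With this shape, rolling back means redeploying with the old server’s host and credentials; the old database is left untouched in the meantime.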

With this approach there is no need to back up and restore the database. Let me know if I missed anything.

I think that’s accurate. I assume you’ll take down the old Hub and all user servers before starting the new one; otherwise, the state won’t be consistent.
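
One way to confirm that no user servers are still running before the cutover is to ask the old Hub’s REST API. This is a rough sketch, assuming an API token that is allowed to list users and read their servers; the function names and the example URL below are hypothetical. It relies on the user models returned by `GET /hub/api/users`, whose `servers` field lists each user’s active servers keyed by server name.

```python
import json
import urllib.request


def running_servers(users):
    """Return (username, server_name) pairs for every active server.

    `users` is the list of user models returned by GET /hub/api/users.
    Each model's "servers" dict holds that user's active servers, keyed
    by server name ("" is the default server).
    """
    return [
        (user["name"], server_name)
        for user in users
        for server_name in (user.get("servers") or {})
    ]


def fetch_users(hub_api_url, token):
    """Fetch user models from the Hub REST API (first page only)."""
    request = urllib.request.Request(
        hub_api_url.rstrip("/") + "/users",
        headers={"Authorization": "token " + token},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `running_servers(fetch_users("https://hub.example.org/hub/api", token))` should come back empty before the old Hub is shut down. On a large deployment the user list is paginated, so a single request may not cover everyone.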