JupyterHub docker-deploy DB

Back in the days when @minrk was contributing to/maintaining the jupyterhub-deploy-docker repo, the docker-compose setup used a Postgres DB container:

Then @manics simplified it, and the DB container in the compose file was gone:

I cloned @minrk's version, with the DB container, and started using it at some point in 2021. It is still doing its humble and important work in my little lab. But I am now reviewing the whole environment and noticed the difference above, and I would like to understand what happened and why, so I can decide what I should change in my setup.

So, I have a few questions (the same question from different perspectives, really):

  1. What was the Postgres DB used for back then that is no longer needed (in the example docker-compose)?
  2. Was the DB removed because current JupyterHub no longer needs it, or just because you decided the docker-compose should provide a simpler setup?
  3. Should I keep using the compose file with the Postgres container, or switch to the new/current one?

Thanks!

A more general answer:

JupyterHub needs a DB to keep the state of the Hub, but it supports different DB backends: Postgres, MySQL, and SQLite, to name a few. The change you are referring to is when the docker-compose files were switched from Postgres as the DB backend to a “simpler” SQLite backend. I will let @manics and @minrk answer why it was changed, but I assume it was to make the deployment simpler.
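
For reference, the backend is selected with a single setting in `jupyterhub_config.py`. A minimal sketch of the SQLite case (the URL below is JupyterHub's default, just written out explicitly):

```python
# jupyterhub_config.py -- sketch of the DB backend setting.
# SQLite is JupyterHub's default backend; this line only makes the
# default URL explicit (a file named jupyterhub.sqlite next to the hub).
c.JupyterHub.db_url = "sqlite:///jupyterhub.sqlite"
```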

  1. Should I keep using the compose file with the Postgres container, or switch to the new/current one?

This depends on your use case. If I am right, SQLite does not support concurrent writes, so if you have many concurrent users it might be a bit laggy. On the other hand, Postgres is a more production-ready DB with more advanced features. There is a really nice write-up on the JupyterHub database in the docs. That should answer your questions!
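
If you do keep the Postgres container, the hub only needs a SQLAlchemy-style URL pointing at it. A hedged sketch, assuming a compose service named `postgres` and credentials passed in through environment variables (the variable names here are illustrative, not necessarily what the old example used):

```python
# jupyterhub_config.py -- sketch of pointing the hub at a separate
# Postgres container. The service name "postgres" and the env var
# names are assumptions; adjust them to match your compose file.
import os

pg_user = os.environ.get("POSTGRES_USER", "jupyterhub")
pg_password = os.environ["POSTGRES_PASSWORD"]          # fail loudly if unset
pg_host = os.environ.get("POSTGRES_HOST", "postgres")  # compose service name
pg_db = os.environ.get("POSTGRES_DB", "jupyterhub")

c.JupyterHub.db_url = f"postgresql://{pg_user}:{pg_password}@{pg_host}:5432/{pg_db}"
```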


I simplified it because the original example was very complicated, and ultimately unmaintained. There’s no harm in using PostgreSQL, but it’s probably not necessary: this is a single-node JupyterHub system, so performance will be limited by the resources available to users rather than by the performance of the database.


Thank you both for the clarification. @manics, could I bother you a bit more and ask where I can learn about the information that goes into the database? Is there a code/schema/document I can study to get a closer understanding of the data going in and out of the DB?
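
In the meantime, one quick way to get a rough picture is to open the SQLite file the hub writes by default and list its tables. A small sketch, assuming the default `jupyterhub.sqlite` file in the hub's working directory:

```python
# inspect_db.py -- list the tables (and their CREATE statements) in the
# hub's SQLite database. Assumes the default file name and location.
import sqlite3

conn = sqlite3.connect("jupyterhub.sqlite")
rows = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
for name, sql in rows:
    print(name)
    print(sql, end="\n\n")
conn.close()
```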