JupyterHub docker-deploy DB

Back in the days when @minrk was contributing to/maintaining the jupyterhub-deploy-docker repo, the docker-compose setup used a Postgres DB container:

Then @manics simplified it, and the DB container in the compose file was gone:

I cloned @minrk's version, with the DB container, and started using it at some point in 2021. It is still doing its humble and important work in my little lab. But I am now reviewing the whole environment and noticed the difference above, and I would like to understand what happened and why, so I can decide what I should change in my setup.

So, I have a few questions (the same question from different perspectives, really):

  1. What was the Postgres DB used for back then that is no longer needed (in the example docker-compose)?
  2. Was the DB removed because current JupyterHub no longer needs it, or just because you decided the docker-compose should provide a simpler setup?
  3. Should I keep using the compose file with the Postgres container, or switch to the new/current one?

Thanks!

A more general answer:

JupyterHub needs a DB to keep the state of the Hub, but it supports different DB backends: Postgres, MySQL, and SQLite, to name a few. The change you are referring to is when the docker-compose files were switched from Postgres as the DB backend to a “simpler” SQLite backend. I will let @manics and @minrk answer why it was changed, but I assume it was to make the deployment simpler.
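
For reference, the backend is selected with a single setting in `jupyterhub_config.py`. A minimal sketch of the SQLite case (the URL below is JupyterHub's default, just written out explicitly):

```python
# jupyterhub_config.py -- sketch of the DB backend setting.
# SQLite is JupyterHub's default backend; this line only makes the
# default URL explicit (a file named jupyterhub.sqlite next to the hub).
c.JupyterHub.db_url = "sqlite:///jupyterhub.sqlite"
```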

  1. Should I keep using the compose file with the Postgres container, or switch to the new/current one?

This depends on your use case. If I am right, SQLite does not support concurrent writes, so if you have many concurrent users it might be a bit laggy. On the other hand, Postgres is a more production-ready DB with more advanced features. There is a really nice write-up on the JupyterHub database in the docs. That should answer your questions!
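
If you do keep the Postgres container, the hub only needs a SQLAlchemy-style URL pointing at it. A hedged sketch, assuming a compose service named `postgres` and credentials passed in through environment variables (the variable names here are illustrative, not necessarily what the old example used):

```python
# jupyterhub_config.py -- sketch of pointing the hub at a separate
# Postgres container. The service name "postgres" and the env var
# names are assumptions; adjust them to match your compose file.
import os

pg_user = os.environ.get("POSTGRES_USER", "jupyterhub")
pg_password = os.environ["POSTGRES_PASSWORD"]          # fail loudly if unset
pg_host = os.environ.get("POSTGRES_HOST", "postgres")  # compose service name
pg_db = os.environ.get("POSTGRES_DB", "jupyterhub")

c.JupyterHub.db_url = f"postgresql://{pg_user}:{pg_password}@{pg_host}:5432/{pg_db}"
```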


I simplified it because the original example was very complicated, and ultimately unmaintained. There’s no harm in using PostgreSQL, but it’s probably not necessary: this is a single-node JupyterHub system, so performance will be limited by the resources available to users rather than by the performance of the database.


Thank you both for the clarification. @manics, could I bother you a bit more and ask where I can learn about the information that goes into the database? Is there a code/schema/document I can study to get a closer understanding of the data going in and out of the DB?
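
In the meantime, one quick way to get a rough picture is to open the SQLite file the hub writes by default and list its tables. A small sketch, assuming the default `jupyterhub.sqlite` file in the hub's working directory:

```python
# inspect_db.py -- list the tables (and their CREATE statements) in the
# hub's SQLite database. Assumes the default file name and location.
import sqlite3

conn = sqlite3.connect("jupyterhub.sqlite")
rows = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
for name, sql in rows:
    print(name)
    print(sql, end="\n\n")
conn.close()
```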