I cloned and started using @minrk's version, with the DB container, at some point in 2021. It is still doing its humble and important work in my little lab. But I am now reviewing the whole environment and noticed the difference above. I would like to understand what happened and why, so I know what I should change in my setup.
So, I have a few questions (the same question from different perspectives, really):
What was the Postgres DB used for back then that is no longer needed (in the example docker-compose)?
Was the “DB” removed because current JupyterHub doesn’t need it anymore, or just because you decided the docker-compose should provide a simpler setup?
Should I keep using the compose file with the Postgres container, or switch to the new/current one?
JupyterHub needs a DB to keep the state of the hub, and it supports different DB backends such as Postgres, MySQL, and SQLite, to name a few. The change you are referring to is when the docker compose files were switched from Postgres as the DB backend to a “simpler” SQLite backend. I will let @manics and @minrk answer why it was changed, but I assume it was to make the deployment simpler.
Should I keep using the compose file with the Postgres container, or switch to the new/current one?
This depends on your use case. If I am right, SQLite does not support concurrent writes, so with many concurrent users it might be a bit laggy. Postgres, on the other hand, is a more production-ready DB with more advanced features. There is a really nice writeup on the JupyterHub DB in the docs. That should answer your questions!
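To make the choice concrete, here is a minimal sketch of how the backend could be selected in jupyterhub_config.py. The helper name build_db_url and the POSTGRES_* environment variable names are assumptions for illustration, not something taken from the example repository; c.JupyterHub.db_url is the actual JupyterHub setting, and a Postgres URL also needs a driver such as psycopg2 installed in the hub image.

```python
import os
from urllib.parse import quote_plus


def build_db_url(env=os.environ):
    """Return a SQLAlchemy-style URL for the hub database.

    Hypothetical helper: uses Postgres when POSTGRES_HOST is set,
    otherwise falls back to a local SQLite file.
    """
    host = env.get("POSTGRES_HOST")
    if not host:
        # Single-file SQLite database next to the hub process
        return "sqlite:///jupyterhub.sqlite"
    user = env.get("POSTGRES_USER", "jupyterhub")
    # Escape characters such as '@' or ':' that would break the URL
    password = quote_plus(env.get("POSTGRES_PASSWORD", ""))
    db = env.get("POSTGRES_DB", "jupyterhub")
    return f"postgresql://{user}:{password}@{host}:5432/{db}"


# In jupyterhub_config.py, `c` is provided by JupyterHub when it loads the file:
# c.JupyterHub.db_url = build_db_url()
```

With a setup like this, the same config file works for both compose variants: leaving POSTGRES_HOST unset gives the SQLite setup, and setting it to the DB service name points the hub at the Postgres container.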
I simplified it because the original example was very complicated, and ultimately unmaintained. There’s no harm using PostgreSQL but it’s probably not necessary since this is a single-node JupyterHub system, so performance will be limited by the available resources for users rather than the performance of the database.
Thank you both for the clarification. @manics, could I bother you a bit more and ask where I can learn about the information that goes into the database? Is there code, a schema, or a document I can study to get a close understanding of the data going in and out of the DB?