Scaling - do I need PostgreSQL?

matthew.brett · October 6, 2020, 11:26am

Thanks for all the useful tips on optimizing for scaling.

However, I’m still running into trouble scaling to a few hundred users, with “Service unavailable” fairly often on startup (requiring a few browser refreshes), and a recent hang for ~250 users using RStudio.

I’m now thinking about what I need to try next for optimizing. How much difference does it make to offload the logs to PostgreSQL, rather than the default sqlite? In particular, how important is it to offload the logs to a separate node? Do y’all find the sqlite process uses a lot of memory / CPU?

Joseph_Wang · October 11, 2020, 6:13am

The logs shouldn’t be that heavyweight.

The first thing I’d check is to see what resource is being consumed. In particular, I would check if the machine has enough RAM.
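As a rough way to do that RAM check on a Linux host, a minimal sketch along these lines reads the kernel's memory counters. It assumes the Hub runs on Linux with `/proc/meminfo` available; the helper names `parse_meminfo`, `available_fraction`, and `read_meminfo` are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def available_fraction(meminfo):
    """Fraction of total RAM the kernel estimates is still available."""
    return meminfo["MemAvailable"] / meminfo["MemTotal"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live counters (Linux only; path is an assumption)."""
    with open(path) as f:
        return parse_meminfo(f.read())


# Example use on the Hub machine:
#   info = read_meminfo()
#   print(f"{available_fraction(info):.0%} of RAM available")
```

If the available fraction is already low while users are starting servers, memory is a more likely suspect than the database.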

Joseph_Wang · October 11, 2020, 6:16am

The thing about logging with PostgreSQL is that it’s not much less resource-intensive than sqlite. However, it allows you to offload the activity to another machine. If it turns out that you have a machine that’s already overloaded, it’s not likely to help much.
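For reference, pointing the Hub at a PostgreSQL server on another machine is done through the `c.JupyterHub.db_url` setting in `jupyterhub_config.py`. The sketch below only builds the URL; the helper name `postgres_url`, the host, the user, and the database name are all placeholders, not values from this thread.

```python
from urllib.parse import quote


def postgres_url(user, password, host, dbname, port=5432):
    """Build a SQLAlchemy-style PostgreSQL URL, escaping special characters
    in the credentials so they cannot break the URL syntax."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{dbname}"
    )


# In jupyterhub_config.py, where `c` is provided by JupyterHub
# (hypothetical host and credentials):
#   c.JupyterHub.db_url = postgres_url("hub", "secret", "db.example.org", "jupyterhub")
```

A PostgreSQL driver such as `psycopg2` also has to be installed in the Hub's environment for this URL to work.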

betatim · October 11, 2020, 2:07pm

What exactly do you mean by “logs”? How did you measure performance, and of what? Or, asked differently, what is it that is “too slow”?

We run all the JupyterHubs used as part of mybinder.org on sqlite. Depending on the day/load they handle up to 400 concurrent users.

In the last few months there have been a few issues and PRs with performance benchmarks and improvements. I think most (if not all) used postgresql and were in the context of having thousands of (active) users. I think Hub startup and the admin GUI were identified as things that were taking a lot of time.

matthew.brett · October 12, 2020, 6:49am

Thanks for these. So - does anyone have any data to suggest that Postgres works better for scaling, or on some other performance metric?

markperri · November 7, 2020, 10:11pm

I have seen posts about idle culling and gathering metrics too often causing slowdowns. Could it be one of those?
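If the culler turns out to be the culprit, one thing to try is running it less often. The sketch below assumes the separate `jupyterhub-idle-culler` package is installed and registered as a Hub service; the helper name `idle_culler_service` and the interval values are illustrative, not recommendations from this thread.

```python
import sys


def idle_culler_service(timeout=3600, cull_every=600):
    """Return a JupyterHub service definition for jupyterhub-idle-culler.

    timeout:    seconds of inactivity before a server is culled
    cull_every: seconds between culling passes; a larger value means
                fewer sweeps over the Hub API
    """
    return {
        "name": "idle-culler",
        "command": [
            sys.executable,
            "-m",
            "jupyterhub_idle_culler",
            f"--timeout={timeout}",
            f"--cull-every={cull_every}",
        ],
    }


# In jupyterhub_config.py, where `c` is provided by JupyterHub:
#   c.JupyterHub.services = [idle_culler_service(cull_every=900)]
# The service also needs permission to list and stop servers; how that is
# granted depends on the JupyterHub version, so check its documentation.
```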
