Thanks for all the useful tips on optimizing for scaling.
However, I’m still running into trouble scaling to a few hundred users, with “Service unavailable” fairly often on startup (requiring a few browser refreshes), and a recent hang for ~250 users using RStudio.
I’m now thinking about what I need to try next for optimizing. How much difference does it make to offload the logs to PostgreSQL, rather than the default sqlite? In particular, how important is it to offload the logs to a separate node? Do y’all find the sqlite process uses a lot of memory / CPU?
The logs shouldn’t be that heavyweight.
The first thing I’d check is which resource is actually being consumed. In particular, I would check whether the machine has enough RAM.
The thing about logging with PostgreSQL is that it’s not that much less resource intensive than sqlite. However, it allows you to offload the activity to another machine. If it turns out that you have a machine that’s already overloaded, it’s not likely to help much.
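For the RAM check, here is a minimal sketch, assuming the Hub runs on a Linux host where `/proc/meminfo` is available. The function names are made up for illustration; tools like `free` or `top` give the same information interactively.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (in kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value"
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def memory_used_fraction(info):
    """Fraction of RAM that is not available for new allocations."""
    total = info["MemTotal"]
    available = info["MemAvailable"]
    return (total - available) / total


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the kernel's memory report (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling `memory_used_fraction(read_meminfo())` on the Hub machine while users are logging in should show whether memory is the resource running out.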
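If “logs” here means the Hub’s own database (sqlite by default), moving it to PostgreSQL on another machine is a one-line change to `c.JupyterHub.db_url` in `jupyterhub_config.py`. A rough sketch follows; the helper name, host, user, and database name are placeholders, and it assumes a PostgreSQL driver such as `psycopg2` is installed alongside the Hub.

```python
from urllib.parse import quote


def postgres_db_url(user, password, host, database, port=5432):
    """Build a SQLAlchemy-style PostgreSQL URL, escaping special characters."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# In jupyterhub_config.py (values below are hypothetical):
# c.JupyterHub.db_url = postgres_db_url(
#     "hub", "secret", "db.example.org", "jupyterhub"
# )
```

Percent-encoding the user and password keeps characters like `@` or `/` in a password from breaking the URL.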
What exactly do you mean by “logs”? How did you measure the performance, and of what? Or, asked differently, what is it that is “too slow”?
We run all the JupyterHubs used as part of mybinder.org on sqlite. Depending on the day/load they handle up to 400 concurrent users.
In the last few months there have been a few issues and PRs with performance benchmarks and improvements. I think most (if not all) used PostgreSQL and were in the context of having thousands of (active) users. I think Hub startup and the admin GUI were identified as things that were taking a lot of time.
Thanks for these. So - does anyone have any data to suggest that Postgres works better for scaling, or on some other performance metric?
I have seen posts about idle culling and metrics gathering that run too often causing slowdowns. Could it be one of those?
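If the culler is the suspect, one thing to try is running it less often. The sketch below assumes the `jupyterhub-idle-culler` package is the culler in use; the helper name and the default numbers are illustrative only, not recommendations.

```python
import sys


def idle_culler_service(timeout_s=3600, cull_every_s=600):
    """Build a JupyterHub service entry that runs the idle culler.

    timeout_s: seconds of inactivity before a server is culled.
    cull_every_s: seconds between culling passes; a larger value means
    the culler queries the Hub API less frequently.
    """
    return {
        "name": "idle-culler",
        "command": [
            sys.executable,
            "-m",
            "jupyterhub_idle_culler",
            f"--timeout={timeout_s}",
            f"--cull-every={cull_every_s}",
        ],
    }


# In jupyterhub_config.py:
# c.JupyterHub.services = [idle_culler_service(cull_every_s=900)]
# Recent JupyterHub versions also need a role granting this service the
# scopes it requires; see the idle culler's README for the exact list.
```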