JupyterHub uses one single session instance throughout the application's life cycle. That makes sense for a synchronous application, but JupyterHub is built on Tornado, which is asynchronous.
From what I know of Tornado, it is better to create a session when a request comes in and close it when that request is finished. That way each request keeps its own session instance, and requests never get mixed up with each other.
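A per-request pattern might look roughly like this, a sketch using plain Tornado and SQLAlchemy (the handler name and the in-memory database URL are illustrative, not from JupyterHub):

```python
import tornado.web
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# illustrative in-memory database; a real hub would use its own db_url
engine = create_engine('sqlite://')
Session = sessionmaker(bind=engine)

class PerRequestHandler(tornado.web.RequestHandler):
    def prepare(self):
        # open a fresh session when the request comes in
        self.db = Session()

    def on_finish(self):
        # close it when the request is finished
        self.db.close()
```

Every request then works against its own session object, so one handler's pending changes are invisible to the others.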
Here is a tiny script that reproduces these thoughts:
import asyncio

import tornado.ioloop
import tornado.web
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from jupyterhub import orm

engine = create_engine('postgresql+psycopg2://xxxx:xxxx@localhost:5432/hub_test')
Session = sessionmaker(bind=engine)

class MainHandler(tornado.web.RequestHandler):
    async def get(self):
        # self.settings['db'] is a single session shared by all
        # requests, as in JupyterHub
        delay = int(self.get_argument('sleep', 0))
        name = self.get_argument('name')
        self.settings['db'].add(orm.User(name=name))
        # executing some heavy task
        await asyncio.sleep(delay)
        # record users to be added in this request
        new_instances = self.settings['db'].new
        self.write(str([u.name for u in new_instances]))
        self.settings['db'].commit()

def make_app():
    return tornado.web.Application([(r"/", MainHandler)], db=Session())

if __name__ == "__main__":
    app = make_app()
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()
After starting it up, execute the following commands in sequence:
curl "http://localhost:8888/?name=JoJo1&sleep=10" # outputs [] after 10 seconds
# wait about 2 seconds, then:
curl "http://localhost:8888/?name=JoJo2&sleep=1" # outputs ["JoJo1", "JoJo2"] after 1 second
This is where the mess occurs: JoJo2 ends up being added by the second request, and the first request adds nothing at all.
I'm not sure whether this is a problem for JupyterHub. Can anybody explain it? Thanks.
Does anybody know anything about this question?
Great question! There's been work related to this in the past. These PRs, and the issues linked from them, might be interesting to you:
We've ended up in a somewhat stable state at the moment, but it's always useful to keep digging. Hope this is useful!
I'm curious whether anyone has ever tried using tornado-sqlalchemy with JupyterHub. I've been reading through this old issue because we're experiencing scaling problems with the hub as well, and they seem to be rooted in the database ORM layer and the fact that there is a single session for all requests. I thought maybe the
max_connections setting for our PostgreSQL database was too low (115), but when looking at actual connections to the hub DB in our production cluster there are only 5, because that's the default in SQLAlchemy's QueuePool. Or am I misunderstanding, and people with larger hubs commonly pass a larger
pool_size through the db_kwargs config? For an idea of target numbers, we're trying to support up to 2000 active users/notebooks (pods) on our hub at the same time. So far we've reached up to 800 after changing c.JupyterHub.activity_resolution, and I'm doing stress testing to see how far beyond that we can get before the hub starts to crash.
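For reference, a larger pool can be requested by passing SQLAlchemy create_engine keyword arguments through JupyterHub's db_kwargs config. A sketch (the numbers are illustrative, not tuned recommendations):

```python
# jupyterhub_config.py -- illustrative values, not recommendations
c.JupyterHub.db_url = 'postgresql+psycopg2://xxxx:xxxx@localhost:5432/hub_test'
# db_kwargs are passed through to sqlalchemy.create_engine()
c.JupyterHub.db_kwargs = {
    'pool_size': 20,     # SQLAlchemy QueuePool default is 5
    'max_overflow': 40,  # default is 10
}
```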
This single session is used as an optimization (local caches mean fewer repeated queries for already-fetched state), but it is also one of the biggest impediments to the multiple instances needed for HA deployments. To use per-handler Sessions, we would need to eliminate long-lived database-referencing objects (mainly User and Spawner). That's a big change! It would also make certain operations noticeably more expensive, since it would mean re-fetching a lot of data from the database that we currently tend to have in memory. That re-fetching would become a requirement anyway if we ever had another process writing to the database, invalidating our cache.
Since our use of async/await means "cooperative" multitasking, the main thing we really have to do to ensure consistent database state is to never do this:
# executing some heavy task
that is, never have an
await between preparing a transaction and committing it. As long as we always commit any changes before we await and let another handler start working with the database, we stay consistent.
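The two patterns can be sketched like this, with asyncio.sleep standing in for the heavy task and db being any shared session-like object (both function names are illustrative):

```python
import asyncio

# Unsafe: the await yields to the event loop mid-transaction, so another
# handler can see (and commit) our pending changes in the shared session.
async def unsafe_add(db, user, delay):
    db.add(user)
    await asyncio.sleep(delay)  # stand-in for "some heavy task"
    db.commit()

# Safe: commit before awaiting, so the shared session never holds
# uncommitted changes while other handlers run.
async def safe_add(db, user, delay):
    db.add(user)
    db.commit()
    await asyncio.sleep(delay)
```

The heavy work itself is unchanged; only the relative order of the commit and the await differs.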
Just to follow up on this: I did some digging into the max DB connections used in our production hub database, and in the last two weeks it hasn't gone over 7. The default connection pool size in SQLAlchemy is 5, with a default max_overflow of 10, so we have at most 15 connections to hand out from the pool before connection requests start waiting. Since we're nowhere near that 15-connection limit, increasing the
pool_size likely wouldn't help us.
Our last big event was at the beginning of May, and we have another one coming up next week, so I guess we'll see whether those numbers change. So far, though, it seems the defaults are sufficient for us.