This Session's transaction has been rolled back -- JupyterHub 500 Errors

Hi,

We have JupyterHub 1.2.1 hosted on EKS, with AWS Aurora Postgres Serverless as our hub DB. Recently we have been seeing a lot of errors like this:

[SQL: SELECT oauth_clients.id AS oauth_clients_id, oauth_clients.identifier AS oauth_clients_identifier, oauth_clients.description AS oauth_clients_description, oauth_clients.secret AS oauth_clients_secret, oauth_clients.redirect_uri AS oauth_clients_redirect_uri 
FROM oauth_clients 
WHERE oauth_clients.identifier = %(param_1)s]
[parameters: {'param_1': 'jupyterhub-user-pa190698159'}]
(Background on this error at: http://sqlalche.me/e/13/e3q8)

[E 2021-03-18 16:13:34.232 JupyterHub ioloop:761] Exception in callback functools.partial(<bound method IOLoop._discard_future_result of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x7fb029e8fac0>>, <Task finished name='Task-160124' coro=<JupyterHub.update_last_activity() done, defined at /usr/local/lib/python3.8/dist-packages/jupyterhub/app.py:2519> exception=InvalidRequestError("This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (psycopg2.OperationalError) SSL connection has been closed unexpectedly\n\n[SQL: SELECT oauth_clients.id AS oauth_clients_id, oauth_clients.identifier AS oauth_clients_identifier, oauth_clients.description AS oauth_clients_description, oauth_clients.secret AS oauth_clients_secret, oauth_clients.redirect_uri AS oauth_clients_redirect_uri \nFROM oauth_clients \nWHERE oauth_clients.identifier = %(param_1)s]\n[parameters: {'param_1': 'jupyterhub-user-pa190698159'}]\n(Background on this error at: http://sqlalche.me/e/13/e3q8)")>)
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/usr/local/lib/python3.8/dist-packages/jupyterhub/app.py", line 2536, in update_last_activity
    user = orm.User.find(self.db, route_data['user'])
  File "/usr/local/lib/python3.8/dist-packages/jupyterhub/orm.py", line 222, in find
    return db.query(cls).filter(cls.name == name).first()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/query.py", line 3429, in first
    ret = list(self[0:1])
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/query.py", line 3203, in __getitem__
    return list(res)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/query.py", line 3535, in __iter__
    return self._execute_and_instances(context)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/query.py", line 3556, in _execute_and_instances
    conn = self._get_bind_args(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/query.py", line 3571, in _get_bind_args
    return fn(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/query.py", line 3550, in _connection_from_session
    conn = self.session.connection(**kw)
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 1138, in connection
    return self._connection_for_bind(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 1146, in _connection_for_bind
    return self.transaction._connection_for_bind(
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 409, in _connection_for_bind
    self._assert_active()
  File "/usr/local/lib/python3.8/dist-packages/sqlalchemy/orm/session.py", line 289, in _assert_active
    raise sa_exc.InvalidRequestError(
sqlalchemy.exc.InvalidRequestError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (psycopg2.OperationalError) SSL connection has been closed unexpectedly

When this happens and we go to the JupyterHub login screen, we get a 500 Internal Server Error. If I refresh the page, I am redirected back to the login page, and once I log in the problem goes away. Restarting the JupyterHub pod also clears it. We recently enabled the liveness probe so that a manual restart will hopefully not be required, but we are not sure why the error keeps coming back (it has occurred every day for the last week).

I saw a related thread about this issue (https://github.com/jupyterhub/jupyterhub/issues/1626), and I was wondering whether we need to explicitly wrap specific transactions in try/except and roll them back. Is there any guidance on why this might be happening and how to fix this recurring issue?
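
To make the try/except question concrete, this is roughly the pattern I have in mind. It is only a minimal sketch: `rollback_on_error` is a hypothetical helper of my own, not something that exists in JupyterHub, and `StubSession` is a stand-in so the snippet runs without a database. The only real API it relies on is `Session.rollback()`, which the error message itself tells us to call.

```python
from contextlib import contextmanager


@contextmanager
def rollback_on_error(session):
    """Hypothetical helper: roll the session back if the wrapped block raises.

    `session` is assumed to be a SQLAlchemy Session; anything exposing a
    `rollback()` method works. The original exception is re-raised.
    """
    try:
        yield session
    except Exception:
        # Reset the failed transaction so the session can begin a new one,
        # as the InvalidRequestError message asks for.
        session.rollback()
        raise


class StubSession:
    """Stand-in for a real Session, used only to exercise the helper."""

    def __init__(self):
        self.rollback_calls = 0

    def rollback(self):
        self.rollback_calls += 1


def demo():
    """Simulate a dropped connection and report how many rollbacks ran."""
    session = StubSession()
    try:
        with rollback_on_error(session):
            raise ConnectionError("SSL connection has been closed unexpectedly")
    except ConnectionError:
        pass  # the caller still sees the original error
    return session.rollback_calls
```

With a real hub, `session` would presumably be the hub's database session, but I do not know whether wrapping individual queries like this is the right place to handle it, or whether the dropped connection should be dealt with at the connection pool level instead.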

Thanks!