How to change the default db from SQLite to PostgreSQL/MySQL in Jupyter Notebook

I can see there are some config options in jupyter_notebook_config.py related to the default database, which is SQLite:

## The sqlite file in which to store notebook signatures. By default, this will
#  be in your Jupyter data directory. You can set it to ':memory:' to disable
#  sqlite writing to the filesystem.
#c.NotebookNotary.db_file = ''


## A callable returning the storage backend for notebook signatures. The default
#  uses an SQLite database.
#c.NotebookNotary.store_factory = traitlets.Undefined

Now I want to change this default db to a relational db like PostgreSQL/MySQL, as we are frequently getting "database is locked" errors. Currently this platform is used by multiple users for Python practice, and we spawn a separate notebook for each user.

How can we specify a MySQL/PostgreSQL database in the Jupyter Notebook configuration?

Hi @Srinidhi_G_S - welcome to the Jupyter community!

To introduce a different signature store, it looks like you’d need to provide an implementation of a SignatureStore class that can initialize and operate against a PG or MySQL DB. You would then configure c.NotebookNotary.store_factory to point at this class. I don’t know if such an implementation exists.
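A minimal sketch of that wiring, assuming a hypothetical PostgresSignatureStore class that implements the SignatureStore interface from nbformat.sign (a rough sketch of such a class follows after the next two paragraphs):

```python
# jupyter_notebook_config.py -- sketch only; PostgresSignatureStore and its
# module path are hypothetical, and the DSN is an example value.
from mysignatures import PostgresSignatureStore

def make_store():
    return PostgresSignatureStore("dbname=jupyter user=jupyter host=db.example.com")

c.NotebookNotary.store_factory = make_store
```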

It seems like you could derive much of the implementation from SQLiteSignatureStore since SQL is, well, fairly standard. The issue with doing that is that if there are any checks like isinstance(store_factory, SQLiteSignatureStore) (which would arguably be a bug), the code may act on your implementation in an incompatible way.

Instead, a helpful and simple refactoring would be to introduce a SQL base class that contains the CRUD operations, leaving the subclasses to implement essentially just the initialization logic. Then, your PG or MySQL implementation would derive from that SQL base class. Subclasses that require some variation on the CRUD basics would override those methods as necessary, or the base class could “ask” each subclass to provide its appropriate SQL command, etc.
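To make that concrete, here is a rough sketch, under the assumption that the SignatureStore interface consists of store_signature, check_signature, remove_signature, and close (as in nbformat.sign), using psycopg2 on the PG side; the table layout loosely mirrors the nbsignatures table the SQLite store uses. Treat it as a starting point, not a drop-in implementation:

```python
from nbformat.sign import SignatureStore


class SQLSignatureStore(SignatureStore):
    """Shared CRUD logic over a DB-API 2.0 connection; subclasses supply
    the connection and any dialect-specific initialization."""

    placeholder = "%s"  # DB-API paramstyle; a sqlite3 subclass would use "?"

    def __init__(self):
        self.db = self.connect()
        self.init_db()

    def connect(self):
        raise NotImplementedError("subclasses return a DB-API connection")

    def init_db(self):
        raise NotImplementedError("subclasses create the nbsignatures table")

    def store_signature(self, digest, algorithm):
        # Check first to avoid dialect-specific upsert syntax in the base class.
        if self.check_signature(digest, algorithm):
            return
        with self.db:  # commits the transaction on success
            cur = self.db.cursor()
            cur.execute(
                "INSERT INTO nbsignatures (algorithm, signature) "
                f"VALUES ({self.placeholder}, {self.placeholder})",
                (algorithm, digest),
            )

    def check_signature(self, digest, algorithm):
        cur = self.db.cursor()
        cur.execute(
            "SELECT 1 FROM nbsignatures WHERE algorithm = "
            f"{self.placeholder} AND signature = {self.placeholder}",
            (algorithm, digest),
        )
        return cur.fetchone() is not None

    def remove_signature(self, digest, algorithm):
        with self.db:
            cur = self.db.cursor()
            cur.execute(
                "DELETE FROM nbsignatures WHERE algorithm = "
                f"{self.placeholder} AND signature = {self.placeholder}",
                (algorithm, digest),
            )

    def close(self):
        self.db.close()


class PostgresSignatureStore(SQLSignatureStore):
    def __init__(self, dsn):
        self.dsn = dsn
        super().__init__()

    def connect(self):
        import psycopg2
        return psycopg2.connect(self.dsn)

    def init_db(self):
        with self.db:
            cur = self.db.cursor()
            cur.execute(
                "CREATE TABLE IF NOT EXISTS nbsignatures ("
                "  algorithm text, signature text,"
                "  PRIMARY KEY (algorithm, signature))"
            )
```

Note that this sketch does not implement the culling that the built-in stores perform once the table grows past a size threshold, so a real implementation would want to add that.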

If you get this implemented, you should check in with the nbformat maintainers to see if your implementation could be contributed back to the community.

Good luck,
Kevin.


Thanks kevin-bates,

We have fixed the same issue using the ':memory:' option, and we have not seen any issues so far.
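For anyone finding this later, this is the db_file option quoted at the top of the thread; in jupyter_notebook_config.py it looks like this:

```python
# Keep the notebook signature db entirely in memory instead of on disk
c.NotebookNotary.db_file = ':memory:'
```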

But I have a few queries:

  1. Does a new entry get created in nbsignatures each time a user saves a notebook, or is there only one entry per user per notebook?

  2. Is there any potential issue that may arise if we use ':memory:' instead of the db file?

Regards
Girish

Hi @Girish_Kumar_Gupta.

  1. Does a new entry get created in nbsignatures each time a user saves a notebook, or is there only one entry per user per notebook?

I am not familiar with exactly how signatures work with respect to notebooks; my previous response was formed by looking at the code. However, I can tell you that the Jupyter framework (with the exception of Hub and perhaps one or two other projects) is inherently single-user, and it has no notion of differentiating actions at the user level without additional modifications.

  2. Is there any potential issue that may arise if we use ':memory:' instead of the db file?

It appears that the SignatureStore implementations perform culling operations once the cache_size (default 65536) has been exceeded, so your memory consumption for notebook signatures should not exceed 12 MB (based on this comment).

The cache size for MemorySignatureStore is not configurable, while for SQLiteSignatureStore it is.
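For example, something like the following in jupyter_notebook_config.py should raise the cull threshold for the SQLite store (the trait name follows the cache_size option mentioned above; the value here is arbitrary, just double the default):

```python
# Example only: raise the SQLite signature store's cull threshold
c.SQLiteSignatureStore.cache_size = 131072
```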

Hi, have you figured out how to change the database in Jupyter Notebook? I know there is a way to change the database in Hub (hub.db.type in config.yaml), but I am not sure how to change it in the notebook. Any feedback will be much appreciated!

Actually I found a workaround for this issue. The issue is caused by the SQLite db not being compatible with NFS drives. One way is to replace the database with Postgres for the singleuser notebook, but I haven’t figured out how to do that (btw, you can point the Hub database to Postgres, which is suggested by the official docs, by setting hub.db.type and hub.db.url). The other way, which is the workaround I am using, is to relocate the nbsignatures.db file to your k8s cluster’s local disk. This can be done by modifying the configuration files inside the jhub image. Below are the steps for this.

Update the setting below in both /etc/jupyter/jupyter_notebook_config.py and /home/jovyan/.jupyter/jupyter_notebook_config.py in the Docker image:

c.NotebookNotary.data_dir = "/tmp/signature_dir"

Note: By default, in the deployment.yaml in the Helm package, only the files under the /home and /share directories are stored via PVC (which is NFS in my case). Any files outside this scope are stored on the local disk for the lifetime of the pod.

For this signature db file, given that it is relatively small and only needed for the duration of a single session, I think it is fine to store it on the local disk instead of in a Postgres database.

Hopefully this will be helpful for anyone who has the same issue as me.

Reference:

https://jupyter-notebook.readthedocs.io/en/stable/security.html#notebook-security
