How to change the default db from SQLite to PostgreSQL/MySQL in Jupyter Notebook

I can see there are some config options in jupyter_notebook_config.py related to the default database, which is SQLite:

## The sqlite file in which to store notebook signatures. By default, this will
#  be in your Jupyter data directory. You can set it to ':memory:' to disable
#  sqlite writing to the filesystem.
#c.NotebookNotary.db_file = ''


## A callable returning the storage backend for notebook signatures. The default
#  uses an SQLite database.
#c.NotebookNotary.store_factory = traitlets.Undefined

Now I want to change this default db to a relational db like PostgreSQL/MySQL, as we are frequently getting "database is locked" errors. Currently this platform is used by multiple users for Python practice, and we spawn a separate notebook for each user.

How can we specify a MySQL/PostgreSQL database in the Jupyter Notebook configuration?

Hi @Srinidhi_G_S - welcome to the Jupyter community!

To introduce a different signature store, it looks like you’d need to provide an implementation of a SignatureStore class that can initialize and operate against a PG or MySQL DB. You would then configure c.NotebookNotary.store_factory to point at this class. I don’t know if such an implementation exists.
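A minimal sketch of that wiring, assuming a hypothetical PostgresSignatureStore class that implements the SignatureStore interface from nbformat.sign (a rough sketch of such a class follows after the next two paragraphs):

```python
# jupyter_notebook_config.py -- sketch only; PostgresSignatureStore and its
# module path are hypothetical, and the DSN is an example value.
from mysignatures import PostgresSignatureStore

def make_store():
    return PostgresSignatureStore("dbname=jupyter user=jupyter host=db.example.com")

c.NotebookNotary.store_factory = make_store
```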

It seems like you could derive much of the implementation from SQLiteSignatureStore since SQL is, well, fairly standard. The issue with doing that is that if there are any checks like isinstance(store_factory, SQLiteSignatureStore) (which would arguably be a bug), the code may act on your implementation in an incompatible way.

Instead, a helpful and simple refactoring would be to introduce a SQL base class that contains the CRUD operations, leaving the subclasses to implement essentially just the initialization logic. Then, your PG or MySQL implementation would derive from that SQL base class. Subclasses that require some variation on the CRUD basics would override those methods as necessary, or the base class could “ask” each subclass to provide its appropriate SQL command, etc.
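To make that concrete, here is a rough sketch, under the assumption that the SignatureStore interface consists of store_signature, check_signature, remove_signature, and close (as in nbformat.sign), using psycopg2 on the PG side; the table layout loosely mirrors the nbsignatures table the SQLite store uses. Treat it as a starting point, not a drop-in implementation:

```python
from nbformat.sign import SignatureStore


class SQLSignatureStore(SignatureStore):
    """Shared CRUD logic over a DB-API 2.0 connection; subclasses supply
    the connection and any dialect-specific initialization."""

    placeholder = "%s"  # DB-API paramstyle; a sqlite3 subclass would use "?"

    def __init__(self):
        self.db = self.connect()
        self.init_db()

    def connect(self):
        raise NotImplementedError("subclasses return a DB-API connection")

    def init_db(self):
        raise NotImplementedError("subclasses create the nbsignatures table")

    def store_signature(self, digest, algorithm):
        # Check first to avoid dialect-specific upsert syntax in the base class.
        if self.check_signature(digest, algorithm):
            return
        with self.db:  # commits the transaction on success
            cur = self.db.cursor()
            cur.execute(
                "INSERT INTO nbsignatures (algorithm, signature) "
                f"VALUES ({self.placeholder}, {self.placeholder})",
                (algorithm, digest),
            )

    def check_signature(self, digest, algorithm):
        cur = self.db.cursor()
        cur.execute(
            "SELECT 1 FROM nbsignatures WHERE algorithm = "
            f"{self.placeholder} AND signature = {self.placeholder}",
            (algorithm, digest),
        )
        return cur.fetchone() is not None

    def remove_signature(self, digest, algorithm):
        with self.db:
            cur = self.db.cursor()
            cur.execute(
                "DELETE FROM nbsignatures WHERE algorithm = "
                f"{self.placeholder} AND signature = {self.placeholder}",
                (algorithm, digest),
            )

    def close(self):
        self.db.close()


class PostgresSignatureStore(SQLSignatureStore):
    def __init__(self, dsn):
        self.dsn = dsn
        super().__init__()

    def connect(self):
        import psycopg2
        return psycopg2.connect(self.dsn)

    def init_db(self):
        with self.db:
            cur = self.db.cursor()
            cur.execute(
                "CREATE TABLE IF NOT EXISTS nbsignatures ("
                "  algorithm text, signature text,"
                "  PRIMARY KEY (algorithm, signature))"
            )
```

Note that this sketch does not implement the culling that the built-in stores perform once the table grows past a size threshold, so a real implementation would want to add that.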

If you get this implemented, you should check in with the nbformat maintainers to see if your implementation could be contributed back to the community.

Good luck,
Kevin.


Thanks kevin-bates,

We have fixed the same issue using the ':memory:' option, and we have not seen any issues so far.
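For anyone finding this later, this is the db_file option quoted at the top of the thread; in jupyter_notebook_config.py it looks like this:

```python
# Keep the notebook signature db entirely in memory instead of on disk
c.NotebookNotary.db_file = ':memory:'
```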

But I have a few queries:

  1. Does a new entry get created in nbsignatures each time a user saves a notebook, or is there only one entry per user per notebook?

  2. Is there any potential issue that may arise if we use ':memory:' instead of the db file?

Regards
Girish

Hi @Girish_Kumar_Gupta.

  1. Does a new entry get created in nbsignatures each time a user saves a notebook, or is there only one entry per user per notebook?

I am not familiar with exactly how signatures work with respect to notebooks; my previous response was formed by looking at the code. However, I can tell you that the Jupyter framework (with the exception of Hub and perhaps one or two other projects) is inherently single-user, and it has no notion of differentiating actions at the user level without additional modifications.

  2. Is there any potential issue that may arise if we use ':memory:' instead of the db file?

It appears that the SignatureStore implementations perform culling operations once the cache_size (default 65536) has been exceeded, so your memory consumption for notebook signatures should not exceed 12 MB (based on this comment).

The cache size for MemorySignatureStore is not configurable, while for SQLiteSignatureStore it is.
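For example, something like the following in jupyter_notebook_config.py should raise the cull threshold for the SQLite store (the trait name follows the cache_size option mentioned above; the value here is arbitrary, just double the default):

```python
# Example only: raise the SQLite signature store's cull threshold
c.SQLiteSignatureStore.cache_size = 131072
```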

Hi, have you figured out how to change the database in Jupyter Notebook? I know there is a way to change the database in Hub (hub.db.type in config.yaml), but I am not sure how to change it in the notebook. Any feedback will be much appreciated!

Actually I found a workaround for this issue. The issue is caused by the SQLite db not being compatible with NFS drives. One way is to replace the database with Postgres for the singleuser notebook, but I haven’t figured out how to do that (btw, you can point the Hub database to Postgres, which is suggested by the official docs, by setting hub.db.type and hub.db.url). The other way, which is the workaround I am using, is to relocate the nbsignatures.db file to your k8s cluster’s local disk. This can be done by modifying the configuration files inside the jhub image. Below are the steps for this.

Update the setting below in both /etc/jupyter/jupyter_notebook_config.py and /home/jovyan/.jupyter/jupyter_notebook_config.py in the Docker image:

c.NotebookNotary.data_dir = "/tmp/signature_dir"

Note: By default, in the deployment.yaml in the Helm package, only the files under the /home and /share directories are stored via PVC (which is NFS in my case). Any files outside this scope are stored on the local disk for the lifetime of the pod.

For this signature db file, given that it is relatively small and only needed for the duration of a single session, I think it is fine to store it on the local disk instead of in a Postgres database.

Hopefully this will be helpful for anyone who has the same issue as me.

Reference:

https://jupyter-notebook.readthedocs.io/en/stable/security.html#notebook-security
