Is there any hook to restore Notebook packages?

I want to restore the notebook’s pip packages on notebook start.
One implementation is:

  1. Before shutting down the notebook, save the package list to a database.
  2. Before starting the notebook, install the packages from the package list.

I know that we can put scripts in IPython's profile directory, and they will be executed when the notebook starts. Is there a hook for the notebook shutdown event?

With most spawners the installed packages should not get lost. Can you share your configuration?

It sounds like the user is installing packages locally that aren’t in the notebook image. Then when their container is recreated those packages aren’t persisted.

By “notebook image”, I guess you are assuming a Docker container, as used by e.g. DockerSpawner? Sure, that is possible! Then maybe the libraries should be part of the image. But that is just guessing…

I found IPython.core.hooks.shutdown_hook. It can be used to save pkg_resources.working_set to some persistent storage, and the packages can then be restored from the IPython profile.
Maybe there are better solutions?
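For illustration, the save-then-restore idea could be sketched roughly like this. It is only a sketch under assumptions: it uses the standard library's `importlib.metadata` instead of `pkg_resources`, the function names are made up for this example, and `path` stands for any file on storage that survives a restart.

```python
import subprocess
import sys
from importlib import metadata
from pathlib import Path


def snapshot_packages():
    """Return sorted 'name==version' pins for every installed distribution."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            pins.add(f"{name.lower()}=={dist.version}")
    return sorted(pins)


def save_snapshot(path):
    """Write the pins, one per line, to a requirements-style file."""
    Path(path).write_text("".join(pin + "\n" for pin in snapshot_packages()))


def restore_snapshot(path):
    """Reinstall the pinned packages with the current interpreter's pip."""
    if Path(path).exists():  # nothing to restore on the very first start
        subprocess.run(
            [sys.executable, "-m", "pip", "install", "-r", str(path)],
            check=True,
        )
```

`save_snapshot` would be called from whatever shutdown hook is available, and `restore_snapshot` from a startup script. Reinstalling everything on each start can be slow, so this mostly makes sense when persistent storage for the packages themselves is not an option.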

@wondertx please share your configuration. Otherwise we can only guess your setup.

We have a customized spawner that works like KubeSpawner and a separate configurable-http-proxy. Besides that, the jupyterhub_config.py file has something like this:

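import atexit
import logging
import pkg_resources

logger = logging.getLogger(__name__)


def on_exit():
    # Log every installed distribution as a "name==version" pin.
    logger.info('On exit')
    installed_packages = pkg_resources.working_set
    installed_packages_list = sorted(["%s==%s" % (i.key, i.version) for i in installed_packages])
    logger.info(installed_packages_list)


atexit.register(on_exit)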

Well, you might also adjust the images you spawn for every user. If you have a custom spawner, it is even more difficult to comment on that.

I think the best way to do this is to give users a persistent volume and instruct them on how to install packages in the persistent directory. And possibly configure the image to help make this easier, if you find it’s too complicated/tedious from the start. For example, adding configuration to set the default pip install path, and make sure it’s on sys.path. pip install --user is usually enough with a persistent home directory, though.
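As a quick way to check that setup from inside a user's server, something like the following could be run in a notebook cell. It is a small sketch using only the standard `site` module, and the function name `user_site_status` is invented for this example.

```python
import site
import sys


def user_site_status():
    """Report where `pip install --user` installs and whether Python imports from there."""
    user_site = site.getusersitepackages()
    return {
        "user_base": site.getuserbase(),
        "user_site": user_site,
        # ENABLE_USER_SITE is False or None inside virtualenvs or when disabled.
        "enabled": bool(site.ENABLE_USER_SITE),
        "on_sys_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_site_status().items():
        print(f"{key}: {value}")
```

If the home directory itself is not persistent, setting the `PYTHONUSERBASE` environment variable to a directory on the persistent volume moves the user base there. Note that `on_sys_path` can be false simply because the user site directory does not exist yet, before the first `pip install --user`.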

If you do find that you still want to do the persistence yourself, since you seem to be using Kubernetes, lifecycle hooks are likely the right place to take the relevant actions: before/after the server starts and stops, rather than when kernels start and stop.
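With KubeSpawner, such hooks can be passed through its `lifecycle_hooks` option. The following is a rough sketch only: the helper `build_lifecycle_hooks` and the requirements path are hypothetical, it assumes `pip` and `/bin/sh` exist in the user image, and a custom spawner may not expose an equivalent setting.

```python
import shlex


def build_lifecycle_hooks(requirements_path):
    """Build a Kubernetes container lifecycle spec that saves and restores pip packages.

    `requirements_path` is a hypothetical file on the user's persistent volume.
    """
    path = shlex.quote(requirements_path)
    # `|| true` keeps a failed restore from failing the postStart handler.
    restore = f"[ -f {path} ] && pip install --user -r {path} || true"
    save = f"pip freeze --user > {path}"
    return {
        "postStart": {"exec": {"command": ["/bin/sh", "-c", restore]}},
        "preStop": {"exec": {"command": ["/bin/sh", "-c", save]}},
    }


hooks = build_lifecycle_hooks("/home/jovyan/.saved-requirements.txt")

# In jupyterhub_config.py with KubeSpawner this would be assigned as:
# c.KubeSpawner.lifecycle_hooks = hooks
```

The `preStop` handler has to finish within the pod's termination grace period, so it should stay cheap. Writing a package list is fine, while copying whole environments probably is not.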

Putting atexit handlers in jupyterhub_config.py will run when the hub exits, not when user servers or kernels exit.
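The reason is that an `atexit` handler belongs to the process that registered it. A tiny demonstration of that, where a child process stands in for "whichever process loads the file" (the names `CHILD` and `run_child` are made up for this sketch):

```python
import subprocess
import sys

# Code run in a separate interpreter: it registers a handler and then exits.
CHILD = """
import atexit, os
atexit.register(lambda: print(f"atexit ran in pid {os.getpid()}"))
print(f"child pid {os.getpid()}")
"""


def run_child():
    """Run CHILD in a new Python process and return its output lines."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Both lines returned by `run_child()` report the child's pid, not the caller's. In the same way, a handler registered in jupyterhub_config.py only ever sees the hub process and the packages installed in the hub's environment.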

Persistent storage is a great idea. In fact, I am already using HDFS as the storage backend and it works nicely.
By the way, I am not using Kubernetes directly, but through a private API that can spawn instances on request. Quite a bizarre environment :smile: