Persistent computation on JH server after client disconnects

Hi! I followed the great Z2JH guide and now have a fully functional bare-metal install via MicroK8s.

My question is: if a client connects to the JupyterHub instance from another machine and starts a long-running notebook, can they close the browser window (or even shut down their machine) and come back later to find the final results? Or does the computation stop once the browser is closed?

How can one keep the JupyterLab notebook running and come back to it at a later date?

The notebook should keep running in the background, but the outputs produced while you're disconnected will be lost. There's an open issue here:

The first part of the work, JupyterLab RTC for collaborative editing, is available, but it's not yet production ready. See this topic for the current state of things:

Additional work is still needed to allow a client to reconnect and receive the outputs it missed while disconnected.

Thank you for the detailed reply.

I guess this means that if the notebook writes the results of a long computation to a file, those results can still be recovered later on?

Yes, that's a common pattern. If you're only interested in figures, you can add plt.savefig calls to your plotting cells to write the figures to disk and look at them later.
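As a minimal sketch of that pattern (the data and filename here are made up for illustration), each plotting cell just saves its figure to disk before, or instead of, displaying it:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; safe for headless/unattended runs
import matplotlib.pyplot as plt

# Stand-in for the results of a long computation
xs = range(100)
ys = [x ** 2 for x in xs]

plt.plot(xs, ys)
plt.title("Results of the long computation")
plt.savefig("results.png", dpi=150)  # persisted to disk even if the browser is gone
plt.close()
```

When you reconnect later, the PNG files are sitting in the notebook's working directory regardless of what happened to the browser session.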

If you know you're going to run unattended, you can use a tool like papermill, which executes the whole notebook non-interactively and captures the outputs in a result notebook.

I used a caching pattern in my thesis in 2012 to run overnight simulations and check on them in the morning. My pattern was to write expensive cells that looked like:

import os

# remove or rename this file to force a recompute
cache_file = "..."
if os.path.exists(cache_file):
    # this branch is taken after the first successful run
    results = load_cache_file(cache_file)
else:
    # only runs once
    results = compute()
    save_cache_file(cache_file, results)
display_something(results)

As a result, 'Restart and Run All' would complete quickly, loading and displaying all the results without recomputing anything expensive. This assumes you can actually serialize your results to files, though.

You could do this more conveniently by writing a %%cache cell magic, or, if you're lucky and most of your expensive computations are pure functions of hashable inputs, by using a modified functools.cache that caches to disk instead of memory.
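A disk-backed version of that caching decorator might look like the sketch below. This isn't a published library, just one possible implementation: it keys the cache file on a hash of the function name and pickled arguments, which only works when the arguments pickle deterministically and the results are serializable.

```python
import functools
import hashlib
import pickle
from pathlib import Path

def disk_cache(cache_dir="cache"):
    """Like functools.cache, but persists results to disk across sessions."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            Path(cache_dir).mkdir(exist_ok=True)
            # Cache key: hash of the function name plus its pickled arguments
            key = hashlib.sha256(
                pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            ).hexdigest()
            path = Path(cache_dir) / f"{key}.pkl"
            if path.exists():
                # Hit: load the previously computed result from disk
                return pickle.loads(path.read_bytes())
            # Miss: compute once, then persist for future runs
            result = func(*args, **kwargs)
            path.write_bytes(pickle.dumps(result))
            return result
        return wrapper
    return decorator

@disk_cache()
def expensive(n):
    return sum(i * i for i in range(n))
```

After the first call, 'Restart and Run All' hits the on-disk cache instead of recomputing, which is the same effect as the manual pattern above with less boilerplate per cell.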

The challenge for %%cache is that it's hard to compute the cache key in general, and hard to figure out what should be recomputed and what can safely be serialized and reloaded from disk, so explicit manual caching always worked best for me.