Opinion on Pipeline Solutions for Long-Running Jupyter Notebooks and Python Scripts with Z2JH

Dear Community,

In our research lab, we have been using Z2JH (on k8s) and have found it to be a valuable tool for our work. I want to express my gratitude for the effort put into creating such a useful tool.

Recently, I have been exploring ways to execute Jupyter Notebooks or Python scripts uninterrupted for days on the same infrastructure as Z2JH. I have found two viable solutions: SLURM on top of Kubernetes and Kubeflow with the Kale plugin. I have included links to relevant resources for each solution.

  • SLURM on top of Kubernetes [link]
  • Kubeflow with the Kale plugin [demo video]

My goal is to find a solution that requires minimal effort to shift from Z2JH to the new pipeline system. However, I am unsure if these solutions are compatible with Z2JH. I would greatly appreciate any thoughts, opinions, or experiences that the community may have regarding these solutions.
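For concreteness, the kind of low-effort hand-off I have in mind is submitting the long-running work as a plain Kubernetes Job on the same cluster, straight from the notebook pod. This is only a sketch of the idea, assuming the `kubernetes` Python client is installed, the single-user pod's service account is allowed to create Jobs (which is not the case with default Z2JH RBAC), and hypothetical image, PVC, and namespace names:

```python
from kubernetes import client, config

# Runs inside the Z2JH single-user pod; assumes the pod's service account may
# create Jobs (NOT granted by default Z2JH RBAC). Image, PVC, and namespace
# names below are hypothetical placeholders.
config.load_incluster_config()

job = client.V1Job(
    metadata=client.V1ObjectMeta(name="grid-search-1"),
    spec=client.V1JobSpec(
        backoff_limit=0,
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[client.V1Container(
                    name="runner",
                    image="my-registry/notebook-runner:latest",  # hypothetical image with papermill installed
                    command=["papermill",
                             "/data/grid_search.ipynb",       # input notebook on the shared volume
                             "/data/grid_search_out.ipynb"],  # executed copy, written back for the user
                    volume_mounts=[client.V1VolumeMount(name="data", mount_path="/data")],
                )],
                volumes=[client.V1Volume(
                    name="data",
                    persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                        claim_name="claim-myuser"  # hypothetical: the user's Z2JH home PVC
                    ),
                )],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="jhub", body=job)
```

The Job keeps running after the user logs out, and the executed notebook lands on the same volume their Z2JH session mounts.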

Thank you for your time and consideration, and I apologize for the open-ended nature of this question.

A lot of work has been done on integrating Dask with JupyterHub, though I haven’t used it myself:
https://docs.dask.org/en/stable/deploying-kubernetes.html
It’s probably worth looking at though!
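For the "close the browser and come back later" use case specifically, `dask.distributed`'s `fire_and_forget` plus published datasets might fit. A rough sketch, assuming a Dask scheduler is already running in the cluster (the address below is a placeholder):

```python
from dask.distributed import Client, fire_and_forget

client = Client("tcp://dask-scheduler:8786")  # hypothetical scheduler address

def grid_search():
    ...  # long-running computation goes here
    return "results"

future = client.submit(grid_search, key="grid-search-1")
client.publish_dataset(grid_search_1=future)  # keep a named handle on the scheduler
fire_and_forget(future)  # keep computing even after this client disconnects

# Later, from a fresh notebook session:
# client = Client("tcp://dask-scheduler:8786")
# result = client.get_dataset("grid_search_1").result()
```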

I re-read my question and apologize for not being clear enough. What we have now is Z2JH, and that will probably stay in place for good: it is easy to use for students and researchers. Our primary issue is long-running tasks, which get terminated by the idle-culling service once the user closes the browser. I’m searching for a solution where someone could submit/offload/migrate a computation (e.g., a grid search) to another service with minimal effort and pick up the results after the calculation is complete.
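To make the goal concrete, the kind of script I’d like to offload looks roughly like the following; it persists its results to the user’s shared volume so they can be picked up from a later Z2JH session (the dataset, parameters, and paths are hypothetical stand-ins):

```python
import joblib
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy stand-in for the real workload.
X, y = load_digits(return_X_y=True)

search = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", "auto"]},
    n_jobs=-1,  # use all CPUs granted to the batch pod
)
search.fit(X, y)

# Persist the fitted search on the shared volume so a later interactive
# Z2JH session can load it with joblib.load(). The path is hypothetical.
joblib.dump(search, "/home/jovyan/results/grid_search.joblib")
```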

I came across another solution, Elyra, which adds the ability to run a notebook or Python script as a batch job.
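The underlying notebook-as-batch-job idea can also be tried directly with papermill, which executes a parameterised notebook headlessly and saves an executed copy with all cell outputs. A minimal sketch with hypothetical paths and parameters:

```python
import papermill as pm

pm.execute_notebook(
    "grid_search.ipynb",      # input notebook with a tagged 'parameters' cell
    "grid_search_out.ipynb",  # executed copy, including all cell outputs
    parameters={"n_trials": 500},  # hypothetical parameter override
)
```

Since the executed copy is an ordinary notebook, the results can be inspected later from the regular Z2JH session.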