Hey there. I am looking for a tool that lets me string together the outputs of a series of notebooks in a UI. Does this exist?
E.g. a notebook for importing and splitting up the dataset, then a notebook for training a model, then a notebook for working with results, then a notebook for visualization.
Problem to solve: my notebook is now… thousands of lines of code. I want isolation/encapsulation from a best-practice coding perspective. Maybe I could use different Python versions or different conda envs, or even switch over to R or Scala in each notebook. If you were going all out, you would be able to specify deeper things like the Java version being used.
Something like Rabix, but for notebooks instead of containers.
Netflix has done a huge amount of work on pipelines of notebooks: https://netflixtechblog.com/notebook-innovation-591ee3221233. Also check out Papermill and Paperboy.
IBM just released some open source extensions dealing with pipelines of notebooks as well: https://github.com/elyra-ai/elyra
Netflix’s Scrapbook now has the related functionality for handling the outputs; see here for some example use/discussion.
There is also https://github.com/krassowski/nbpipeline.
I’ve not managed to play with this yet, but as @jasongrout mentioned, you might be able to co-opt https://github.com/elyra-ai/elyra to do some of what you want. It lets you use a GUI to string notebooks together in a pipeline and then execute them using Kubeflow Pipelines (though the docs say “Currently, the only supported pipeline runtime is…”, which suggests it may have been architected to allow other pipeline runtimes to slot in?)
[Cheekily asks:] If you try it and it either works for your use case, or doesn’t, could you provide a quick review as a datapoint around whether it works for your sort of use case? #lazyweb
Airflow has docs on how to use Papermill, and other systems have direct or indirect integrations with it nowadays. Dagster is one example that extends Papermill.
Scrapbook is used internally at a few companies now to save notebook outcomes. Some extend it to save into their metadata stores in parallel to the notebook document. There are a few pending features for Scrapbook that I haven’t been able to get to; would love more devs helping on that project.
Hmm. So I guess what I would really want is a UI for Airflow inside JupyterLab. Not that navigating to Airflow’s localhost would really be a problem… though in a JupyterHub scenario, users may not be able to freely navigate to other portals.
I would recommend navigating to the scheduler once you push a scheduled version. The complexities of doing a good UI for workflow management are large, and if users can’t interface with the scheduler service you’ll find the usability of running anything there dramatically reduced: you have to reimplement error handling, monitoring, alerting, etc. As someone who does this work on a daily basis, it’s best not to try to duplicate those types of efforts when possible, as you can sink years of time into making them work well.
Ploomber (disclaimer: I’m the author) is a good option for this. It allows you to create notebook-based pipelines (scripts and functions can be used as well). It requires minimal “pipeline code”: just list your notebooks in a YAML file; for full flexibility, a Python API is available. Exporting pipelines to Airflow and Kubernetes (via Argo Workflows) is supported.
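To give a feel for the YAML approach, here's a minimal sketch of what a `pipeline.yaml` for the four-notebook example above might look like (file names and the layout are made up; each task declares its source notebook and the executed copy it produces, and `upstream` references in the notebooks wire the dependencies):

```yaml
tasks:
  - source: notebooks/split_dataset.ipynb
    product: output/split_dataset.ipynb

  - source: notebooks/train_model.ipynb
    product: output/train_model.ipynb

  - source: notebooks/analyze_results.ipynb
    product: output/analyze_results.ipynb

  - source: notebooks/visualize.ipynb
    product: output/visualize.ipynb
```

Running the pipeline is then a single CLI call (`ploomber build`), which executes the notebooks in dependency order.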
Feel free to reach out directly if there are any questions! Twitter: edublancas