Papermill required parameters in Notebook

Hello,

I am relatively new to papermill, but I am looking into using it in notebook environments. Is there any documentation, or are there examples, showing how a notebook can give an error or warning message before it executes when a parameter was not supplied or was left without a value? This is essential because a user should be told, before running a notebook, about any parameter that must be provided, so that the run does not produce an inaccurate executed notebook.

Any help would be appreciated!

Sounds like what you are looking for is some sort of user-friendly ‘control plane’? I don’t have anything concrete to offer, as there seem to be a number of ways to go about this depending on what else is in your toolchain and how you are most comfortable implementing such a thing.

Domino Data Lab has implemented what you seem to be describing as ‘Launchers’, which you as the notebook developer create to go along with your notebook (or script); see here.

The developer of the notebook could do something similar in a pure Jupyter ecosystem using Voila dashboards with ipywidgets for parameter input, in conjunction with the developed notebook. (Make sure you check out the Voila Gallery.)

If your user is otherwise comfortable with typing the papermill execution command and is just looking for some hints about what they may have forgotten in the command, you could make a Voila dashboard that helps them build that execution command. I am imagining a Voila dashboard that queries the input notebook for the cell tagged as parameters, then checks the papermill execution command typed into a text form to see whether it already addresses each parameter.
I’m thinking of something along the lines of the nbgitpuller URL generating form. (From following along while a feature was added to that form, see here and here, I know that the form is JavaScript-backed and not based on ipywidgets. But like I said above, your flavor of implementation can vary.)
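As a rough illustration of the check described above, here is a minimal standard-library sketch that could sit behind such a form. The function names (`tagged_parameter_names`, `missing_from_command`) and the demo notebook are made up for this example, and it only looks at the `-p`/`--parameters` and `-r`/`--parameters_raw` options, so parameters supplied through a file or YAML would be reported as missing. It also assumes the parameters cell holds plain Python assignments (no IPython magics).

```python
import ast
import json
import shlex


def tagged_parameter_names(notebook_json):
    """Return names assigned at top level in code cells tagged 'parameters'."""
    names = []
    for cell in notebook_json.get("cells", []):
        tags = cell.get("metadata", {}).get("tags", [])
        if cell.get("cell_type") != "code" or "parameters" not in tags:
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # .ipynb files usually store source as a list of lines
            source = "".join(source)
        for node in ast.parse(source).body:
            if isinstance(node, ast.Assign):
                names += [t.id for t in node.targets if isinstance(t, ast.Name)]
            elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
                names.append(node.target.id)
    return names


def missing_from_command(notebook_json, command):
    """List tagged parameters that the papermill command does not set."""
    tokens = shlex.split(command)
    supplied = set()
    for i, tok in enumerate(tokens[:-1]):
        if tok in ("-p", "--parameters", "-r", "--parameters_raw"):
            supplied.add(tokens[i + 1])  # the token after the flag is the parameter name
    return [n for n in tagged_parameter_names(notebook_json) if n not in supplied]


# Hypothetical notebook content; a real one would come from json.load(open(path)).
demo = json.loads(
    '{"cells": [{"cell_type": "code", "metadata": {"tags": ["parameters"]},'
    ' "source": ["city = \\"\\"\\n", "year = 2020\\n"]}]}'
)
print(missing_from_command(demo, "papermill in.ipynb out.ipynb -p city Paris"))
```

A dashboard could show the returned list as a hint next to the command text box, leaving it to the user to decide whether a listed default is acceptable.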

If you just want the user to not see the notebook or script backing the data generation at all, you may want them just to face an appmode version of your notebook or a Voila dashboard and you as the developer handle what is needed to get them the output they seek. (The Voila gallery may give you more of an idea what is possible with Voila.) You can see some variations on that type of interface in this repo. Specifically, go there, and then click launch binder, and then select ‘3D scatter plot using data in a file and Voila interface’ from the index list once the session opens. To get an idea of how you may want your user to access and run the code without ‘seeing’ the code, you can launch directly into a running session of the streamlined, companion Voila dashboard here.

Possibly related material:


Hi,

I appreciate all the feedback. I was wondering if papermill specifically has the ability to tell the user whether a parameter is required to run a notebook. I was looking into bringing in papermill extensibility within the notebook environment setup in Azure Data Studio.

Since Azure Data Studio has quite the extensibility for a notebook environment, bringing in papermill would be a great addition to the product. The one question I have is how to tell a user to change default parameters so that the executed notebook is not just an error-filled output. Is it possible within papermill to prompt the user for parameters, or to flag when parameters are still at their default values (such as when city=" " is an empty string)?

I am not aware of papermill doing that. Like I said, you’d need to implement the ‘control level’ yourself if your users aren’t competent to generate the papermill command themselves. I suggested using a Voila dashboard as an interface.
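For what it is worth, the simplest form of such a control layer can be a small Python wrapper that refuses to launch the run until the required values are present. The sketch below is only an illustration: `run_with_required` and the `required` list are hypothetical names, not part of papermill. The only papermill call it relies on is `papermill.execute_notebook(input_path, output_path, parameters=...)`.

```python
def run_with_required(input_path, output_path, parameters, required):
    """Check required parameters first, then hand off to papermill."""
    # Treat a missing key, None, or an empty string as "not provided".
    missing = [name for name in required if parameters.get(name) in (None, "")]
    if missing:
        raise ValueError(f"Missing required parameter(s): {', '.join(missing)}")
    # Imported late so the validation step works even where papermill is absent.
    import papermill as pm
    return pm.execute_notebook(input_path, output_path, parameters=parameters)


# Example with hypothetical file and parameter names: fails before any execution.
try:
    run_with_required(
        "report.ipynb", "report_out.ipynb", {"city": ""}, required=["city", "year"]
    )
except ValueError as err:
    print(err)
```

The list of required names lives outside the notebook here, so it has to be kept in sync with the parameters cell by whoever maintains the wrapper.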

There was discussion of adding more control options for analyzing the parameter cell in https://github.com/nteract/papermill/issues/225, but not all of the edge cases have been thought through well enough for it to be implemented. As @fomightez stated, most people either have the system that calls papermill check for required fields (e.g. a DAG scheduler constraint, a Voila dashboard, or a simple bash wrapper), or have the notebook assert, in the cell after the parameters, that the inputs meet expectations. Putting asserts in the notebook is usually good enough for most use cases and gives clear intent and messaging on failure.
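A minimal sketch of the in-notebook approach might look like the following, assuming parameters named `city` and `year` (both made up for this example) and a hypothetical helper `check_required`. Papermill injects its overriding values in a new cell right after the one tagged `parameters`, so a validation cell placed next sees the final values.

```python
# --- Cell tagged "parameters": placeholder defaults that papermill overrides ---
city = ""
year = None


# --- Next cell: validate before any real work starts ---
def check_required(**params):
    """Raise a clear error naming every parameter still left empty."""
    missing = [
        name
        for name, value in params.items()
        if value is None or (isinstance(value, str) and not value.strip())
    ]
    if missing:
        raise ValueError(
            "Required parameter(s) not set: " + ", ".join(missing)
            + ". Pass them with: papermill ... -p NAME VALUE"
        )


# In a real notebook this call would stand alone so the run stops here;
# it is wrapped in try/except only so this snippet shows the message.
try:
    check_required(city=city, year=year)
except ValueError as err:
    print(err)
```

Because the check fails in the first cells, the executed output notebook contains one readable error near the top instead of a cascade of downstream failures.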
