Automated test of custom images and notebooks

I am interested in hearing how people deal with unit tests in general in Jupyter.

We maintain a JupyterLab server image for our users, running on Kubernetes. We deploy it with Z2JH (Zero to JupyterHub) using the Helm chart.
The integrated libraries in the image (mostly Python and a bit of Perl) come from our own team or from third parties.
So far we validate each new image entirely by hand: we open some representative notebooks in the browser and run them to get a feel that everything is OK. Once an image is validated, we roll it out as the image our users actually get.
We would like to automate that validation and integrate it into our CI/CD pipeline.

I have many questions because of the way we deploy things:

  • Can specific notebooks be launched from the CLI in such an environment? For instance, I am thinking of having the server run a set of specific notebooks as soon as it starts, print the test results to stdout, and then shut down. Or is there already something more robust than that?

  • What’s a way to start a specific profile from the CLI? I envision our Cloud Build CI talking to the Kubernetes cluster and starting a specific user profile to run the validation tests.

  • Some tests require user authentication/identification. We currently use Google SSO both in the hub and inside the image itself (to log into Google Cloud). How do people handle that in their validation tests? As a workaround, I could run a separate Z2JH environment without SSO, or even just run the container image directly without Kubernetes.
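For the profile question, one option is to skip the browser and call the JupyterHub REST API from CI. The sketch below only builds the request for `POST /hub/api/users/{name}/server`, whose JSON body is passed to the spawner as `user_options`. It assumes KubeSpawner with a `profile_list`, which reads the selected profile's slug from `user_options["profile"]`; the hub URL, user name, token and the `validation` slug are hypothetical. The token is assumed to be a Hub API token allowed to start that user's server, which avoids the interactive SSO login on the Hub side (though not any Google Cloud login inside the image).

```python
import json
import urllib.request


def build_spawn_request(hub_url: str, username: str, token: str,
                        profile_slug: str) -> urllib.request.Request:
    """Build (but do not send) a request that asks JupyterHub to start
    `username`'s default server with the given profile.

    Assumption: KubeSpawner reads the chosen profile slug from
    user_options["profile"]; the slug itself is hypothetical.
    """
    url = f"{hub_url.rstrip('/')}/hub/api/users/{username}/server"
    # The JSON body becomes the spawner's user_options.
    body = json.dumps({"profile": profile_slug}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"token {token}",  # Hub API token, not SSO
            "Content-Type": "application/json",
        },
    )


def start_server(req: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code.

    urlopen raises HTTPError for 4xx/5xx responses (e.g. server already running).
    """
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

JupyterHub answers 201 when the server has started and 202 when the spawn has only been requested, so CI would typically poll until the server is ready before running anything against it.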

Thanks for any ideas you may be able to share.

I think Galata can help you here.

Galata is a JS framework containing many utilities for manipulating JupyterLab programmatically. The Jupyter team uses it to test JupyterLab itself, and I’ve used it to test JupyterLab plugins.

What do your current tests do? Do you start JupyterLab and then check that the notebooks run correctly? You could write some JS code that opens JupyterLab, opens a specific notebook, runs it, and checks the results.

For full end-to-end deployment testing, I can recommend the general-purpose tool robotframework, with the entire system (or as much of it as is rational) under test. It has custom libraries for all kinds of things, but at the end of the day it is “Just Python,” so it is pretty easy to extend.
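To make the “Just Python” point concrete: a Robot Framework keyword library can be a plain Python class whose public methods become keywords. The hypothetical `NotebookChecks` library below reads an already-executed .ipynb file (nbformat 4 JSON) and fails if any cell recorded an error or if no cell output contains an expected string. After `Library    NotebookChecks.py`, a .robot test could call it as `Notebook Output Should Contain    executed/smoke.ipynb    pandas OK`; the file name and expected text are placeholders.

```python
import json
from pathlib import Path


class NotebookChecks:
    """Hypothetical Robot Framework keyword library.

    Robot exposes each public method as a keyword, so
    `notebook_output_should_contain` is callable as
    `Notebook Output Should Contain` from a .robot file.
    """

    ROBOT_LIBRARY_SCOPE = "GLOBAL"

    def _output_text(self, output):
        # Leading underscore: not exposed as a keyword.
        # Stream outputs store text under "text"; execute_result and
        # display_data store it under data["text/plain"]. Both may be a
        # string or a list of strings.
        if output.get("output_type") == "stream":
            text = output.get("text", "")
        else:
            text = output.get("data", {}).get("text/plain", "")
        return "".join(text) if isinstance(text, list) else text

    def notebook_output_should_contain(self, path, expected):
        """Fail on any recorded cell error, or if no output contains `expected`."""
        nb = json.loads(Path(path).read_text(encoding="utf-8"))
        texts = []
        for cell in nb.get("cells", []):
            # Markdown cells have no "outputs" key; .get() skips them.
            for output in cell.get("outputs", []):
                if output.get("output_type") == "error":
                    raise AssertionError(
                        f"{path}: a cell raised {output.get('ename')}: "
                        f"{output.get('evalue')}"
                    )
                texts.append(self._output_text(output))
        if not any(expected in t for t in texts):
            raise AssertionError(f"{path}: no cell output contains {expected!r}")
```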

Specifically for browser-based Jupyter clients, it has a purpose-built library (disclaimer: I’m the author). The distinctions vs. galata:

  • works off a user-level, “black box” model of the application
    • instead of a “white box”, tightly-bound to a specific version of e.g. JupyterLab with access to the underlying JS APIs
  • looks more like (weird) plain language than TypeScript
  • requires a lot more jumping jacks to get video recordings than a custom browser does

In my specific case (which, alas, is not open source), the most horror-show test suite does the following:

  • start up a couple VMs with virtualbox
  • deploy a previous, known version of jupyterhub (specifically, TLJH, but whatever)
    • this could be replaced with e.g. minikube and Helm, but the principle is the same
  • spawn a user’s environment
  • start some representative interactive computing, meanwhile…
    • upgrade the jupyterhub process to a new known version
      • handle a rollback case
  • continue to assess the user environment
    • see a message that an updated environment is available
  • start a new user environment
  • verify the interactive computing stuff still works (e.g. against the same in-flight data files)
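Many of those steps end in “wait until X is ready” (a VM, the hub, a user server). Below is a minimal, hypothetical polling helper for that, with an injectable clock and sleep so it can be unit-tested without real waiting. The `server_ready` check assumes a JupyterHub user model as returned by `GET /hub/api/users/{name}`, with servers listed under `servers` and the default server under the empty-string key; whether `servers` is included at all depends on the token's permissions.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool], timeout: float = 300.0,
               interval: float = 5.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> float:
    """Poll `check` until it returns True and return the elapsed seconds.

    Raises TimeoutError once `timeout` seconds have passed without success.
    """
    start = clock()
    while True:
        if check():
            return clock() - start
        elapsed = clock() - start
        if elapsed >= timeout:
            raise TimeoutError(f"condition not met after {elapsed:.0f}s")
        # Never sleep past the deadline.
        sleep(min(interval, timeout - elapsed))


def server_ready(user_model: dict) -> bool:
    """Assumed shape: {"servers": {"": {"ready": True, ...}}} for the default server."""
    return bool(user_model.get("servers", {}).get("", {}).get("ready", False))
```

In a real suite, `check` would fetch the user model from the hub and pass it to `server_ready`; keeping the fetch outside the helper lets the same loop wait on VMs or hub upgrades too.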

By using VM snapshots, etc., this was actually pretty reasonable to run as part of normal CI.

As for assessing some of this (especially over time), there are some nice tools in the OpenTelemetry stack: having frame-level awareness from “user clicks button” to “new environment created” to “results of compute appear” is quite nice, with all of the database logging, etc. put in its appropriate place.

To look at scale, one can also drop locust on it, which is, again, Python, but can test anything with an HTTP endpoint. Some previous discussions:


Thanks for the nice suggestions and great details.

For now I was imagining something without UI tests, just verifying that the notebooks output the right text. That checks that we indeed have the right libraries in place in our server image, and also lets us test against the actual data made available to the server, as the particular user who is logged in.
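A minimal sketch of that approach, assuming `jupyter nbconvert` is available inside the image under test: each notebook is executed headlessly with `--to notebook --execute`, the executed copy is written to an output directory for later inspection, and the exit code tells CI whether every notebook ran. nbconvert exits non-zero when a cell raises (unless `--allow-errors` is passed), so a failing import or data-access error fails the build. The `executed` directory name and the 600-second default timeout are assumptions.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List


def nbconvert_command(notebook: Path, output_dir: Path, timeout: int = 600) -> List[str]:
    """Command that executes `notebook` headlessly and writes the executed copy to `output_dir`."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        f"--ExecutePreprocessor.timeout={timeout}",  # per-cell timeout in seconds
        f"--output-dir={output_dir}",
        str(notebook),
    ]


def run_notebooks(notebooks: Iterable[str], output_dir: Path) -> int:
    """Execute each notebook, print PASS/FAIL lines, and return a CI exit code (0 = all passed)."""
    failures = 0
    for nb in notebooks:
        result = subprocess.run(nbconvert_command(Path(nb), output_dir),
                                capture_output=True, text=True)
        ok = result.returncode == 0
        print(f"{'PASS' if ok else 'FAIL'} {nb}")
        if not ok:
            failures += 1
            # nbconvert reports the failing cell and traceback on stderr.
            print(result.stderr, file=sys.stderr)
    return 1 if failures else 0
```

In Cloud Build this could run as the command of a container started from the candidate image, via a small hypothetical wrapper script that calls `sys.exit(run_notebooks(sys.argv[1:], Path("executed")))`. Checking the executed copies for the expected text is then a separate, cheap step.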

UI tests are always way more complex to design, run and maintain.