Automated tests?

I’m developing/packaging JupyterLab and I’ve run into problems trying to make sure that any changes I make don’t break anything.

  1. I was wondering if JupyterLab has an automated test suite.
  2. Also, are there any best practices for debugging JupyterLab issues? The typical issue is a cell that renders incorrectly. Any tips for debugging those, especially with the Firefox developer tools?

Jest is the primary tool used for automated unit testing in JupyterLab core and many extensions. It has nice features for maintainers, like coverage reporting and the ability to patch stuff, and a good test suite with solid types helps keep you honest in what can become a forest of mocks and spies.

There are many tools for doing full browser testing, and JupyterLab isn’t that different from most other applications, aside from a well-above-average initial load time and a predilection for wanting to rebuild itself if you’ve changed any code. Selenium, Cypress, puppeteer, and pyppeteer are all good Google starting points; it really just comes down to how you want to write your tests. Whichever way you go, you’ll end up having to learn a lot about the JupyterLab DOM just to get your own DOM to show up on the page.
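For example, here is a minimal pyppeteer sketch of a “did Lab even load?” smoke test; the server URL, the timeouts, and the `.jp-Launcher` selector are assumptions that will need adjusting for your setup:

```python
# A minimal pyppeteer sketch (assumes a JupyterLab server is already running
# at http://localhost:8888 and that the launcher uses the .jp-Launcher class;
# adjust the URL, token handling, and selector for your setup).
import asyncio

from pyppeteer import launch


async def smoke_test():
    browser = await launch(headless=True)
    page = await browser.newPage()

    # JupyterLab's initial load is slow, so wait generously.
    await page.goto(
        "http://localhost:8888/lab",
        {"waitUntil": "networkidle2", "timeout": 120000},
    )
    await page.waitForSelector(".jp-Launcher", {"timeout": 120000})

    # Keep an artifact around for later review.
    await page.screenshot({"path": "lab-loaded.png"})
    await browser.close()


if __name__ == "__main__":
    asyncio.get_event_loop().run_until_complete(smoke_test())
```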

Some of the biggest gotchas for any of these are:

  • working with CodeMirror (it does a lot of magic)
  • drag-and-drop
  • anything that’s in a canvas element
  • controlling the “leakage” of your interactive environment into your test environment

Some further tips:

  • implement Command Palette commands for as much of your functionality as you can, and learn what commands you can use to avoid having to do too much with Menus, keyboard shortcuts, the status bar, or anything else that requires complicated user-browser interactions
    • once you have a good pattern for “open the command palette, type the command, press enter, accept the dialog”, your test setup gets a lot shorter (see the sketch after this list)
  • try to keep your tests as independent and idempotent as possible
    • it takes about 30s to tear down and restart the browser and server, so that’s not really feasible for every test case, but if you try to get back to a clean state after every test, it opens the door to naively running them in parallel, rerunning only failed tests, etc.
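As a rough illustration of that palette pattern, here is a hedged Selenium sketch. The Ctrl+Shift+C shortcut and the palette-input selectors are assumptions about recent Lab defaults (older versions used the Phosphor `.p-CommandPalette-input` class), so adjust them for your version and keyboard shortcuts:

```python
# A rough Selenium sketch of "open palette, type command, press enter".
# Assumes `driver` is already pointed at a loaded JupyterLab, that the
# default Ctrl+Shift+C shortcut opens the command palette, and that the
# palette input matches one of the selectors below.
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

PALETTE_INPUT = ".lm-CommandPalette-input, .p-CommandPalette-input"


def run_palette_command(driver, label, timeout=30):
    """Open the command palette, type `label`, and press Enter."""
    ActionChains(driver).key_down(Keys.CONTROL).key_down(Keys.SHIFT).send_keys(
        "c"
    ).key_up(Keys.SHIFT).key_up(Keys.CONTROL).perform()

    palette = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, PALETTE_INPUT))
    )
    palette.send_keys(label)
    palette.send_keys(Keys.ENTER)
```

Once something like `run_palette_command(driver, "New Launcher")` works, most test setup collapses into a handful of one-liners.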

I’ve personally found robotframework (and its first-party SeleniumLibrary) to be pretty good for integration testing of features against real browsers. We even built up robotframework-jupyterlibrary, but haven’t updated it (and might not) for Lab 2.0, with Lab 3.0 on the horizon.

Here’s a gnarly example that tests many of the above things. We also have a fair amount of stuff to keep it happy in continuous integration, in atest.py in that same repo: we run all the tests on every platform, and retry them a few times if they fail. It’s not ideal, but for the extension in question, like many, it can be very hard to even gauge how many independent systems/languages are at play for what appears to be a simple feature. It’s also really slow, especially on underpowered free CI Windows machines.

But even the robot only does what you tell it to, usually via CSS selectors or (preferably) XPath. “Looks right” is pretty hard to measure, and therefore to test… one can take the hard road of generating “gold master” screenshots and then comparing against them on each test run, but this incurs a lot of overhead and complexity. A better approach can be to focus on gathering the right number of screenshots/videos and actually reviewing them on occasion.
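If you do go down the gold-master road, the comparison itself can be as small as the following scikit-image sketch; the file names and the 0.98 threshold are placeholders, not recommendations:

```python
# A minimal "gold master" comparison sketch using scikit-image; both
# screenshots must have been captured at the same resolution.
from skimage.color import rgb2gray
from skimage.io import imread
from skimage.metrics import structural_similarity


def _load_gray(path):
    img = imread(path)
    # Browser screenshots are RGB or RGBA; drop any alpha channel before
    # converting to grayscale floats in [0, 1].
    return rgb2gray(img[..., :3])


def looks_like_the_master(master_path, current_path, threshold=0.98):
    master = _load_gray(master_path)
    current = _load_gray(current_path)
    assert master.shape == current.shape, "screenshots differ in size"

    # structural_similarity returns 1.0 for identical images.
    score = structural_similarity(master, current, data_range=1.0)
    return score >= threshold
```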


Thanks a lot for the discussion. I’m working on a jupyter kernel, and here is the bug where this issue came up

https://sft.its.cern.ch/jira/browse/ROOT-10924

Thanks for the pointers. I was wondering if anyone has created a “hello world” template for doing automated tests for an extension, or something that does general integration testing. I can try to adapt the pointers you gave for my own purposes, but was wondering if someone else had something small and simple that I could start with.

I don’t know of a cookiecutter or similar that exists with an end-to-end test approach specifically around lab. It would be a nice addition somewhere, for sure!

The example from robotframework-jupyterlibrary shows pretty much the shortest thing I’ve been able to put together… but of course, it is somewhat out of date… mostly just the selectors, etc., which are actually pretty easy to override unless the DOM has changed substantially. That being said, I’ve found that every time I accept more “magic”, everything is great… until it isn’t. For that reason, some of the heavier suites, like jupyterlab-lsp, are more home-grown, as we needed a lot of specific features.

Thanks!!!

What I’m looking for is less testing of extensions than testing of notebooks. The use case is that I’m packaging a number of notebooks in one bundle, and I need something that just makes sure that everything is correctly installed.

The robot framework looks useful.

For testing notebooks in the abstract, my first line of defense is nbconvert --execute, nbval, importnb, and the like. Kernels can still be flaky (especially on Windows), but if your outputs are well-behaved, you can still get a pretty good signal from just inspecting them on disk.
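As an illustration, a “does this notebook still execute cleanly?” check can be a few lines of nbformat/nbconvert; the notebook path, kernel name, and timeout below are placeholders for whatever your bundle actually ships:

```python
# A minimal notebook execution check using nbformat and nbconvert's
# ExecutePreprocessor. The preprocessor raises on a failing cell by default;
# the output inspection afterwards shows how to look for error outputs on
# the executed notebook itself.
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor


def execute_and_check(path, kernel_name="python3", timeout=600):
    nb = nbformat.read(path, as_version=4)
    ep = ExecutePreprocessor(timeout=timeout, kernel_name=kernel_name)
    ep.preprocess(nb, {"metadata": {"path": "."}})

    # Any code cell that produced an error output counts as a failure.
    errors = [
        output
        for cell in nb.cells
        if cell.cell_type == "code"
        for output in cell.get("outputs", [])
        if output.get("output_type") == "error"
    ]
    assert not errors, f"{path} raised: {errors}"


execute_and_check("example.ipynb")  # placeholder notebook name
```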

Once you get into “does this work properly in a browser,” and you are relying on “user presses keyboard button, expect to see X”, I’ve found, despite its limitations, that testing with real browsers is the only way to be sure. Chrome has recently made this very challenging to keep working under continuous integration, so for open source I primarily focus on Firefox ESR, driven by geckodriver and Selenium.
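A minimal headless Firefox setup for CI might look like the sketch below; it assumes geckodriver is discoverable on PATH (or that a recent Selenium can fetch it for you), and the URL/token are placeholders for however you start the Lab server:

```python
# A minimal headless Firefox setup for CI with Selenium and geckodriver.
from selenium import webdriver

options = webdriver.FirefoxOptions()
options.add_argument("-headless")

driver = webdriver.Firefox(options=options)
try:
    driver.set_window_size(1280, 1024)
    # Placeholder URL/token; point this at your own running Lab server.
    driver.get("http://localhost:8888/lab?token=<your-token>")
    # ... drive the UI, e.g. with run_palette_command() from the sketch above ...
finally:
    driver.quit()
```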

I can’t recommend robot highly enough for confidently shipping full systems: while there are many, many solutions (noted above), I find that having a tool that is…

  • general purpose
    • I’ve used SSHLibrary and OperatingSystem to do Ansible and VirtualBox stuff to provision entire JupyterHubs, and then start using them with Selenium
  • well instrumented
    • human readable
    • machine readable
    • in-context screenshots
    • accurate-enough timings (not really for benchmarking)
  • extensible with python (and to a lesser extent, javascript)
    • if you really want to go down the screenshot comparison route, OpenCV and skimage are very useful

… is worth whatever shortcomings come with browser flake, Selenium limitations, etc. You do have to keep it healthy, though. Acceptance tests that aren’t run frequently will fail when you most need them, and regressions can be mind-blowingly difficult to bisect.

A humorous thing we package(d) up is robotlab… it’s getting a bit long in the tooth, but from a tutorial point of view there’s a lot of value: point, click (wait (wait longer on Windows)), click, start learning how to lab/robot.
