Jest is the primary tool used for automated unit testing in JupyterLab core and many extensions. It has nice features for maintainers, like coverage, mocks and spies, and the ability to patch stuff. A good test suite, with solid types, helps keep you honest in what can become a forest of mocks and spies.
There are many tools for doing full browser testing, and JupyterLab isn’t that different from most other applications, aside from a well-above-average initial load time and a predilection for wanting to rebuild itself if you’ve changed any code. Selenium, Cypress, puppeteer, and pyppeteer are all good starting points for a search; it really just matters how you want to write your code. Whichever way you go, you’ll end up having to learn a lot about the JupyterLab DOM just to get your DOM to show up on the page.
Some of the biggest gotchas for any of these are:
- working with CodeMirror (it does a lot of magic)
- anything that’s in a
- controlling the “leakage” of your interactive environment into your test environment
Some further thoughts there:
- implement Command Palette commands for as much of your functionality as you can, and learn what commands you can use to avoid having to do too much with Menus, keyboard shortcuts, the status bar, or anything else that requires complicated user-browser interactions
- once you have a good pattern for “Open Command Palette, type command, press enter, accept dialog”, your test setup gets a lot shorter
- try to keep your tests as independent and idempotent as possible
- it’s about 30s to tear down the browser and server, so it’s not really feasible to do that for every test case, but if you try to get back to a clean state after every test, it opens the door to naively running them in parallel, rerunning only failed tests, etc.
I’ve personally found robotframework (and its first-party SeleniumLibrary) to be pretty good for integration testing of features against real browsers. We even built up robotframework-jupyterlibrary, but haven’t updated it for Lab 2.0 (and might not), with Lab 3.0 on the horizon.
Here’s a gnarly example that tests many of the above things. We also have a fair amount of stuff to keep it happy in continuous integration, in atest.py in that same repo: we run all the tests on every platform, and retry running them a few times if they fail. It’s not ideal, but for the extension in question, like many, it can be very hard to even gauge how many independent systems/languages are at play for what appears to be a simple feature. It’s also realllly slow, especially on underpowered free CI Windows machines.
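
The retry idea can be sketched roughly like this. This is not the actual contents of atest.py, just an illustration under my own assumptions: `run_with_retries` is a hypothetical helper that reruns any command until it exits cleanly or runs out of attempts.

```python
import subprocess
import sys


def run_with_retries(cmd, attempts=3):
    """Run cmd until it exits 0 or attempts run out.

    Returns (last_return_code, number_of_tries).
    """
    returncode = 1
    for attempt in range(1, attempts + 1):
        returncode = subprocess.call(cmd)
        if returncode == 0:
            # Stop at the first clean run.
            return returncode, attempt
    return returncode, attempts


if __name__ == "__main__":
    # Placeholder command; swap in whatever launches your acceptance tests.
    code, tries = run_with_retries([sys.executable, "-c", "print('ok')"])
    sys.exit(code)
```

Retrying hides flaky failures rather than fixing them, so it is worth logging how many tries each run needed.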
But even the robot only does what you tell it to, which usually means CSS selectors or (preferably) XPath. “Looks Right” is pretty hard to measure, and therefore to test. One can take the hard road of generating “gold master” screenshots and comparing against them on every test run, but this incurs a lot of overhead and complexity. A better approach can be to focus on gathering the right number of screenshots/videos, and actually reviewing them on occasion.
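
For a sense of what the “gold master” road involves, here is a deliberately naive sketch that compares screenshots byte for byte. The directory layout and the `compare_to_gold` helper are assumptions for illustration. Real setups usually need a fuzzy pixel comparison, since fonts and anti-aliasing differ between machines, which is part of the overhead mentioned above.

```python
import hashlib
from pathlib import Path


def digest(path):
    """SHA-256 of a file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def compare_to_gold(gold_dir, run_dir):
    """Return (name, reason) for each gold screenshot that is missing or changed."""
    gold_dir, run_dir = Path(gold_dir), Path(run_dir)
    problems = []
    for gold in sorted(gold_dir.glob("*.png")):
        candidate = run_dir / gold.name
        if not candidate.exists():
            problems.append((gold.name, "missing"))
        elif digest(candidate) != digest(gold):
            # Exact match only: any single-pixel change counts as a failure.
            problems.append((gold.name, "changed"))
    return problems
```

An empty list means every gold screenshot was reproduced exactly; anything else still needs a human to decide whether the change is a regression or just a new normal.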