Performance/Load/Rampup Testing for Notebook instances

I can highly recommend locust. It uses relatively simple Python classes to generate statistically significant loads, and has all the load shaping you might want.

It can do a fine job of simulating hundreds to thousands of clients on a single node, or scale up to do some real damage. You could even spice up a small number of your agents with selenium: each browser costs about as much as hundreds of headless agents, but it’s good to light up as many routes as you can to reveal issues.

Couple locust with some telemetry gathering and a jaeger instance, and you can get some very lovely traces.

More to the testability of the system:

Hub:

Using such approaches against a jupyterhub makes lots of sense: with proper tracing, a load testing tool would reveal actual architectural bottlenecks in, e.g., the interplay between container spawning and storage provisioning.

Kernel/Individual Package:

If you’re trying to test the kernels and packages themselves… before looking at the whole forest of a jupyterhub, it’s probably worth looking at individual packages. The gold standard for this is probably asv, used by numpy and friends.

The lovely part about asv is that it focuses you not on instantaneous measures, but on change across time (or rather, across commits). As a code owner, this helps tremendously for finding That Commit that breaks the build.
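As a rough sketch of what an asv benchmark file looks like: asv discovers classes in a benchmarks directory, calls `setup` before each measurement, and times every method whose name starts with `time_`, once per value in `params`. The notebook-shaped payload and the `make_notebook` helper below are hypothetical stand-ins, chosen only to have something to measure.

```python
import json


def make_notebook(n_cells):
    """Build a minimal notebook-shaped dict (hypothetical stand-in payload)."""
    cell = {
        "cell_type": "code",
        "metadata": {},
        "source": "print('hi')",
        "outputs": [],
        "execution_count": None,
    }
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [dict(cell) for _ in range(n_cells)],
    }


class NotebookRoundTrip:
    # asv runs each time_* method once per parameter value.
    params = [10, 1000]
    param_names = ["n_cells"]

    def setup(self, n_cells):
        # Runs before timing, so construction cost is not measured.
        self.notebook = make_notebook(n_cells)
        self.text = json.dumps(self.notebook)

    def time_dumps(self, n_cells):
        json.dumps(self.notebook)

    def time_loads(self, n_cells):
        json.loads(self.text)
```

The per-commit history comes from running the same file against many commits and publishing the results, which is where the long-term care of the data comes in.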

Single-user server:

Like it says on the tin, the single-node notebook and jupyterlab_server that run on tornado inside of a spawned container are really designed to support a single, demanding superuser… functioning in another way (especially with shared tokens, multiple clients, etc.) is semi-accidental, and there is no first-party testing that I know of in this regime. You’ll likely not like what you find, and may not like the answers you get from maintainers about orders of magnitude of improvement… these would benefit from asv, but it’s rather an undertaking to get such a beast set up. Doing it right might be a good thing to abstract and PR to a high-level tool like maintainer-tools, with a plan for the long-term care of the data in, e.g., a dedicated repo.

Further, performance issues at that level are almost always a function of:

  • what extensions are installed
    • with cooperative multitasking, all it takes is one bad await around something that actually blocks the event loop (or sits on the GIL), and the whole app is toast
  • the size of stuff flowing
    • lots of 100 MB notebooks start breaking stuff, all over, from in-browser renderers to storage
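The "one bad await" failure mode is easy to reproduce with plain asyncio, no tornado or extensions required. In this sketch (all names are made up for illustration), a heartbeat task ticks every 50 ms while a handler does 300 ms of "work": the bad handler calls a blocking function directly, the better one hands it to a worker thread.

```python
import asyncio
import time


async def heartbeat(ticks, interval=0.05, count=4):
    """Record the time of each tick; large gaps reveal a stalled event loop."""
    for _ in range(count):
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def bad_handler():
    # Declared async, but time.sleep never yields to the event loop.
    time.sleep(0.3)


async def better_handler():
    # Offload the blocking call to a worker thread so the loop keeps running.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, time.sleep, 0.3)


async def longest_gap(handler):
    """Run handler next to a heartbeat; return the worst gap between ticks."""
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    await handler()
    await beat
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    print("blocking handler: ", round(asyncio.run(longest_gap(bad_handler)), 2))
    print("offloaded handler:", round(asyncio.run(longest_gap(better_handler)), 2))
```

With the blocking handler, the worst gap is roughly the full 300 ms, because nothing else gets to run; with the offloaded one, it stays near the 50 ms tick interval. In a real server, every other request is that heartbeat.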