Performance/Load/Ramp-up Testing for Notebook Instances

Hello leads,

I am trying to do performance testing by running multiple notebook instances with pre-configured notebook files (.ipynb), so that the activity of many simultaneous users can be automated to check the robustness of the infrastructure.

Each sample notebook file will have cells for loading data, exercising packages, and executing ML models.

1. Are there any best practices for doing this, or can we simulate it with load-testing tools and scripts?
2. Is it possible to execute notebook files via a REST API (other than nbconvert), so that they can be invoked from testing tools? (i.e. run the cells of the notebook files)

This is mainly to verify that the underlying infrastructure, and the custom Python packages executed inside the notebooks, are capable of handling the load.

Additional notes: the JupyterHub and notebook servers are custom deployments on an AWS EKS cluster (based on the docker-stacks project).

Could anyone help provide some direction on this?

Thanks
Sarath

I can highly recommend locust. It uses relatively simple Python classes to model users and generate statistically significant load, and has all the load shaping you might want.

It can do a fine job of simulating hundreds to thousands of clients on a single node, or scale up to do some real damage. You could even spice up a small number of your agents with Selenium, though each real browser costs about as much as hundreds of headless agents; still, it’s good to light up as many routes as you can to reveal issues.

Couple locust with some telemetry gathering and a Jaeger instance, and you can get some very lovely traces.
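
For a concrete starting point, here’s a minimal locust sketch against the JupyterHub REST API; the hub URL, user name, and token setup are placeholders you’d swap for your deployment:

```python
# A minimal locust sketch against the JupyterHub REST API.
# HUB_TOKEN and the user naming scheme are assumptions for illustration;
# create real test users and tokens for your deployment.
import os

from locust import HttpUser, task, between

HUB_TOKEN = os.environ["HUB_TOKEN"]  # an admin-scoped API token (hypothetical setup)


class HubUser(HttpUser):
    wait_time = between(1, 5)
    headers = {"Authorization": f"token {HUB_TOKEN}"}

    @task
    def spawn_and_stop_server(self):
        # POST .../server starts a user's single-user server,
        # DELETE stops it; both are JupyterHub REST endpoints.
        user = "loadtest-user-1"  # hypothetical pre-created user
        self.client.post(f"/hub/api/users/{user}/server", headers=self.headers)
        self.client.delete(f"/hub/api/users/{user}/server", headers=self.headers)
```

Run it with `locust -f hub_load.py --host https://your-hub.example.com` and shape the user count and spawn rate from the web UI.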

More to the testability of the system:

Hub:

Using a load-testing tool against a JupyterHub makes a lot of sense, as, with proper tracing, it can reveal actual architectural bottlenecks in, e.g., the interplay between container spawning and storage provisioning.

Kernel/Individual Package:

If you’re trying to test the kernels and packages themselves, then before looking at the whole forest of a JupyterHub, it’s probably worth looking at individual packages. The gold standard for this is probably asv, used by numpy and friends.

The lovely part about asv is that it focuses you not on instantaneous measurements, but on changes across time (or rather, across commits). As a code owner, this helps tremendously for finding That Commit that broke the build.
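
To give a feel for what that looks like, an asv benchmark is just a module of methods discovered by naming convention; this sketch assumes a hypothetical `your_package` with a `load_frame` function standing in for whatever your notebooks exercise:

```python
# benchmarks/benchmarks.py -- discovered via asv's time_*/peakmem_* naming convention.
# `your_package.load_frame` is a hypothetical function for illustration.
import your_package


class TimeLoading:
    def setup(self):
        # setup() runs before each benchmark and is excluded from the timings
        self.path = "data/sample.parquet"  # hypothetical fixture

    def time_load_frame(self):
        # timed by asv
        your_package.load_frame(self.path)

    def peakmem_load_frame(self):
        # peak memory tracked by asv
        your_package.load_frame(self.path)
```

`asv run` can then benchmark a range of commits, and `asv publish` renders the trends over time.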

Single-user server:

Like it says on the tin, the single-user notebook server and jupyterlab_server that run on tornado inside of a spawned container are really designed to support a single, demanding superuser. Functioning in any other way (especially with shared tokens, multiple clients, etc.) is semi-accidental, and there is no first-party testing that I know of in this regime. You’ll likely not like what you find, and may not like the answers you get from maintainers about orders-of-magnitude improvements. These would benefit from asv, but it’s rather an undertaking to get such a beast set up. Doing it right might be a good thing to abstract and PR to a high-level tool like maintainer-tools, with a plan for the long-term care of the data in, e.g., a dedicated repo.

Further, performance issues at that level are almost always a function of:

  • what extensions are installed
    • with cooperative multitasking, all it takes is one bad await around something that actually blocks the event loop, and the whole app is toast (see the sketch after this list)
  • the size of the stuff flowing through
    • lots of 100 MB notebooks start breaking things all over, from in-browser renderers to storage
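
To make that “one bad await” failure mode concrete, here’s an illustrative sketch of the pattern; the handler names are made up, not from any real extension:

```python
# Illustration only: how a single blocking call starves a cooperative server.
import asyncio
import time


async def bad_handler():
    # Looks async, but time.sleep() blocks the whole event loop:
    # every other request on the server stalls for these 5 seconds.
    time.sleep(5)


async def good_handler():
    # Either await a truly asynchronous call...
    await asyncio.sleep(5)
    # ...or push blocking work onto a thread so the loop stays responsive:
    await asyncio.get_running_loop().run_in_executor(None, time.sleep, 5)
```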

Thanks @bollwyvl
The response was very detailed and helpful.

There are also a couple of tools people have written:

and see this related thread: “Can a large number of users start the service at the same time”


Thanks @manics

Just thinking out loud.

All of these resources relate to the hub’s ability to scale notebook instances. I even tried some simple ramp-up testing using the hub APIs.

But for the real deal, isn’t it necessary to run code in the notebook instances and push them to their limits, so that the load testing is complete? I suppose that approach would provide a direct measure of real user activity. (create single-user notebook servers → start kernels → run code → check make or break)

I am not able to find any way to remotely simulate notebook user activity for multiple users.
Please correct me if I am wrong.

Thanks

You’re right, so it depends on your definition of “infrastructure robustness”. If it includes hundreds of simultaneous users, then even if they do nothing other than start their servers, the load on JupyterHub and EKS will be significant and will require tuning: JupyterHub will have to spin up hundreds of user containers at the same time.
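
As a rough illustration of scripting that spawn load directly against the hub API (separate from a full locust setup), here’s a hedged ramp-up sketch; the hub URL, user names, and token are assumptions:

```python
# A sketch of a spawn ramp-up via the JupyterHub REST API.
# Assumes users loadtest-user-0..N already exist and HUB_TOKEN is an admin token.
import asyncio
import os

import aiohttp

HUB_URL = "https://your-hub.example.com"  # hypothetical
HEADERS = {"Authorization": f"token {os.environ['HUB_TOKEN']}"}


async def spawn(session, i):
    async with session.post(
        f"{HUB_URL}/hub/api/users/loadtest-user-{i}/server", headers=HEADERS
    ) as resp:
        # 201 = server started, 202 = spawn still pending
        print(i, resp.status)


async def ramp_up(n=100, per_second=10):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for i in range(n):
            tasks.append(asyncio.create_task(spawn(session, i)))
            await asyncio.sleep(1 / per_second)  # simple linear ramp
        await asyncio.gather(*tasks)


asyncio.run(ramp_up())
```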

On the other hand, if you’re mostly concerned with the execution of a complex notebook, you could skip trying to execute things through the Jupyter API and just use kubectl exec to run the notebook on the command line after starting a single-user server.

If the performance of JupyterLab/other UI is also a concern then as already mentioned Selenium or other tools will be useful.

If you’re trying to mimic a real user’s workflow (execute cell, edit, re-execute same cell) but aren’t concerned with the UI you could use the jupyter-server REST API to execute code.
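
As a sketch of that last option: start a kernel over REST, then send an execute_request over the kernel’s WebSocket channel. The server URL and token below are placeholders; the message shape follows the Jupyter messaging protocol.

```python
# A hedged sketch: execute code on a single-user server via the Jupyter
# Server REST + WebSocket APIs. URL and token are placeholders.
# Requires the websocket-client package.
import json
import uuid

import requests
import websocket

BASE = "https://your-hub.example.com/user/loadtest-user-1"  # hypothetical
TOKEN = "..."  # the user's API token
HEADERS = {"Authorization": f"token {TOKEN}"}

# 1. Start a kernel (POST /api/kernels)
kernel = requests.post(f"{BASE}/api/kernels", headers=HEADERS).json()

# 2. Open the kernel's channels WebSocket and send an execute_request
ws = websocket.create_connection(
    f"{BASE.replace('https', 'wss')}/api/kernels/{kernel['id']}/channels",
    header=[f"Authorization: token {TOKEN}"],
)
msg_id = uuid.uuid4().hex
ws.send(json.dumps({
    "header": {
        "msg_id": msg_id,
        "msg_type": "execute_request",
        "username": "loadtest",
        "session": uuid.uuid4().hex,
        "version": "5.3",
    },
    "parent_header": {},
    "metadata": {},
    "content": {"code": "import time; time.sleep(1); print('ok')", "silent": False},
    "channel": "shell",
}))

# 3. Read replies until the kernel reports it is idle again
while True:
    reply = json.loads(ws.recv())
    if (reply.get("msg_type") == "status"
            and reply["content"]["execution_state"] == "idle"
            and reply["parent_header"].get("msg_id") == msg_id):
        break
ws.close()
```

Wrapped in a loop over many users, this gives you the “start kernel → run code → check make or break” measurement from earlier in the thread.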


Thanks again @manics for the clear explanation…