Performance/Load/Rampup Testing for Notebook instances

sarath1212 · January 26, 2022, 4:20am

hello leads,

I am trying to do performance testing by running multiple notebook instances with pre-configured notebook files (.ipynb), so that the activity of multiple users can be automated simultaneously to check the robustness of the infrastructure.

The sample notebook file will have instructions for data loading, package execution and executing ML models.

1. Are there any best practices for this, or can we simulate it with scripts in load testing tools?
2. Is it possible to execute notebook files through a REST API (other than nbconvert), so that this can be invoked from testing tools (e.g. to run the cells of a notebook file)?

This is mainly to verify that the underlying infrastructure and the custom Python packages used inside the notebooks can handle the load.

Additional notes: the JupyterHub and notebook images are custom deployments on an AWS EKS cluster (docker-stacks project).

Could anyone help provide some direction on this?

Thanks
Sarath

bollwyvl · January 26, 2022, 2:28pm

I can highly recommend locust. It uses relatively simple python models to generate statistically-significant loads, and has all the shaping you might want.

It can do a fine job of simulating 100-1000s of clients on a single node, or scale up to do some real damage. You could even spice up a small number of your agents with selenium: each browser costs about as much as hundreds of headless agents, but it’s good to light up as many routes as you can to reveal issues.

Couple locust with some telemetry gathering and a jaeger instance, and you can get some very lovely traces.

More to the testability of the system:

Hub:

Using such approaches against a jupyterhub makes lots of sense with a load testing tool, as, with proper tracing, it would reveal actual architectural bottlenecks in, e.g. the interplay between container spawning and storage provisioning.

Kernel/Individual Package:

If you’re trying to test the kernels and packages themselves… before looking at the whole forest of a jupyterhub, it’s probably worth looking at individual packages. The gold standard for this is probably asv, used by numpy and friends.

The lovely part about asv is that it focuses you not on instantaneous measures, but across time (or rather, across commits). As a code owner, this helps tremendously for finding That Commit that breaks the build.
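For a sense of what that looks like: asv picks up benchmarks by naming convention, so a benchmark file is just plain Python. A minimal sketch, where the file name, class name and parameter values are made up, and `sorted()` is only a stand-in for whatever call into your custom package you want to track:

```python
# benchmarks/bench_transform.py (hypothetical file name)
import random


class TimeTransform:
    # asv runs each benchmark once per value listed in `params`
    params = [1_000, 100_000]
    param_names = ["n_rows"]

    def setup(self, n_rows):
        # setup() runs before each measurement and is not part of the timing
        rng = random.Random(0)
        self.data = [rng.random() for _ in range(n_rows)]

    def time_sort(self, n_rows):
        # methods prefixed with time_ are the ones asv times;
        # sorted() is a placeholder for a call into your own package
        sorted(self.data)
```

Running `asv run` over a range of commits and then `asv publish` is what turns these into the across-commits view.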

Single-user server:

Like it says on the tin, the single-node notebook and jupyterlab_server that run on tornado inside of a spawned container are really designed to support a single, demanding superuser… functioning in another way (especially with shared tokens, and multiple clients, etc) is semi-accidental, and there is no first-party testing that I know of in this regime. You’ll likely not like what you find, and may not like the answers you get from maintainers about orders of magnitude of improvement… these would benefit from asv, but it’s rather an undertaking to get such a beast set up. Doing it right might be a good thing to abstract and PR to a high-level tool like maintainer-tools, with a plan for the long-term care of the data on e.g. a dedicated repo.

Further, performance issues at that level are almost always a function of:

- what extensions are installed
  - with cooperative multitasking, all it takes is one bad await around something that actually blocks the GIL, and the whole app is toast
- the size of stuff flowing
  - lots of 100mb notebooks start breaking stuff, all over, from in-browser renderers to storage
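To make the "one bad await" point concrete, here is a small stdlib-only sketch, not taken from any real extension. The handler names are hypothetical, and `time.sleep()` stands in for any synchronous call (a slow disk read, a blocking HTTP request) made from inside a coroutine. A heartbeat task measures how long the event loop goes without getting a turn:

```python
import asyncio
import time


async def heartbeat(gaps, interval=0.01, ticks=30):
    # records how long each tick really took; a healthy loop stays near `interval`
    last = time.perf_counter()
    for _ in range(ticks):
        await asyncio.sleep(interval)
        now = time.perf_counter()
        gaps.append(now - last)
        last = now


async def bad_handler():
    # looks async, but time.sleep() never yields control back to the event loop
    time.sleep(0.2)


async def good_handler():
    # same wait, but the loop is free to serve everything else in the meantime
    await asyncio.sleep(0.2)


async def worst_gap(handler):
    gaps = []
    hb = asyncio.create_task(heartbeat(gaps))
    await asyncio.sleep(0.05)  # let the heartbeat get going first
    await handler()
    await hb
    return max(gaps)


if __name__ == "__main__":
    print("blocking handler, worst gap:    %.3fs" % asyncio.run(worst_gap(bad_handler)))
    print("cooperative handler, worst gap: %.3fs" % asyncio.run(worst_gap(good_handler)))
```

With the blocking handler the worst gap is roughly the full length of the blocking call, and in a tornado-based server every other request would be stuck for that long too.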

sarath1212 · January 26, 2022, 3:44pm

Thanks @bollwyvl
The response was very detailed and helpful.

manics · January 26, 2022, 6:57pm

There are also a couple of tools people have written:

and see this related thread: "Jupyterhub Can a large number of users start the service at the same time"

sarath1212 · January 27, 2022, 3:42pm

Thanks @manics

Just thinking out loud.

All of these resources are about the hub's ability to scale notebook instances. I have even tried simple ramp-up testing using the hub APIs.

But the real deal: isn't it also necessary to run code in the notebook instances and push them to their limits for the load testing to be complete? I suppose this approach gives a direct measure of real user activity (create single-user notebook servers → start kernel → run code → check make or break).

I am not able to find any way to remotely simulate notebook user activity for multiple users.
Please correct me if I am wrong.

Thanks

manics · January 27, 2022, 9:24pm

You’re right, so it depends on your definition of “infrastructure robustness”. If it includes 100s of simultaneous users then even if they do nothing other than start their server the load on JupyterHub and EKS will be significant and require tuning: JupyterHub will have to spin up 100s of user containers at the same time.
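A rough way to trigger that kind of simultaneous start-up is the JupyterHub REST API, where `POST /hub/api/users/{name}/server` asks the hub to start a user's server. The sketch below uses only the standard library. The hub URL, the token and the `loadtest-NNN` usernames are placeholders, and it assumes the users already exist and the token is allowed to start their servers:

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

HUB_URL = "https://hub.example.org"  # assumption: base URL of your hub
API_TOKEN = "replace-me"             # assumption: token allowed to start these servers


def spawn_request(hub_url, username, token):
    # POST /hub/api/users/{name}/server asks the hub to start that user's server
    url = f"{hub_url.rstrip('/')}/hub/api/users/{username}/server"
    return urllib.request.Request(
        url, method="POST", headers={"Authorization": f"token {token}"}
    )


def start_server(username):
    # returns (username, HTTP status, seconds until the hub answered)
    started = time.perf_counter()
    try:
        req = spawn_request(HUB_URL, username, API_TOKEN)
        with urllib.request.urlopen(req, timeout=60) as resp:
            status = resp.status  # 201 = server started, 202 = spawn still pending
    except urllib.error.HTTPError as err:
        status = err.code
    return username, status, time.perf_counter() - started


def ramp_up(usernames, concurrency=20):
    # fire the spawn requests from a thread pool to approximate simultaneous logins
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(start_server, usernames))


# usage, with hypothetical pre-created users:
#   results = ramp_up([f"loadtest-{i:03d}" for i in range(100)])
```

A 202 only means the spawn was accepted, so a fuller test would also poll `GET /hub/api/users/{name}` until the server is reported ready, and clean up afterwards with `DELETE` on the same `/server` endpoint.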

On the other hand if you’re mostly concerned with the execution of a complex notebook you could skip trying to execute things through the Jupyter API and just use kubectl exec to run the notebook on the command line after starting a singleuser server.
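As a sketch of that approach, the helper below builds a `kubectl exec` command that runs a notebook headlessly with `jupyter nbconvert --execute` inside an already running user pod. The `jhub` namespace, the pod name and the notebook path are assumptions (Zero to JupyterHub names user pods `jupyter-<username>` by default, a custom deployment may differ), and it assumes nbconvert is available in the image:

```python
import shlex
import subprocess


def exec_notebook_cmd(pod, notebook, namespace="jhub"):
    # kubectl exec -n <ns> <pod> -- jupyter nbconvert --to notebook --execute ...
    return [
        "kubectl", "exec", "-n", namespace, pod, "--",
        "jupyter", "nbconvert", "--to", "notebook", "--execute",
        "--output-dir", "/tmp", notebook,
    ]


def run_notebook(pod, notebook, namespace="jhub"):
    cmd = exec_notebook_cmd(pod, notebook, namespace)
    print("running:", shlex.join(cmd))
    # check=False: a failing notebook is reported via returncode instead of raising
    return subprocess.run(cmd, capture_output=True, text=True, check=False)


# usage, with a hypothetical pod and notebook:
#   result = run_notebook("jupyter-loadtest-001", "load_test.ipynb")
#   print(result.returncode, result.stderr[-500:])
```

Timing `run_notebook` across many pods in parallel gives a measure of how the notebook itself behaves under load, without going through the Jupyter API at all.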

If the performance of JupyterLab/other UI is also a concern then as already mentioned Selenium or other tools will be useful.

If you’re trying to mimic a real user’s workflow (execute cell, edit, re-execute same cell) but aren’t concerned with the UI you could use the jupyter-server REST API to execute code.
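For orientation, a sketch of the pieces involved. The REST part (`POST /api/kernels`) starts a kernel, while the code itself travels as an `execute_request` message over that kernel's websocket (`/api/kernels/{id}/channels`). The server URL, token and the `loadtest` username are placeholders, and actually sending the message needs a websocket client library, which is not shown here:

```python
import json
import urllib.request
import uuid
from datetime import datetime, timezone


def kernel_start_request(server_url, token):
    # POST <server>/api/kernels starts a new kernel; the JSON reply contains its id
    return urllib.request.Request(
        f"{server_url.rstrip('/')}/api/kernels",
        data=json.dumps({"name": "python3"}).encode(),
        method="POST",
        headers={"Authorization": f"token {token}", "Content-Type": "application/json"},
    )


def execute_request(code, session_id):
    # an execute_request message in the shape of the Jupyter messaging protocol
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session_id,
            "username": "loadtest",  # placeholder
            "msg_type": "execute_request",
            "version": "5.3",
            "date": datetime.now(timezone.utc).isoformat(),
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code, "silent": False, "store_history": True,
            "user_expressions": {}, "allow_stdin": False, "stop_on_error": True,
        },
        "channel": "shell",
        "buffers": [],
    }


def channels_url(server_url, kernel_id):
    # websocket endpoint that the message above is sent to
    ws = server_url.rstrip("/").replace("https://", "wss://").replace("http://", "ws://")
    return f"{ws}/api/kernels/{kernel_id}/channels"
```

Behind JupyterHub the `server_url` would be something like `https://hub.example.org/user/<name>`. Re-sending the same cell with small edits, and waiting for the matching `execute_reply` each time, approximates the execute, edit, re-execute loop.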

sarath1212 · February 1, 2022, 6:56pm

Thanks again @manics for the clear explanation…
