Load testing and telemetry recommendation

I have a few questions in mind regarding the scalability of jupyter enterprise gateway and its kernels. I have struggled to find answers to these, so I thought I would post them here to get some recommendations from people who are using or maintaining jupyter enterprise gateway:

  1. Do we (the community, the maintainers) have any performance benchmarks or statistics for jupyter enterprise gateway?
  2. What is the biggest scale that you (a member of the community, a maintainer) have run jupyter enterprise gateway on? What kinds of bottlenecks have you seen and how did you resolve them?
  3. What kinds of tools and patterns can we reliably use to load test and find bottlenecks with jupyter enterprise gateway?

I believe having these answered would be great for the whole community, because right now I can only find this Scalable Enterprise Gateway post, which did not answer much.

Hi @luong-komorebi - Thank you for posting these questions. I had been hoping someone (besides me) would respond by now, but at this point I feel you deserve some kind of response, although probably not the one you were hoping for. Short answer: we (EG) don’t have any tools to measure scalability but could certainly use them!

  1. Do we (the community, the maintainers) have any performance benchmarks or statistics for jupyter enterprise gateway?

No. As you reference in your link, that post morphed into an issue that included a possible scale test, but it never made its way into a contribution.

  2. What is the biggest scale that you (a member of the community, a maintainer) have run jupyter enterprise gateway on? What kinds of bottlenecks have you seen and how did you resolve them?

I’ve been made aware of at least two deployments hosting a thousand-plus active instances. One, running 1500 instances, had encountered thread and ZMQ socket limitations that were addressed via environment variable configurations. I don’t know whether those changes improved the situation, but they were applied prior to introducing additional server instances.

  3. What kinds of tools and patterns can we reliably use to load test and find bottlenecks with jupyter enterprise gateway?

In non-open-source situations, this kind of thing is typically handled by the QA/Performance team. I personally don’t have experience with these kinds of tools, so I’m unable to answer this question at this time.

I completely agree that we could use improvements in this area. As noted in your linked post, we should also try to tackle burst-request situations. As is typically the case for these kinds of topics, they become a time-and-resource issue whose priority tends to get minimized in the grand scheme of things, and they fall by the wayside. One of the hurdles to overcome is that these kinds of “tuning” exercises vary by deployment, so it may be the case that organizations have applied scalability tests and improvements to suit their specific environments and we never hear back.

It would be great if we could spend some time and focus on this. I think opening an issue or a discussion item while tagging some of the previous authors could prove useful - assuming they’d be willing to further share their experiences.

Thanks again for raising this discussion!

Kevin.

I have very little to offer in this specific area, as I don’t use kernel gateway, but can comment on a few fine-grained, vendor-agnostic load testing/tracing tool options:

  • locust is a nice tool for building representative, deterministic (if you want) user-oriented workloads against HTTP and WebSocket connections (see the sketch after this list)
  • while aggregates and statistics are nice, digging down into what makes a specific request slow under load benefits from a deeper look
    • the opentelemetry ecosystem has persisted for a few years now, outlasting some of its predecessors
      • using an existing ecosystem vs something bespoke has a lot of advantages
        • it has existing integrations with some of the tools that underlie the jupyter stack, such as tornado, requests, and sqlalchemy
          • and probably some of the vendor-specific things needed under the hood
        • the output feeds nicely into e.g. jaeger for drilling down (also sketched below)
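
Here is a minimal locust sketch of what such a workload could look like. Assumptions: the gateway is reachable at whatever `--host` you pass to locust, it exposes the standard Jupyter `/api/kernels` REST endpoints, and a `python3` kernelspec exists; the kernel-channels WebSocket would need a custom locust client on top of this.

```python
from locust import HttpUser, task, between


class KernelLifecycleUser(HttpUser):
    """Simulates a user who repeatedly starts and stops a kernel via the REST API."""

    wait_time = between(1, 5)  # think time between tasks, in seconds

    @task
    def start_and_stop_kernel(self):
        # Ask the gateway to launch a kernel (assumes a "python3" kernelspec exists).
        resp = self.client.post("/api/kernels", json={"name": "python3"})
        if resp.status_code != 201:
            return  # the failure is already recorded in locust's statistics
        kernel_id = resp.json()["id"]

        # Exercise a read path while the kernel is alive.
        self.client.get("/api/kernels")

        # Shut the kernel down so the run doesn't exhaust gateway resources.
        self.client.delete(f"/api/kernels/{kernel_id}", name="/api/kernels/[id]")
```

Something like `locust -f locustfile.py --host http://<gateway-host>:8888 --headless -u 50 -r 5 --run-time 5m --csv baseline` would then produce CSV statistics that could be published from CI.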

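For the opentelemetry side, here is a hedged sketch of instrumenting a Tornado-based process such as a gateway and exporting spans toward Jaeger. The packages are real (opentelemetry-sdk, opentelemetry-exporter-otlp, opentelemetry-instrumentation-tornado), but the service name, the collector endpoint, and the idea of running this at gateway startup are assumptions, not anything the gateway ships today.

```python
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.tornado import TornadoInstrumentor
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Name the service so traces are easy to find when drilling down in the Jaeger UI.
provider = TracerProvider(
    resource=Resource.create({"service.name": "enterprise-gateway"})
)
# Ship spans to an OTLP-capable collector (e.g. one fronting Jaeger) over gRPC.
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

# Auto-instrument Tornado so every handled HTTP request produces a span;
# this needs to run before the application starts serving traffic.
TornadoInstrumentor().instrument()
```

Whether this gets wired in via a startup hook, a sitecustomize, or a wrapper script is a deployment choice left open here.
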
So a potential course of action might be:

  • create some baseline jupyter-locust agents
    • these would be packaged by the normal means, and wouldn’t be kernel-gateway-specific
  • create a reference instrumented deployment
    • where reference means it can be stood up on some free CI
    • this gets messy quickly, as not everybody is using e.g. helm or terraform or whatever
  • start publishing some rough numbers generated in CI

This actually sounds like a really good hands-on Summer of Code (or related program) project, or a master’s program capstone project.

Thank you so much @bollwyvl and @kevin-bates. These answers really help. For now I am going to mark @bollwyvl’s reply as the solution to temporarily close this topic and follow the path suggested by both of you.
Hopefully, in the end, I can create a benchmarking process that can be shared back with the community.
