Maximum size of a data set that can be loaded into the memory of a user's running JupyterLab server instance

A number of the data sets we want to perform machine learning on using JupyterLab (version 2.2.9) range from 100 GB to 1 TB. We are provisioning very large Kubernetes nodes (96 CPUs, 16 GPUs, and 1.5 TB of RAM).

Can you tell me what the size limit is for data sets (files) loaded into the memory of a user's JupyterLab server instance, assuming each user gets a dedicated node with the resources mentioned above?

JupyterLab and Jupyter in general shouldn't really be part of this question, except in that the hub and server processes might occupy on the order of 30-100 MB just to get running. At the scale of your data sets, Jupyter itself shouldn't contribute noticeably to the memory footprint.

Instead, look at the tools you are using (dask, pandas, xarray, vaex, etc.) to load the data and see how much memory they use to load your data sets. The relationship between file size and memory usage depends heavily on the nature of the data and the tool you use to load it, and the computations you run will also greatly affect how much memory you need. Many of these tools support clever out-of-core operations that avoid filling up RAM when the data doesn't fit. You might have better luck asking in the communities for your particular data-loading/processing tools about how best to predict RAM usage from the data files.
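As a rough illustration (not a definitive recipe, and the file name `measurements.csv` is just a placeholder), you can estimate in-memory size from a small sample with pandas, or let dask build a lazy, partitioned view so nothing is read until you actually compute:

```python
import pandas as pd
import dask.dataframe as dd

# Eagerly load a small sample and measure its in-memory footprint,
# then extrapolate bytes-per-row to the full file.
sample = pd.read_csv("measurements.csv", nrows=100_000)
bytes_per_row = sample.memory_usage(deep=True).sum() / len(sample)
print(f"Estimated RAM per row: {bytes_per_row:.0f} bytes")

# Build a lazy, partitioned dask dataframe over the whole file.
# Nothing is loaded into RAM until a computation is triggered
# (e.g. lazy["col"].mean().compute()), and even then only a few
# partitions are resident at a time.
lazy = dd.read_csv("measurements.csv", blocksize="256MB")
print(f"Number of partitions: {lazy.npartitions}")
```

The actual numbers will vary a lot with dtypes (strings in particular can blow up well beyond the on-disk size), so treat any extrapolation as an order-of-magnitude estimate.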


Thank you for your reply.