Kernel dies on requests.get() of 1.4GB file

I have a Jupyter Notebook on GitHub that I am running through MyBinder at https://mybinder.org/v2/gh/jcoliver/dig-coll-borderlands/master?filepath=Text-Mining-Template.ipynb. When downloading a zip file (second code block) with requests.get(), memory quickly climbs to the maximum (2GB) and the kernel dies. I swear this did not happen a month ago, and I have isolated the problem to the requests.get() line. If I pass stream=True to requests.get(), that line runs without killing the kernel, but the download on the subsequent lines still maxes out the memory and kills the kernel. I can run the notebook locally and the download works just fine (although it takes a few minutes). It seems like this should work, but is a 1.4GB file too big to download on the MyBinder infrastructure?

Yes, if you reach the maximum memory limit (or go above it) then the kernel will get killed on mybinder.org.

Maybe you can fetch a subset of the file or find a way to stream it and only process a subset at a time.
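
Something along these lines might work for the streaming approach — a minimal sketch, assuming the file just needs to end up on disk; the URL, output filename, and chunk size below are placeholders, not taken from the notebook:

```python
import requests

# Placeholder URL and filename; the real zip file is the one fetched
# in the notebook's second code block.
url = "https://example.org/big-file.zip"

# stream=True stops requests from buffering the whole body in memory;
# iter_content() then yields small chunks that are written straight to disk,
# so peak memory stays near the chunk size instead of the full 1.4GB.
with requests.get(url, stream=True) as response:
    response.raise_for_status()
    with open("big-file.zip", "wb") as out_file:
        for chunk in response.iter_content(chunk_size=1024 * 1024):  # 1MB
            out_file.write(chunk)
```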

Thanks, Tim. Yes, I understand the ramifications of maxing out the RAM on a MyBinder-hosted notebook. However, two sources of consternation are:

  1. The file is only 1.4GB, considerably below the 2GB allotment. Other processes in the notebook use 200MB at most, so it is not clear how downloading a 1.4GB file maxes out the 2GB of RAM.
  2. I can confirm that this notebook, hosted on MyBinder, worked as recently as 2020-10-15. That is, the code could be executed as written without exceeding the memory allocation or killing the kernel.

We’ve not changed the memory limit downwards recently. However, some of the clusters that back mybinder.org (sometimes) offer more RAM, so maybe you got lucky in the past. You can explicitly ask for your binder to be launched on https://notebooks.gesis.org/binder/, which (permanently) offers more RAM to users.

My guess is that the peak memory usage of requests is larger than the size of the file; I wouldn’t be surprised to learn that it makes a copy of the file contents somewhere. I am 99% sure the kernel isn’t getting killed prematurely, so it must be something that pushes usage above 2GB (minus overhead from the notebook process, etc.).
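
One way to sanity-check that guess (a rough sketch, not taken from the notebook; the URL is a placeholder) is to trace Python's allocations around a plain, non-streaming requests.get():

```python
import tracemalloc

import requests

url = "https://example.org/big-file.zip"  # placeholder URL

tracemalloc.start()
response = requests.get(url)   # non-streaming: the whole body gets buffered
data = response.content       # .content joins the buffered chunks into one bytes object
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# If the buffered chunks and the joined bytes object exist at the same
# moment, the traced peak will come out noticeably larger than the file itself.
print(f"downloaded: {len(data) / 1e9:.2f} GB, peak traced: {peak / 1e9:.2f} GB")
```

A peak well above 1.4GB would be enough to push the session over the 2GB limit even though the file itself fits.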

TL;DR: The GESIS solution works great. Thanks for the suggestion, @betatim.

When running the notebook on GESIS, the memory used never goes beyond 1.8GB (8GB are available), and all code blocks execute successfully. Granted, there is almost certainly a difference between the memory actually used and the memory reported, but it still seems like memory consumption differs between the MyBinder and GESIS hosting. I agree that the kernel isn’t being killed prematurely on the MyBinder hosting - I can watch the memory climb in 100-200MB increments until it reaches the red zone above 1.9GB and then, alas, the kernel perishes.

But the GESIS-hosted solution works, so woo-hoo!