torch.load fails after downloading .pth file from JupyterLab (invalid magic number)

Hi everyone,

I ran into a problem with a PyTorch model backup and I’m not sure what went wrong. Here’s the situation:

  • I had a model weights file model.pth (≈2.4 GB) on my cluster. As a backup, I downloaded it by clicking on the file in JupyterLab, so it is now stored in my local home directory on Ubuntu.

  • Later, a git pull deleted the original .pth file from the cluster, and I had made no other backup.

  • The downloaded file, however, does not look like a plain .pth file anymore. Instead, it opens as a zip archive containing:

    • data.pkl

    • version

    • several binary files

  • When I unpack the zip and try to load the model with torch.load('model.pth'), I get the following error:

UnpicklingError                           Traceback (most recent call last)
Cell In[29], line 18
     15     pickletools.dis(f)
     17 #Load
---> 18 data = torch.load(pth_file, weights_only=False)
     19 print(data.keys())

File ~/Masterthesis/Local/.venv/lib/python3.12/site-packages/torch/serialization.py:1554, in load(f, map_location, pickle_module, weights_only, mmap, **pickle_load_args)
   1552     except pickle.UnpicklingError as e:
   1553         raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-> 1554 return _legacy_load(
   1555     opened_file, map_location, pickle_module, **pickle_load_args
   1556 )

File ~/Masterthesis/Local/.venv/lib/python3.12/site-packages/torch/serialization.py:1802, in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
   1799         # if not a tarfile, reset file offset and proceed
   1800         f.seek(0)
-> 1802 magic_number = pickle_module.load(f, **pickle_load_args)
   1803 if magic_number != MAGIC_NUMBER:
   1804     raise RuntimeError("Invalid magic number; corrupt file?")

UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.

It seems like JupyterLab might have changed the file format during download.
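A quick way to test that suspicion is to inspect the downloaded file itself before unpacking anything. As far as I know, torch.save has written zip archives by default since PyTorch 1.6, so a .pth that opens as a zip with data.pkl, version, and binary records inside may simply be the normal format rather than something JupyterLab did. The sketch below uses only the standard library; the function name describe_checkpoint and the path in the usage comment are made up for illustration.

```python
import zipfile


def describe_checkpoint(path):
    """Return (is_zip, member_names) for the file at `path`.

    is_zip is True when the file is a readable zip archive; member_names
    lists the entries inside it (empty when the file is not a zip).
    """
    if not zipfile.is_zipfile(path):
        return False, []
    with zipfile.ZipFile(path) as archive:
        return True, archive.namelist()


# Hypothetical usage; the path is an assumption, not taken from the post:
#   is_zip, names = describe_checkpoint("model.pth")
#   print(is_zip, names[:5])
```

If this reports a zip with a data.pkl entry, it is probably worth passing the untouched download straight to torch.load, without unpacking it first, before trying anything more involved.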

Questions:

  1. Does anyone know how to correctly restore the original .pth file from this downloaded zip?

  2. Is there a known way JupyterLab handles large file downloads that could explain this behavior?

Any guidance or tips would be really appreciated!

Thanks in advance.

What version of JupyterLab are we talking about?
How big is the file you are looking at?
I don't know the .pth file format, but quite a few formats out there are basically renamed zip files.

As a side note: my current JupyterLab handles large file downloads as I would expect.
I just created a random 3 GB .bin file and downloaded it through the web. No zipping took place.
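For anyone who wants to repeat that kind of test, comparing checksums of the server-side file and the downloaded copy shows whether the download changed any bytes. This is only a sketch of one way to do it, not what was actually run above; the helper name sha256_of and the file names in the usage comment are assumptions.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks
    so that multi-gigabyte files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage: run once on the cluster and once locally,
# then compare the two printed digests by eye.
#   print(sha256_of("random.bin"))
```

Identical digests on both sides would mean the download was byte-for-byte faithful. This only helps while the original still exists on the server, so it cannot be applied retroactively to the deleted model.pth.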
