How could data_files be improved?

Thanks for stopping by!

Anticipating that this day was coming, and wanting to let other snake people who use something other than setuptools integrate with the Jupyter stack (especially with editable installs), we’ve been fighting some of this stuff out:

Our biggest challenge is that every Jupyter user is forced to play the role of full-stack (web, but thankfully less now) engineer, with configurability and extensibility at almost every level of the stack. Because Jupyter is a Python-native but ideally language-agnostic system, the filesystem is the touchpoint: it is a contract we hold with many languages, most of which have their own highly opinionated way of putting files on disk (Node.js, Julia, Rust, Go). As a result, many power users easily have tens of first-party Jupyter packages and potentially more third-party extensions… While the 1,000 packages we were using for testing are not real-world today, they do point to what could feasibly happen.

The above pass down the entry_points route points (ha) to some challenges that come from not being able to know everything is in a single, well-known place on disk at install time:

  • upstreams like tornado and jinja2 do not like looking at potentially hundreds of places on disk to find static assets and templates
  • even when cutting a ton of corners, loading hundreds of entry_points is pretty slow on fast developer machines with SSDs, and we have a lot of users who are on NFS shares, etc.
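To make the contrast in the two points above concrete, here is a minimal sketch, not the actual Jupyter implementation. The group name and directory passed in are hypothetical; only `importlib.metadata.entry_points` and `pathlib` are real APIs. The first function has to import one module per extension, while the second only lists a single directory.

```python
from importlib.metadata import entry_points
from pathlib import Path


def discover_via_entry_points(group):
    """Load every entry point registered under `group` (a hypothetical name).

    Each ep.load() imports the target module, so the cost grows with the
    number of installed extensions.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ returns a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name.
        selected = eps.get(group, ())
    return {ep.name: ep.load() for ep in selected}


def discover_via_data_dir(root, pattern="*.json"):
    """List files in one well-known directory; nothing is imported."""
    return sorted(p.name for p in Path(root).glob(pattern))
```

With a single well-known directory, a consumer such as a template loader or static file handler can be pointed at one path instead of a search path with hundreds of entries.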

PEP 420 namespaces might work, but nobody has taken the time to explore that route and see what other gremlins exist… and as kind of an “also-ran” feature, a lot of tools don’t handle them particularly well (pyreverse) or have explicitly said they won’t (flit).
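For anyone who wants to poke at the PEP 420 route, here is a small sketch of how a namespace package spans several install locations. Everything named here (`demo_shared_ns`, `ext_0`, `ext_1`) is made up for illustration, and this says nothing about how well it would hold up at Jupyter's scale.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def build_demo_namespace(namespace="demo_shared_ns"):
    """Create two separate roots that each contribute to one namespace."""
    # The temporary directories are left on disk for the life of the demo.
    roots = [tempfile.mkdtemp(), tempfile.mkdtemp()]
    for i, root in enumerate(roots):
        portion = Path(root) / namespace  # deliberately no __init__.py
        portion.mkdir()
        (portion / f"ext_{i}.py").write_text(f"NAME = 'ext_{i}'\n")
    sys.path[:0] = roots
    importlib.invalidate_caches()
    return importlib.import_module(namespace)


ns = build_demo_namespace()
# __path__ holds one directory per contributing root.
print(list(ns.__path__))
# Modules from both roots are discoverable under the same package name.
print(sorted(m.name for m in pkgutil.iter_modules(ns.__path__)))
```

Note that discovery still walks one directory per contributing distribution, so this does not by itself give the single well-known place on disk mentioned above.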
