Can repo2docker resolve python_requires in setup.py?

setup.py (and setup.cfg, etc.) in the Python packaging world has a python_requires directive. Can that be used to tell repo2docker which version of Python to install?

It seems there are at least three ways to specify the Python version: environment.yml (which takes precedence over runtime.txt), runtime.txt, and now this setup.py route.

Any insight into how (and if) that should be handled?

setup.py (and setup.cfg etc)

and pyproject.toml, poetry.toml, and hatch.toml, and… the great thing about Python packaging “standards” is that there are so many to choose from.

The specific issue with setup.py is that one would need to actually execute the file, as python_requires could be dynamically derived from … some other thing. It might try to import the package in question, and it might have other, undeclared dependencies via the ill-fated setup_requires chicken-and-egg problem.
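For illustration, here’s the kind of (completely made-up) setup.py that defeats any static parser: python_requires only exists once the file is actually executed, and the sidecar package-meta.json stands in for “some other thing”:

```python
# Hypothetical setup.py: python_requires is only knowable by running the file,
# because it is read from a sidecar file (it could just as well be computed by
# importing the package itself, or anything else).
import json
from pathlib import Path

from setuptools import setup

meta = json.loads(Path("package-meta.json").read_text())

setup(
    name="example",
    version=meta["version"],
    python_requires=meta["python_requires"],  # invisible to a static parser
)
```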

Similarly, pyproject.toml can defer almost anything to dynamic, which makes it almost useless as an interchange format.
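A static check can at least tell whether it’s looking at a declared value or a punt to dynamic; a minimal sketch, assuming Python 3.11+ for tomllib:

```python
# Sketch: check whether requires-python in pyproject.toml is static (usable
# without running a build backend) or deferred to the dynamic list.
import tomllib  # standard library since Python 3.11

with open("pyproject.toml", "rb") as f:
    project = tomllib.load(f).get("project", {})

if "requires-python" in project.get("dynamic", []):
    print("requires-python is dynamic; only the build backend knows it")
elif "requires-python" in project:
    print("static constraint:", project["requires-python"])
else:
    print("no Python version constraint declared")
```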

r2d is entirely heuristic-driven, of course, but a PR that handled all of the above cases would take the heuristics up another notch, and anything that handled less would require more documentation than almost any other feature.

From a Docker cache perspective, compared with the r2d-specific runtime.txt and the more portable environment.yml, supporting these would probably be strictly worse, as r2d wouldn’t be able to do “the big solve layer” against a static file copied in before the repo content. Instead, by necessity, that expensive layer would be invalidated by every change to the repo. This is also the observed behavior when requirements.txt or environment.yml uses -r or an editable install (-e .).

So my gut reaction: this wouldn’t be worth the complexity, and potentially confusing errors.


Thanks, that makes sense. I mean the explanation of the situation (not the joyous Python packaging mélange :slight_smile:)

Perhaps r2d could parse the setup.py, see python_requires, and warn (pointing to environment.yml as the preferred way)? But then it would need to do that for all the formats you mention. And, well, to do that it would need to execute them because they can be dynamic … and we are right back where we started.

For my use case, I’m going to chat with my collaborator about reverting to environment.yml and requirements.txt. I don’t think the setup.py is crucial (but it is the way he is familiar with).

Yep. I’d basically rule out setup.py entirely, as there are another 20 ways to declare it (single quotes, dict update, read from some other file, etc.). And warnings during a docker build generally look broken rather than helpful.

The only closed-form, standards-based option I see as worth even fighting for in a PR is a non-dynamic pyproject.toml#project/requires-python, but that would only provide part of the picture. Does a package need more stuff to build, like a working Rust compiler? nodejs for web junk? Are these declared in extras? It’s really just very hard to tell. You might at least get a compatible Python, but even then it’s a match spec (not a single version), so there would be some ambiguity: the oldest functional version? The newest possible?
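To make that ambiguity concrete, here’s a small sketch using the third-party packaging library (the candidate list and the two policies are just illustrations):

```python
# requires-python is a specifier set, not a version, so a tool still has to
# pick a policy. Uses the third-party "packaging" library.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

spec = SpecifierSet(">=3.9,<3.13")  # e.g. pyproject.toml#project/requires-python
candidates = [Version(f"3.{minor}") for minor in range(8, 14)]
matching = sorted(v for v in candidates if v in spec)

print("oldest functional:", matching[0])   # 3.9
print("newest possible:", matching[-1])    # 3.12
```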

Since, at least on BinderHub, one is already running in a mamba environment, and the primary Jupyter maintainers are often also the conda-forge maintainers, I’d say .binder/environment.yml (with or without a /dependencies/pip/*, but definitely without dependencies/pip/-r) plus postBuild is the least surprising way to describe a “reproducible” editable environment (give or take some of the kinda weird extras binderhub tosses in).

I agree it’s probably not something that should be added to repo2docker. However, if you’re interested in pursuing the idea, here’s an intermediate option: the list of files processed by repo2docker is well defined, so any additional Python dependency parser would just need to output the equivalent of those config files, e.g. requirements.txt plus runtime.txt to specify the Python version.

You could write a magic r2d-config-generator command line tool that creates or updates the r2d config files in ./binder for an input repository. Optionally, run this tool as part of your CI pipeline to ensure the repository’s config files are kept up to date.
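As a very rough sketch of what such a (hypothetical) tool could do for the static pyproject.toml case (the “newest matching Python” policy, the hard-coded candidate list, and the verbatim copy of dependencies are assumptions on my part, not anything repo2docker prescribes):

```python
# Rough sketch of the hypothetical r2d-config-generator: read a purely static
# pyproject.toml and emit the equivalent repo2docker config files in ./binder.
import tomllib
from pathlib import Path

from packaging.specifiers import SpecifierSet
from packaging.version import Version


def generate(repo: Path) -> None:
    project = tomllib.loads((repo / "pyproject.toml").read_text())["project"]

    binder = repo / "binder"
    binder.mkdir(exist_ok=True)

    # runtime.txt: pick the newest 3.x minor release satisfying requires-python
    # (assumes the field is not declared dynamic).
    if "requires-python" in project:
        spec = SpecifierSet(project["requires-python"])
        candidates = (Version(f"3.{minor}") for minor in range(8, 14))
        newest = max(v for v in candidates if v in spec)
        (binder / "runtime.txt").write_text(f"python-{newest}\n")

    # requirements.txt: copy the statically declared dependencies verbatim.
    deps = project.get("dependencies", [])
    (binder / "requirements.txt").write_text("\n".join(deps) + "\n")


if __name__ == "__main__":
    generate(Path("."))
```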