Thanks to agoose77 and krassowski. I’ve learned a lot from this exchange.
The variation in the use of dashes and underscores as file path separators can be daunting at first, but as krassowski suggests, the underlying pattern may be that package names use dashes whilst module names use underscores.
There are deviations from this pattern. Some might be legacy behavior that could fade with time. Others might simply reflect random choices that survive in the absence of pressure to fix them.
Puzzling Variations
To keep the task manageable, I will look only at the separator between the first two parts of a name and use an ellipsis to suppress the additional names and separators used in names for source archives and wheels. (There seems to be a well established pattern of using dashes as separators in the remaining elements in these file names.)
To illustrate the variation and the puzzles, it helps to use some verbose terminology to distinguish five different types of names. Then we can use examples to illustrate the variation between them in reliance on dashes versus underscores.
The example I asked about
- PyPI Package Name: jupyterlab-mathjax3
- Site-Packages Directory Name: jupyterlab-mathjax3
- Pip Freeze Name: jupyterlab-mathjax3
- Source Archive Name: jupyterlab-mathjax3...
- Wheel Name: jupyterlab_mathjax3...
In brief: “1- 2- 3- 4- 5_”
My initial question about this package was why names 1 and 2 both use a dash as the separator, in contrast to the usual pattern in which directory names in site-packages use an underscore. (Note that names 4 and 5 also use different separators.)
The example suggested by krassowski
- PyPI Package Name: jupyterlab-markup
- Site-Packages Directory Name: jupyterlab_markup
- Pip Freeze Name: jupyterlab-markup
- Source Archive Name: jupyterlab_markup...
- Wheel Name: jupyterlab_markup...
In brief: “1- 2_ 3- 4_ 5_”
Krassowski provides a rationale for a difference between 1 and 2 without addressing the question of why jupyterlab-mathjax3 does not follow this pattern.
A new third example
- PyPI Package Name: jupyterlab-fonts
- Site-Packages Directory Name: jupyterlab_fonts
- Pip Freeze Name: jupyterlab-fonts
- Source Archive Name: jupyterlab-fonts...
- Wheel Name: jupyterlab_fonts...
In brief: “1- 2_ 3- 4- 5_”
Here, as for jupyterlab-mathjax3, the source archive name and the wheel name use different separators. But unlike jupyterlab-mathjax3, there is a difference between names 1 and 2.
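One part of the pattern may have a simple explanation: in all three examples so far, the wheel name (name 5) uses an underscore. The wheel specification (PEP 427) uses the dash to separate the fields of a wheel file name, so it asks that runs of other characters inside each field be replaced with an underscore. Here is a minimal sketch of that escaping rule. The function name is my own, and the sketch says nothing about how source archive names are chosen.

```python
import re


def escape_wheel_component(component: str) -> str:
    """Escape one field of a wheel file name, following the rule in PEP 427.

    Runs of characters that are not alphanumeric, "_" or "." become a
    single underscore, which leaves the dash free to separate fields.
    """
    return re.sub(r"[^\w\d.]+", "_", component, flags=re.UNICODE)


for name in ("jupyterlab-mathjax3", "jupyterlab-fonts", "jupyterlab_markup"):
    print(name, "->", escape_wheel_component(name))
```

If this rule is what the build tools applied, it would account for name 5 using an underscore regardless of the separator used in name 1.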
A fourth new example
- PyPI Package Name: pymarkdown_minisite
- Site-Packages Directory Name: pymarkdown_minisite
- Pip Freeze Name: pymarkdown-minisite
- Source Archive Name: pymarkdown_minisite...
- Wheel Name: no wheel
In brief: “1_ 2_ 3- 4_”
I picked this fourth case because it was listed today as a trending project on the PyPI home page.
Pip
In all four of these examples, Pip Freeze reports a name that uses a dash, even in the fourth case, where the PyPI name uses an underscore. This may reflect a decision to use the dash as the separator in a canonicalized version of the name for a package on PyPI and to have Pip Freeze report the canonicalized name.
Moreover, for any package name of the form either n1-n2 or n1_n2, both of the possible pip requests will work:
% python3 -m pip install n1_n2
% python3 -m pip install n1-n2
This behavior would also be expected if pip takes the canonicalized version of the name provided in a request and looks for a match among canonicalized names on PyPI.
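For what it is worth, PEP 503 describes a normalization rule of exactly this kind: runs of dashes, underscores, and periods collapse to a single dash, and the result is lowercased. The sketch below applies that rule to the names discussed here. The function name `canonicalize` is my own, and I am assuming, not asserting, that this is the rule behind the behavior I observed.

```python
import re


def canonicalize(name: str) -> str:
    """Normalize a project name using the rule given in PEP 503."""
    # Runs of "-", "_" and "." become a single dash; case is ignored.
    return re.sub(r"[-_.]+", "-", name).lower()


for requested in ("n1_n2", "n1-n2", "jupyterlab_mathjax3", "pymarkdown_minisite"):
    print(requested, "->", canonicalize(requested))
```

Under this rule the two install requests above refer to the same canonical name, which would explain why both work.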
As a further example,
% python3 -m pip install jupyterlab_mathjax3
generates the response:
Requirement already satisfied: jupyterlab_mathjax3 in /Library/Frameworks/Python.framework/3.10/lib/python3.10/site-packages
It does so even though the directory in the site-packages directory uses a dash as the separator, not an underscore. A slightly different wording for this response might provide an opportunity to hint at pip’s use of canonicalized package names.
If the canonicalized name of any new submission is checked against the canonicalized name for each existing package, a bad actor would not be able to submit a package with a name that differs from an existing one only by a change from the dash as separator to an underscore (or vice versa).
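A check of this kind could look roughly like the following. This is only a sketch of the idea, not PyPI's actual code: `is_name_available` and the list of existing names are hypothetical, and the normalization is the one described in PEP 503.

```python
import re


def canonicalize(name: str) -> str:
    """Normalize a project name using the rule given in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_name_available(candidate: str, existing_names: list[str]) -> bool:
    """Return True if no existing name shares the candidate's canonical form.

    Hypothetical helper, for illustration only.
    """
    taken = {canonicalize(existing) for existing in existing_names}
    return canonicalize(candidate) not in taken


existing = ["jupyterlab-mathjax3", "pymarkdown_minisite"]
print(is_name_available("jupyterlab_mathjax3", existing))  # False
print(is_name_available("pymarkdown-minisite", existing))  # False
print(is_name_available("jupyterlab-fonts", existing))     # True
```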
Summary
The advice from PEP 8 is not rich enough to capture all the patterns that have emerged in practice regarding dashes versus underscores as separators in file paths. Nevertheless, the distinction that it draws between packages and modules is consistent with a strategy of assigning different separators to them: a dash for the package name and an underscore for a module/directory name.
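One concrete reason a module or directory name needs the underscore is that an import statement only accepts valid Python identifiers, and a dash is not allowed in an identifier. A small sketch of that check follows; the helper name is my own.

```python
import keyword


def is_importable_name(name: str) -> bool:
    """Return True if `name` could appear after `import` in Python source."""
    # A module name must be a valid identifier and must not be a keyword.
    return name.isidentifier() and not keyword.iskeyword(name)


for name in ("jupyterlab-markup", "jupyterlab_markup", "jupyterlab-mathjax3"):
    print(name, is_importable_name(name))
```

A directory named with a dash can still sit in site-packages, but it cannot be imported with a plain import statement, which fits the case where there is no module to import.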
Actual practice on PyPI also includes cases where the package name relies on an underscore separator. This may be legacy behavior that will go away as practice converges toward the “dash for package, underscore for directory” approach. Data linked by krassowski suggests that the ratio of names with dashes to names with underscores is currently about 4 to 1.
It is possible that the decision in the case of jupyterlab-mathjax3 to use a dash for a directory name in the site-packages directory was based on an argument that dashes might (as agoose77 conjectured) be a useful signal in cases where there is no module to import. But the decision in the comparable case of jupyterlab-fonts suggests otherwise. It might simply be that in the case of jupyterlab-mathjax3, where there is no module to import, there is little pressure to correct a dash that was used inadvertently.