Naming conventions, _ vs -; why does "jupyterlab-mathjax3" use a dash?

I’ve been trying to work out some guidelines for using dash versus underscore in file names.

One rule I had provisionally adopted was to avoid using a dash in the name of a module that I might want to import because Python won’t let me import it if it contains a dash.

Is there any particular reason why “jupyterlab-mathjax3” uses a dash?

Does it matter that one can’t import it using the usual import statement?

Thx!

I hadn’t noticed that this package uses a dash in its package name, and indeed so do the other renderer packages. In general, this is not good practice. If it’s being done deliberately to discourage imports, that would feel a little overzealous to me; Python has an underlying principle of “don’t do that” rather than “can’t do that”. I suspect, though, that this is not a deliberate naming strategy; these packages were probably initially generated from a scaffold that included the dash. Still, it might be worth raising this on the repo.

Thx!

This answers my question about whether it is a problem that one can’t import it. It sounds like one shouldn’t import it – perhaps also that one shouldn’t import any renderer package.

To be honest, I don’t really know what a renderer package is or how one is used – but the name offers a clear suggestion and this gives me a lead to pursue.

BTW, the reason this came up was that I was trying to understand the factors that contribute to the time it takes for Jupyter Lab to start and did the naive thing of trying to import each of the packages in site-packages to see how long each takes.
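A minimal sketch of that kind of timing experiment, assuming the names to test are supplied by hand (the list below uses standard-library modules as stand-ins, and `time_import` is a hypothetical helper name). A module that is already in `sys.modules` imports almost instantly, so the numbers only mean something in a fresh interpreter. The interpreter's own `python -X importtime` option gives a more detailed breakdown.

```python
import importlib
import time


def time_import(module_name):
    """Return the seconds taken to import module_name, or None if the import fails."""
    start = time.perf_counter()
    try:
        importlib.import_module(module_name)
    except Exception:
        # A naive scan hits many names that are not importable at all
        # (or that fail while importing); record those as None.
        return None
    return time.perf_counter() - start


# Stand-in names; replace them with the names you want to measure.
candidates = ["json", "decimal", "jupyterlab-mathjax3"]
timings = {name: time_import(name) for name in candidates}

for name, seconds in timings.items():
    if seconds is None:
        print(f"{name}: not importable")
    else:
        print(f"{name}: {seconds * 1000:.2f} ms")
```

Passing the name as a string to `importlib.import_module` is what lets the loop even attempt a name that contains a dash.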

I don’t think I’ll raise the naming issue on the repo because I don’t understand the context well enough. As someone trying to learn, I can say that it is difficult to infer the implicit or explicit conventions that people in the Python community follow for naming files. It seems that there are no hard rules about dashes versus underscores, but there do seem to be patterns. I’d welcome any pointers to a general discussion on this topic.

The renderer packages aren’t useful to import either; they install data-files to the share/ directory that Jupyter Lab picks up during startup. The fact that they are Python packages is just a means of delivery.

In general, it’s nearly always a mistake to have Python modules contained in non-importable (via import, at least) packages.
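The “via import, at least” caveat can be illustrated with a small sketch. It assumes a throwaway file named `demo-dashed.py` (a made-up name) in a temporary directory: the `import` statement rejects the dashed name at compile time, while `importlib.import_module` accepts any string.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_statement_compiles(name):
    """Return True if `import <name>` is syntactically valid Python."""
    try:
        compile(f"import {name}", "<demo>", "exec")
        return True
    except SyntaxError:
        return False


with tempfile.TemporaryDirectory() as tmp:
    # Create a module file whose name contains a dash (hypothetical name).
    Path(tmp, "demo-dashed.py").write_text("VALUE = 42\n")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # The import machinery does not require the name to be an identifier.
        module = importlib.import_module("demo-dashed")
        loaded_value = module.VALUE
    finally:
        sys.path.remove(tmp)

print(import_statement_compiles("demo-dashed"))   # the statement form is rejected
print(import_statement_compiles("demo_dashed"))   # an underscore is fine
print(loaded_value)
```

This is why a dash only discourages importing; it does not make it impossible.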

On reflection, it might be a good general rule to use a dash in the name of any package that the author does not intend people to use via import.

To the contrary! For example, jupyterlab-dagitty (krassowski/jupyterlab-dagitty on GitHub, a JupyterLab renderer of dagitty causal diagrams) is meant to be imported, as it provides IPython display utilities for the renderer that it uses. Its PyPI distribution name has a dash (jupyterlab-dagitty), but its Python package uses an underscore (jupyterlab_dagitty, in the same repository) to allow importing. I think that this is the best of both worlds, as it allows users to install it via both pip install jupyterlab-foo and pip install jupyterlab_foo.

It is however true that none of the packages in jupyterlab/jupyter-renderers (renderers and renderer extensions for JupyterLab) have anything to offer at import time, so importing them is not useful in itself. If in doubt when creating a PyPI package, please use a dash (and an underscore for the Python package if it is meant to be importable).

I see the logic of what you are saying, but for this to work, it will be important to make sure that consumers of such files understand that they must pay careful attention to the use of dash versus underscore in the same position in various file names. See e.g. this image from PyPi for your example:

It would be an interesting empirical exercise to have a number of people go to the page with these two files (supposedly to learn about ways to download them) and then test to see how many of them notice the difference in the separators for these two files. Until I read your post, I would always have expected that the separator for files from the same author to be the same across files. So I would almost surely have failed such a test.

If many people fail, it means that we face a challenging problem with typo-squatting. Perhaps it would be useful to scan PyPi for cases of file names that differ only by a substitution of a dash for an underscore or vice versa. Such a scan would not generate a false positive by identifying the two files on this page. If it turns up any instances, it might be a good idea for someone to take a close look at them.

I don’t see a problem. The wheel/source build artefacts are not meant for direct consumption by end-users, so they will not be looking at them.

There is no typo-squatting danger, exactly because PyPI and pip normalise _ to -. Try pip freeze: do you see anything with _? What is more user-friendly: being consistent with the package manager or not? Also, the predominant convention is to use - in PyPI package names; see “Analysis of PyPi package names and the use of dashes underscores upper and lower case” (on GitHub).

As an example, let’s run:

pip install jupyterlab_markup

And then run:

$ pip freeze | grep markup
jupyterlab-markup==2.0.0
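The normalisation behind this is described in PEP 503: runs of `-`, `_` and `.` are collapsed to a single `-` and the name is lowercased. A small sketch of that rule (the function name `normalize` follows the PEP's own example; the third spelling in the list is made up to show case folding):

```python
import re


def normalize(name):
    """Normalise a project name using the rule described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


for spelling in ["jupyterlab_markup", "jupyterlab-markup", "JupyterLab.Markup"]:
    print(spelling, "->", normalize(spelling))
```

All three spellings map to the same normalised name, which is why either separator works in a pip install request.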

Thanks to agoose77 and krassowski. I’ve learned a lot from this exchange.

The variation in the use of dashes and underscores as file path separators can be daunting at first, but as krassowski suggests, the underlying pattern may be that package names use dashes whilst module names use underscores.

There are deviations from this pattern. Some might be legacy behavior that could fade with time. Others might simply reflect random choices that survive in the absence of pressure to fix them.

Puzzling Variations

To keep the task manageable, I will look only at the separator between the first two parts of a name and use an ellipsis to suppress the additional names and separators used in names for source archives and wheels. (There seems to be a well established pattern of using dashes as separators in the remaining elements in these file names.)

To illustrate the variation and the puzzles, it helps to use some verbose terminology to distinguish five different types of names. Then we can use examples to illustrate the variation between them in reliance on dashes versus underscores.

The example I asked about

  1. PyPi Package Name: jupyterlab-mathjax3
  2. Site-Packages Directory Name: jupyterlab-mathjax3
  3. Pip Freeze Name: jupyterlab-mathjax3
  4. Source Archive Name: jupyterlab-mathjax3...
  5. Wheel Name: jupyterlab_mathjax3...

In brief: “1- 2- 3- 4- 5_”

My initial question about this package was why names 1 and 2 both use a dash as the separator, in contrast to the usual pattern in which directory names in site-packages use an underscore. (Note that names 4 and 5 also use different separators.)

The example suggested by krassowski

  1. PyPi Package Name: jupyterlab-markup
  2. Site-Packages Directory Name: jupyterlab_markup
  3. Pip Freeze Name: jupyterlab-markup
  4. Source Archive Name: jupyterlab_markup...
  5. Wheel Name: jupyterlab_markup...

In brief: “1- 2_ 3- 4_ 5_”

Krassowski provides a rationale for a difference between 1 and 2 without addressing the question of why jupyterlab-mathjax3 does not follow this pattern.

A new third example

  1. PyPi Package Name: jupyterlab-fonts
  2. Site-Packages Directory Name: jupyterlab_fonts
  3. Pip Freeze Name: jupyterlab-fonts
  4. Source Archive Name: jupyterlab-fonts...
  5. Wheel Name: jupyterlab_fonts...

In brief: “1- 2_ 3- 4- 5_”

Here, as for jupyterlab-mathjax3, the source archive name and the wheel name use different separators. But unlike jupyterlab-mathjax3, there is a difference between names 1 and 2.
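One likely explanation for the underscore in name 5 is the wheel file name convention in PEP 427: the dash is the field separator in a wheel file name, so each component is escaped by replacing runs of other characters with an underscore. The sketch below applies that escaping rule; the helper names and the version "1.0.0" are placeholders for illustration, not details taken from the real packages.

```python
import re


def escape_wheel_component(name):
    """Escape a name for use in a wheel file name, using the rule given in PEP 427."""
    return re.sub(r"[^\w\d.]+", "_", name, flags=re.UNICODE)


def split_wheel_filename(filename):
    """Split a wheel file name into its dash-separated fields."""
    stem = filename[: -len(".whl")]
    return stem.split("-")


# "1.0.0" is a placeholder version; "py3-none-any" are the usual tags of a pure-Python wheel.
wheel = f"{escape_wheel_component('jupyterlab-mathjax3')}-1.0.0-py3-none-any.whl"
fields = split_wheel_filename(wheel)

print(wheel)
print(fields)
```

Because the name field cannot contain a dash without breaking this split, a wheel name uses an underscore even when the PyPI name and the source archive use a dash.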

A fourth new example

  1. PyPi Package Name: pymarkdown_minisite
  2. Site-Packages Directory Name: pymarkdown_minisite
  3. Pip Freeze Name: pymarkdown-minisite
  4. Source Archive Name: pymarkdown_minisite...
  5. Wheel Name: no wheel

In brief: “1_ 2_ 3- 4_”

I picked this fourth case because it was listed today as a trending project on the PyPi home page.

Pip

In all four of these examples, Pip Freeze reports a name that uses a dash, even in the fourth case, where the PyPi name uses an underscore. This may reflect a decision to use the dash as the separator in a canonicalized version of the name for a package on PyPi and to have Pip Freeze report the canonicalized name.

Moreover, for any package name of the form either n1-n2 or n1_n2, both of the possible pip requests will work:

% python3 -m pip install n1_n2

% python3 -m pip install n1-n2

This behavior would also be expected if pip takes the canonicalized version of the name provided in a request and looks for a match among canonicalized names on PyPi.

As a further example,

% python3 -m pip install jupyterlab_mathjax3

generates the response:

Requirement already satisfied: jupyterlab_mathjax3 in /Library/Frameworks/Python.framework/3.10/lib/python3.10/site-packages

It does so even though the directory in the site-packages directory uses a dash as the separator, not an underscore. A slightly different wording for this response might provide an opportunity to hint at pip’s use of canonicalized package names.

If the canonicalized name of any new submission is checked against the canonicalized name of each existing package, a bad actor would not be able to submit a package with a name that differs from an existing one only by a change of separator from dash to underscore (or vice versa).
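A minimal sketch of such a check, assuming the PEP 503 normalisation rule and a made-up list of existing names. This illustrates the idea only; it is not PyPI's actual implementation, and `is_available` is a hypothetical helper.

```python
import re


def normalize(name):
    """Normalise a project name using the rule described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_available(new_name, existing_names):
    """Return True if new_name does not collide with any existing name after normalisation."""
    taken = {normalize(existing) for existing in existing_names}
    return normalize(new_name) not in taken


# Hypothetical registry contents, using names from this thread.
existing = ["jupyterlab-mathjax3", "jupyterlab-markup", "pymarkdown_minisite"]

print(is_available("jupyterlab_mathjax3", existing))  # differs only in the separator
print(is_available("pymarkdown-minisite", existing))  # differs only in the separator
print(is_available("jupyterlab-newthing", existing))  # a new name
```

Under this scheme the first two submissions would be rejected, whichever separator the original author chose.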

Summary
The advice from PEP 8 is not rich enough to capture all the patterns that have emerged in practice regarding dashes versus underscores as separators in file paths. Nevertheless, the distinction that it draws between packages and modules is consistent with a strategy of assigning different separators to them: a dash for the package name and an underscore for a module/directory name.

Actual practice on PyPi also includes cases where the package name relies on an underscore separator. This may be legacy behavior that will go away as practice converges toward the “dash for package, underscore for directory” approach. Data linked by krassowski suggests that the ratio of names with dashes to names with underscores is currently about 4 to 1.

It is possible that the decision in the case of jupyterlab-mathjax3 to use a dash for a directory name in the site-packages directory was based on an argument that dashes might (as agoose77 conjectured) be a useful signal in cases where there is no module to import. But the decision in the comparable case of jupyterlab-fonts suggests otherwise. It might simply be that in the case of jupyterlab-mathjax3, where there is no module to import, there is little pressure to correct a dash that was used inadvertently.
