Best strategy for dealing with in-repository modules and notebook working directory

The following seems like an extremely basic question, but I’ve been kind of tearing my hair out trying to find an ideal answer to it, or even whether there’s a standard answer, and my search terms have turned up very little so far (I hope I am not missing something completely obvious). The question is, essentially: what is the best way to work with notebook working directories in the context of a repository? I have a pretty standard repository structure where the essential part looks like the following. Within docs I have some Jupyter notebooks that run import mymodule:

repository
├── mymodule
└── docs
    ├── overview.ipynb
    ├── manual.ipynb
    ├── examples.ipynb
    └── ...

The problem is simple: for development purposes, what is the best or standard way to get that import to work? When jupyter lab opens a notebook, it sets the working directory (and hence the effective Python path) to that notebook’s directory, so in this structure the import will fail without something else. Here’s what I’ve come up with from searching and brainstorming, approximately from best to worst:

  1. symlink mymodule into the docs directory and add it to .gitignore. Creation of the link could be automated for new repository users (see the first sketch after this list). I’ve done this before and I suspect I will do it again. Pros: one-time setup that can be scripted, relatively general, and entirely contained in the repository (as opposed to env or per-user config). Cons: a fresh clone will fail to run the relevant notebooks in-repo without the extra step; swapping to an installed mymodule for any reason requires deleting the symlink or moving the notebooks; any subdirectories of docs would each need their own link.

  2. run jupyter lab with an explicit PYTHONPATH that includes the dev location of mymodule. Pros: completely general to whatever structure is in docs, scriptable. Cons: not a one-time setup; I will forget to do this 100% of the time, and as far as I know fixing it requires restarting lab (committing a launcher script to the repo would help mitigate; see the second sketch after this list). Probably the best general solution for most people and cases, though.

  3. work within a venv of some sort and reinstall mymodule every time it changes. Pros: completely general to whatever structure is in docs, scriptable. Cons: very heavyweight, easy to accidentally overwrite the installed package in a non-dev env, requires jupyter lab to already be running in the correct env (at least with conda envs, which I use), and an annoying barrier to the dev-test loop. Possible variant: set env-specific config somehow? (I do use envs, I just don’t want the install step.)

  4. create a kernel with the relevant path built in. I have actually done something like this before, for the case where I’m distributing a non-trivial kernel (so the kernel name corresponds to something that gets installed with the package), but I don’t think it makes sense for the general case, because the notebooks are committed and the committed versions need to run with the regular ipython kernel. (One could rewrite the kernel metadata in a pre-commit hook, but that’s more machinery.)

  5. something with ~/.jupyter: I haven’t really explored this, because I want something that makes the repository work independently of my local setup, and ideally in a way that doesn’t cut across envs (i.e. in some envs I have a release version of mymodule installed via pip). Presumably something could be done, though.

  6. copy the document to the repository root to edit. Yeah, no. (Actually, my immediate problem is that I’m trying to convert a specific project to this structure and previously had these notebooks at the repository root, but that isn’t sustainable for more than a few documents, or for more advanced things like using them with Quarto.)

  7. change the working directory in the notebook, with code in the first cell. This is by far the most popular solution on StackExchange etc., which is kind of bizarre to me: it produces notebooks that won’t work (or at best have unpredictable consequences) in a production setting, it has bad reproducibility issues in general, and it also depends on jupyter lab being started in the correct place. Pros: not coming up with much here; it works, kind of? Cons: really bad idea. (In fact, seeing this bad advice so pervasively became part of my motivation for working through this whole list in so much detail…)
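To make options 1 and 2 concrete, here are minimal sketches of each; the file names (link_module.py, run_lab.py) are placeholders of mine, not any standard convention, and both assume the repository layout above.

# link_module.py (hypothetical), run once from the repository root:
# creates docs/mymodule -> ../mymodule so the in-repo notebooks can import it
# (remember to add docs/mymodule to .gitignore)
from pathlib import Path

link = Path("docs") / "mymodule"
if not link.exists():
    # a relative target keeps the link valid if the repository is moved
    link.symlink_to(Path("..") / "mymodule")

… and for option 2, a committed launcher:

# run_lab.py (hypothetical): start JupyterLab with the repository root
# prepended to PYTHONPATH so `import mymodule` works in every notebook
import os
import subprocess
from pathlib import Path

repo = Path(__file__).parent.resolve()
env = dict(os.environ)
env["PYTHONPATH"] = os.pathsep.join(
    p for p in (str(repo), env.get("PYTHONPATH", "")) if p
)
subprocess.check_call(["jupyter", "lab"], cwd=repo, env=env)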

I think what I wish existed, but can’t find, is some sort of configuration file I could put in the repository root that would tell jupyter lab, when started in that context, to add the repository directory to PYTHONPATH as an absolute path. But I just can’t figure out a way to do this. What, if anything, am I missing? I have found third-party tools that amount to some version of my 1-2. Also, to be clear, 1-2 are basically acceptable solutions to me for interactive use, and 1-3 are completely fine solutions for scripted use. So I’m perhaps trying to figure out whether the perfect solution exists for interactive use…

Thanks for any help or suggestions, or at least a pointer to a standard approach to this.

don’t want the install step

Unfortunately, installing really does take care of most of the issues, and the editable install technique (pip install -e) already exists for exactly this.

This could be encapsulated in a utility file in the relevant locations (e.g. it might need to be duplicated if docs is multi-leveled):

# docs/_ensure_mymodule.py
import subprocess
import sys
from pathlib import Path

HERE = Path(__file__).parent
# In the layout above, the package source lives one level up from docs;
# this assumes mymodule is pip-installable (i.e. has a pyproject.toml/setup.py).
MY_DEV_MODULE = (HERE / ".." / "mymodule").resolve()

try:
    import mymodule
except ImportError:
    # Editable-install the in-repo source into the current environment.
    # --no-deps assumes the environment already provides the dependencies;
    # --ignore-installed forces the dev copy over any released version.
    subprocess.check_call(
        [
            sys.executable,
            "-m",
            "pip",
            "install",
            "-e",
            str(MY_DEV_MODULE),
            "--no-deps",
            "--ignore-installed",
        ]
    )
    import mymodule

… and then in each of the docs/*.ipynb

import _ensure_mymodule
import mymodule

… or run as part of some setup, e.g. binder’s postBuild.
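(For Binder specifically, postBuild just needs to be an executable script in the repository root, so it could even be the following Python; the shebang and flags here are my assumptions, not Binder requirements:)

#!/usr/bin/env python
# postBuild (hypothetical): repo2docker runs this after building the environment,
# with the repository root as the working directory
import subprocess
import sys

subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "-e", "./mymodule", "--no-deps"]
)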

The above script could also call pip check to help ensure that the outer environment has actually provided all the required dependencies.
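A sketch of that addition, at the end of _ensure_mymodule.py (pip check exits non-zero when declared dependencies are unsatisfied, so check_call makes that fatal):

# verify the environment actually satisfies the declared dependencies
subprocess.check_call([sys.executable, "-m", "pip", "check"])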

Anything involving PYTHONPATH, symlinks, jupyter configuration, etc. is likely going to be less portable (Windows asks: what’s a symlink?) or more flaky.


Not sure if you already saw this, but you can add PYTHONPATH to the kernelspec env: How can I pass environment variabel PYTHONPATH to jupyter notebook? - #2 by kevin-bates. I like this for 1-person projects and imagine it could also work in small teams if you agree on sharing a kernelspec (with the benefit of having less noise in diffs).
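For reference, that amounts to an env block in the kernelspec’s kernel.json; a minimal sketch, with the display name and path as placeholders:

{
  "argv": ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
  "display_name": "Python (mymodule dev)",
  "language": "python",
  "env": {"PYTHONPATH": "/path/to/repository"}
}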


Thanks for the responses! I had indeed forgotten about editable installs, so that checks off more boxes for solution 3 than my initial evaluation gave it.

(Although, honest self-evaluation: what may happen here is that I’ll set up something like this in the repository for testing/scripting purposes but then keep using the symlink approach myself out of laziness.)

Not sure if you already saw this, but you can add PYTHONPATH to kernelspec env

Thanks, I should look into this for my package that does distribute a kernel; currently I think I do this by injecting the path via exec_lines (which was quite painful to get to work in a robust way), roughly along the lines of the sketch below. But the project that prompted the question here is an OSS one where the notebooks provide package documentation, so the hope is that someone who wants to run them can simply open them as-is with the package installed.
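(The exec_lines injection is just IPython startup configuration, roughly like the following, with the path as a placeholder; the painful part was computing the path robustly rather than hardcoding it:)

# ipython_config.py for the distributed kernel (path hardcoded here for brevity)
c.InteractiveShellApp.exec_lines = [
    "import sys; sys.path.insert(0, '/path/to/repository')",
]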
