This seems like an extremely basic question, but I've been tearing my hair out trying to find an ideal answer to it, or even whether a standard answer exists, and my searches have turned up very little so far (I hope I am not missing something completely obvious). The question is, essentially: what is the best way to work with notebook working directories in the context of a repository? I have a pretty standard repository structure where the essential part looks like the following. Within `docs` I have some Jupyter notebooks that run `import mymodule`:

    repository
    ├── mymodule
    └── docs
        ├── overview.ipynb
        ├── manual.ipynb
        ├── examples.ipynb
        └── ...

The problem is simple: for development purposes, what is the best or standard way to get that `import` to work? When `jupyter lab` opens a notebook, it sets the working directory, for Python path purposes, to that notebook's directory, so the `import` will fail in this structure without something else. Here's what I've come up with from searching and brainstorming, approximately from best to worst:

1. Symlink `mymodule` into the `docs` directory and add it to `.gitignore`. Creation of the link could be automated for new repository users. I've done this before and I suspect I will do it again. Pros: one-time setup that can be scripted, relatively general, completely specific to a repository structure (as opposed to env or per-user config). Cons: a fresh checkout of the repository will fail to run the relevant notebooks in-repo without an extra step; switching to an installed `mymodule` for any reason requires deleting the symlink or moving the notebooks; additional links would be needed for any subdirectories as well.
2. Run `jupyter lab` with an explicit `PYTHONPATH` that includes the dev location of `mymodule`. Pros: completely general to whatever structure is in `docs`, scriptable. Cons: not a one-time setup; I will forget to do this 100% of the time, and as far as I know fixing that requires restarting lab (committing a script to the repo would help mitigate). Probably the best general solution for most people and cases, though.
3. Work within a venv of some sort and reinstall `mymodule` every time it changes. Pros: completely general to whatever structure is in `docs`, scriptable. Cons: very heavyweight, easy to accidentally overwrite an installed package in a non-dev env, requires `jupyter lab` to already be running in the correct env (at least with conda envs, which I use), and an annoying barrier to the dev-test loop. Possible variant: set env-specific config somehow? (I do use envs, I just don't want the install step.)
4. Create a kernel with the relevant path built in. I have actually done something like this before for the case where I'm distributing a non-trivial kernel (so the kernel name corresponds to something that gets installed with the package), but I don't think this makes sense for the general case, because the notebooks are committed and the committed versions need to run with the regular IPython kernel. One could mess with the metadata pre-commit.
5. Something with `~/.jupyter`: I haven't really explored this because I want something that makes the repository work independently of my local setup, and ideally in a way that doesn't cut across envs (i.e. in some envs I have a release version of `mymodule` installed via pip). Presumably something could be done, though.
6. Copy the document to the repository root to edit. Yeah, no. (Actually, my immediate problem is that I'm trying to convert a specific project to this structure and previously had these notebooks at the repository root, but this isn't sustainable for more than a few documents or for more advanced things like using them with quarto.)
7. Change the working directory in the notebook, with code in the first cell. This is by far the most popular solution on stackexchange etc., which is kind of bizarre to me: it produces notebooks that won't work, or at best have unpredictable consequences, in a production setting; it generally has bad reproducibility issues; and it also depends on `jupyter lab` being started in the correct place. Pros: not coming up with much here. It works, kind of? Cons: really bad idea. (In fact, seeing this bad advice so pervasively became part of my motivation for working through this whole list in so much detail…)

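
As a rough sketch of how the symlink in the first option could be automated for new repository users: the helper below creates `docs/mymodule` as a relative link to `../mymodule`. The function name `link_module` and its default arguments are hypothetical, and this assumes a filesystem where unprivileged symlinks work (Linux/macOS; Windows may need extra permissions).

```python
import os
from pathlib import Path


def link_module(repo_root, module_name="mymodule", docs_dir="docs"):
    """Create docs/<module_name> as a symlink to ../<module_name>.

    Hypothetical helper: names and layout follow the structure described
    in the post, not any existing tool.
    """
    repo_root = Path(repo_root)
    target = repo_root / module_name
    link = repo_root / docs_dir / module_name

    # Idempotent: leave an existing link (or real directory) alone.
    if link.is_symlink() or link.exists():
        return link

    # Use a relative target so the link stays valid if the checkout moves.
    relative_target = os.path.relpath(target, link.parent)
    link.symlink_to(relative_target, target_is_directory=True)
    return link
```

A setup script committed at the repository root could call `link_module(Path(__file__).resolve().parent)` once after cloning; the link itself would stay in `.gitignore`.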
I think what I wish existed, but can't find, is some sort of configuration file I could put in the repository root that would tell `jupyter lab`, when started in that context, to just add the current directory (as an absolute path) to `PYTHONPATH`. But I just can't figure out a way to do this. What, if anything, am I missing? I have found third-party tools that amount to some version of my options 1-2. Also, to be clear, options 1-2 are basically acceptable solutions to me for interactive use, and 1-3 are completely fine for scripted use. So I'm perhaps trying to figure out whether the perfect solution exists for interactive use…
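
Short of such a configuration file, the closest scripted approximation is probably a launcher committed to the repository root, along the lines of option 2. The sketch below is only an illustration: `build_env` and `launch` are hypothetical names, and it assumes that the `jupyter` executable is on `PATH` and that kernels inherit the environment of the server that starts them.

```python
import os
import subprocess


def build_env(repo_root, base_env=None):
    """Return a copy of the environment with repo_root prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    parts = [str(repo_root)]
    if existing:
        # Keep whatever was already on PYTHONPATH, after the repo root.
        parts.append(existing)
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env


def launch(repo_root, *extra_args):
    """Start JupyterLab from repo_root with the adjusted environment.

    Assumption: kernels spawned by the server see the same PYTHONPATH.
    """
    return subprocess.call(
        ["jupyter", "lab", *extra_args],
        env=build_env(repo_root),
        cwd=str(repo_root),
    )
```

Calling `launch("/absolute/path/to/repository")` from a small committed script would then make `import mymodule` resolve from any notebook under `docs`, at the cost of having to remember to start lab through that script.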
Thanks for any help or suggestions if there’s at least a standard approach to this.