Datalad.api works in terminal ipython but not Jupyter lab; how to fix?

Hi,
I’m trying to get datalad.api to work in JupyterLab. It works in the terminal with ipython but not within the notebook. Here’s a link to the page with the directions I’m following: http://handbook.datalad.org/en/latest/code_from_chapters/usecase_ml_code.htm

The command that gives me an error in jupyter lab is: import datalad.api as dl

This is the error I get:
/Users/eprzysinda/miniconda3/envs/test/lib/python3.7/site-packages/datalad/cmd.py:375: RuntimeWarning: coroutine ‘run_async_cmd’ was never awaited
new_loop = True
RuntimeWarning: Enable tracemalloc to get the object allocation traceback

RuntimeError Traceback (most recent call last)
in
----> 1 import datalad.api as dl

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/__init__.py in
46
47 from .config import ConfigManager
—> 48 cfg = ConfigManager()
49
50 from .log import lgr

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/config.py in __init__(self, dataset, overrides, source)
344 self._runner = GitWitlessRunner(**run_kwargs)
345
→ 346 self.reload(force=True)
347
348 if not ConfigManager._checked_git_identity:

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/config.py in reload(self, force)
397 while to_run:
398 store_id, runargs = to_run.popitem()
→ 399 self._stores[store_id] = self._reload(runargs)
400
401 # always update the merged representation, even if we did not reload

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/config.py in _reload(self, run_args)
429 protocol=StdOutErrCapture,
430 # always expect git-config to output utf-8
→ 431 encoding=‘utf-8’,
432 )
433 store = {}

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/config.py in _run(self, args, where, reload, **kwargs)
787 if ‘-l’ in args:
788 # we are just reading, no need to reload, no need to lock
→ 789 out = self._runner.run(self._config_cmd + args, **kwargs)
790 return out[‘stdout’], out[‘stderr’]
791

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/cmd.py in run(self, cmd, protocol, stdin, cwd, env, **kwargs)
385 protocol_kwargs=kwargs,
386 cwd=cwd,
→ 387 env=env,
388 )
389 )

~/miniconda3/envs/test/lib/python3.7/asyncio/base_events.py in run_until_complete(self, future)
561 """
562 self._check_closed()
→ 563 self._check_runnung()
564
565 new_task = not futures.isfuture(future)

~/miniconda3/envs/test/lib/python3.7/asyncio/base_events.py in _check_runnung(self)
524 if events._get_running_loop() is not None:
525 raise RuntimeError(
→ 526 ‘Cannot run the event loop while another loop is running’)
527
528 def run_forever(self):

RuntimeError: Cannot run the event loop while another loop is running

I’m new to jupyter lab and python, so it’s possible I missed something simple when setting things up. Any help is much appreciated.
Thanks!
~Emily

Nope, it is nothing you did or didn’t do. There’s a small incompatibility between some aspects of the current packages. Also, it isn’t specific to JupyterLab; the same issue arises with the classic notebook interface.
Fortunately, an easy workaround can be found by going to the GitHub repo and searching for the term ‘loop’ among the open issues on the project’s issues page. (I took that keyword from the RuntimeError at the end of the traceback you posted as a starting point, and something relevant came up quickly.)

The post here spells out the basics of the workaround for this issue. The approach relies on importing nest_asyncio and calling nest_asyncio.apply() before importing DataLad. It was added to the documentation here as section ‘9.4.5.1. asyncio errors at DataLad import’ under ‘Common warnings and errors’. I’ll spell out implementing it below in steps that should work for novices.
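For background, the RuntimeError itself can be reproduced with the standard library alone. The sketch below is my own illustration, not DataLad code: the names inner and outer are made up, and it is meant to be run as a plain Python script (not in a notebook cell, where asyncio.run would hit the same error). It tries to start a second event loop while one is already running, which appears to be what happens when DataLad’s runner calls run_until_complete inside a Jupyter kernel.

```python
import asyncio


async def inner():
    # Stand-in for the command a library wants to run on its own loop.
    return "done"


async def outer():
    # Inside this coroutine an event loop is already running,
    # similar to the situation inside a Jupyter kernel.
    new_loop = asyncio.new_event_loop()
    coro = inner()
    try:
        new_loop.run_until_complete(coro)
    except RuntimeError as exc:
        return str(exc)
    finally:
        coro.close()      # avoid a "never awaited" warning
        new_loop.close()
    return "no error"


message = asyncio.run(outer())
print(message)
```

nest_asyncio patches asyncio so that this kind of nested call is allowed, which is why applying it before the DataLad import avoids the error.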


Work-around

  1. Open your notebook and add the following code in a cell above your normal cells:

%pip install nest_asyncio

  2. Run that cell and then restart the kernel.

  3. Then add another cell towards the top of your notebook and run the following code:

import nest_asyncio
nest_asyncio.apply()

import datalad.api as dl

  4. You can choose to save this modified notebook, because re-running any of these cells again is fine.

Note: Once you do the first %pip install step in your environment, other notebooks should work without requiring that %pip install step as long as you add the import nest_asyncio; nest_asyncio.apply() lines at the top of your other notebooks before you import datalad.
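If the same code should also run as a plain script, the patch can be applied only when a loop is already running. This is a minimal sketch under my own assumptions: loop_is_running is a hypothetical helper (not part of DataLad or nest_asyncio), and it assumes notebook cells execute inside the kernel’s running loop, which is what the traceback above suggests.

```python
import asyncio


def loop_is_running() -> bool:
    """Return True if the current thread already has a running event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


if loop_is_running():
    # Only reached where a loop is already running (e.g. a notebook kernel);
    # requires the nest_asyncio package installed in the step above.
    import nest_asyncio
    nest_asyncio.apply()

# import datalad.api as dl  # uncomment in an environment where DataLad is installed
```

In a terminal script the check returns False and nest_asyncio is never imported, so the script does not depend on it.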


Note this issue was cross-posted at StackOverflow as well here, and I’ve posted a link to this solution there.


Thank you so much for this solution. It allowed the import, and it seems to be working for some DataLad code, except that I get an error when trying to use git-annex. I’m on a Mac, so it should have been installed with DataLad according to the instructions: 3. Installation and configuration — The DataLad Handbook

It seems to be a similar situation: it works in the terminal but not in JupyterLab, where it gives an error that git-annex of version >= 7.20190503 is missing even though it was installed in the terminal environment (git-annex version 8.20210428, conda 4.9.2, python 3.7). Do you think this is an issue that could be solved in a similar way? If not, I’ll make a separate post for it. This was already a problem before I fixed the DataLad import, and I also posted it on Stack Overflow (see above).

Here are the details of this error:
results = ds.status(annex=‘all’); yields

MissingExternalDependency Traceback (most recent call last)
in
----> 1 results = ds.status(annex=‘all’)

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/distribution/dataset.py in apply_func(wrapped, instance, args, kwargs)
501 elif i >= ds_index:
502 kwargs[orig_pos[i+1]] = args[i]
→ 503 return f(**kwargs)
504
505 setattr(Dataset, name, apply_func(f))

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/interface/utils.py in eval_func(wrapped, instance, args, kwargs)
484 return results
485 lgr.log(2, “Returning return_func from eval_func for %s”, wrapped_class)
→ 486 return return_func(generator_func)(*args, **kwargs)
487
488 return eval_func(func)

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/interface/utils.py in return_func(wrapped_, instance_, args_, kwargs_)
472 # unwind generator if there is one, this actually runs
473 # any processing
→ 474 results = list(results)
475 # render summaries
476 if not result_xfm and result_renderer in (‘tailored’, ‘default’):

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/interface/utils.py in generator_func(*_args, **_kwargs)
403 result_log_level,
404 # let renderers get to see how a command was called
→ 405 allkwargs):
406 for hook, spec in hooks.items():
407 # run the hooks before we yield the result

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/interface/utils.py in _process_results(results, cmd_class, on_failure, action_summary, incomplete_results, result_renderer, result_log_level, allkwargs)
559 render_n_repetitions = 10 if sys.stdout.isatty() else float(“inf”)
560
→ 561 for res in results:
562 if not res or ‘action’ not in res:
563 # XXX Yarik has to no clue on how to track the origin of the

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/core/local/status.py in __call__(path, dataset, annex, untracked, recursive, recursion_limit, eval_subdataset_state, report_filetype)
426 eval_subdataset_state,
427 report_filetype == ‘eval’,
→ 428 content_info_cache):
429 yield dict(
430 r,

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/core/local/status.py in _yield_status(ds, paths, annexinfo, untracked, recursion_limit, queried, eval_submodule_state, eval_filetype, cache)
128 init=status,
129 eval_availability=annexinfo in (‘availability’, ‘all’),
→ 130 ref=None)
131 for path, props in status.items():
132 cpath = ds.pathobj / path.relative_to(repo_path)

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/support/annexrepo.py in get_content_annexinfo(self, paths, init, ref, eval_availability, key_prefix, **kwargs)
3215 cmd += [’–include’, ‘*’]
3216
→ 3217 for j in self.call_annex_records(cmd, files=files):
3218 path = self.pathobj.joinpath(ut.PurePosixPath(j[‘file’]))
3219 rec = info.get(path, None)

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/support/annexrepo.py in call_annex_records(self, args, files)
1204 See _call_annex() for more information on Exceptions.
1205 """
→ 1206 return self._call_annex_records(args, files=files)
1207
1208 def call_annex(self, args, files=None):

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/support/annexrepo.py in _call_annex_records(self, args, files, jobs, git_options, stdin, merge_annex_branches, progress, **kwargs)
1085 stdin=stdin,
1086 merge_annex_branches=merge_annex_branches,
→ 1087 **kwargs,
1088 )
1089 except CommandError as e:

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/support/annexrepo.py in _call_annex(self, args, files, jobs, protocol, git_options, stdin, merge_annex_branches, **kwargs)
930 “”"
931 if self.git_annex_version is None:
→ 932 self._check_git_annex_version()
933
934 # git portion of the command

~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/support/annexrepo.py in _check_git_annex_version(cls)
553 )
554 if not ver:
→ 555 raise MissingExternalDependency(**exc_kwargs)
556 elif ver < cls.GIT_ANNEX_MIN_VERSION:
557 raise OutdatedExternalDependency(ver_present=ver, **exc_kwargs)

MissingExternalDependency: git-annex of version >= 7.20190503 is missing. Visit http://handbook.datalad.org/r.html?install for instructions on how to install DataLad and git-annex.

I’m hoping you can add another cell after step #1 from above and then run the following code before you do the imports:

%conda install -c conda-forge git-annex

Be sure to restart the kernel after running that. This may work because a git-annex package is listed on conda-forge.

I am unable to test this though myself because you don’t provide enough code that could be used to test this in an independent environment.

Unfortunately not. When I try this, it says the package is not available from the current channels:
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  • git-annex

Current channels:

  • conda.anaconda.org/conda-forge/osx-64

  • conda.anaconda. org/conda-forge/noarch

  • repo.anaconda. com/pkgs/main/osx-64

  • repo.anaconda. com/pkgs/main/noarch

  • repo.anaconda. com/pkgs/r/osx-64

  • repo.anaconda. com/pkgs/r/noarch
    (I put a space between .com/.org because it wouldn’t let me post so many links)
    To search for alternate channels that may provide the conda package you’re
    looking for, navigate to

    https://anaconda.org

and use the search bar at the top of the page.

I tried some of the other channels like osx64 and noarch, but that didn’t work either.

When I type in conda config --show channels it says “defaults”

Any thoughts on that? What additional information would you need to test it?

I forgot that you are trying it on a Mac. The %conda install command I suggested above won’t work for you, because on the page I linked to there isn’t a macOS icon listed under ‘Installers’, like there is here.

I cannot test what you need anyway because of the Mac issue. I’d be testing on a Linux system.

You somehow need to point the environment running the notebook at where you installed git-annex on your computer. Usually a symbolic link is enough.
From what you have posted about what runs in your notebooks, it is looking for things in ~/miniconda3/envs/test/lib/python3.7/site-packages/datalad/, but I suspect that was installed by pip. I don’t know, though, where conda places things in your file hierarchy. If you can determine where conda installs things, you’d want to add a symbolic link there to your local git-annex install that you say works in the terminal. (For debugging, I wonder if you need to think more generally and look for examples of ‘how to link Homebrew installations to Jupyter installs on a Mac’ or something like that, assuming you used Homebrew to install git-annex onto your Mac.)
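Before making any links, it may help to check what the notebook kernel actually sees. The sketch below uses only the standard library. It assumes that DataLad locates git-annex through the PATH of the kernel process (I have not verified this), and describe_kernel_environment and prepend_to_path are hypothetical helper names of my own. Run it in a notebook cell and again in terminal ipython, then compare the output.

```python
import os
import shutil
import sys


def describe_kernel_environment(tool: str = "git-annex") -> dict:
    """Collect what this Python process sees: interpreter, tool location, PATH."""
    return {
        "python": sys.executable,
        "tool_path": shutil.which(tool),  # None if the tool is not on PATH
        "path_entries": os.environ.get("PATH", "").split(os.pathsep),
    }


def prepend_to_path(directory: str) -> None:
    """Put `directory` at the front of PATH for this process and its children."""
    entries = [e for e in os.environ.get("PATH", "").split(os.pathsep) if e]
    if directory not in entries:
        os.environ["PATH"] = os.pathsep.join([directory] + entries)


info = describe_kernel_environment()
print("Python executable:", info["python"])
print("git-annex found at:", info["tool_path"])
```

If the notebook prints None where the terminal prints a path, one thing to try instead of a symbolic link is calling prepend_to_path with the directory the terminal reported, before importing datalad. That only affects the running kernel and has to be repeated in each session.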

If you have directions that said you could do this in a Jupyter notebook on a Mac that would be helpful to post. The Datalad documentation I looked at all talked about it being a command line program.