Hello,
I am currently facing an issue that I cannot reproduce consistently: sometimes it happens, other times it does not.
I am issuing the following command in bash:
jupyter nbconvert --execute --to notebook --inplace $file_path
The expected result is simply the target notebook being executed in place, but most of the time the kernel never gets a chance to launch because of a timeout; stack trace below:
[NbConvertApp] Converting notebook src/utils/output_reporter.ipynb to notebook
[NbConvertApp] ERROR | Error occurred while starting new kernel client for kernel 6d97e36a-1568-404d-b1c4-a1dfbb693381: Kernel didn't respond in 60 seconds
Traceback (most recent call last):
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/bin/jupyter-nbconvert", line 10, in <module>
sys.exit(main())
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/jupyter_core/application.py", line 280, in launch_instance
super().launch_instance(argv=argv, **kwargs)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/traitlets/config/application.py", line 1043, in launch_instance
app.start()
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/nbconvertapp.py", line 412, in start
self.convert_notebooks()
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/nbconvertapp.py", line 590, in convert_notebooks
self.convert_single_notebook(notebook_filename)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/nbconvertapp.py", line 556, in convert_single_notebook
output, resources = self.export_single_notebook(
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/nbconvertapp.py", line 479, in export_single_notebook
output, resources = self.exporter.from_filename(
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/exporters/exporter.py", line 201, in from_filename
return self.from_file(f, resources=resources, **kw)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/exporters/exporter.py", line 220, in from_file
return self.from_notebook_node(
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/exporters/notebook.py", line 36, in from_notebook_node
nb_copy, resources = super().from_notebook_node(nb, resources, **kw)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/exporters/exporter.py", line 154, in from_notebook_node
nb_copy, resources = self._preprocess(nb_copy, resources)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/exporters/exporter.py", line 352, in _preprocess
nbc, resc = preprocessor(nbc, resc)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/preprocessors/base.py", line 48, in __call__
return self.preprocess(nb, resources)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/preprocessors/execute.py", line 96, in preprocess
with self.setup_kernel():
File "/anaconda/envs/azureml_py38/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbclient/client.py", line 603, in setup_kernel
self.start_new_kernel_client()
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/jupyter_core/utils/__init__.py", line 173, in wrapped
return loop.run_until_complete(inner)
File "/anaconda/envs/azureml_py38/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbclient/client.py", line 566, in async_start_new_kernel_client
await ensure_async(self.kc.wait_for_ready(timeout=self.startup_timeout))
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/jupyter_core/utils/__init__.py", line 189, in ensure_async
result = await obj
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/jupyter_client/client.py", line 207, in _async_wait_for_ready
raise RuntimeError("Kernel didn't respond in %d seconds" % timeout)
RuntimeError: Kernel didn't respond in 60 seconds
When I open the notebook manually in Jupyter (instead of running it programmatically from bash), the kernel does typically take unusually long to launch, but it eventually starts, and the notebook then runs to completion without any exceptions being raised. So I am reasonably confident there is no inherent bug or dependency problem here.
I operate on a compute instance spun up by Azure ML Studio in a corporate cloud environment and, as a matter of fact, network connectivity does not seem to be the best: terminals intermittently disconnect and reconnect.
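One detail that may be relevant: judging by the paths in the traceback, both the nbconvert install and the kernel's packages live under a network mount (/mnt/batch/...), so slow I/O on that mount could make the kernel's own imports exceed the 60-second budget. A small diagnostic I put together to measure that in isolation (this is my own sketch, not part of the pipeline; it assumes `python3` on PATH is the same interpreter the kernel uses):

```shell
#!/usr/bin/env bash
# time_import MODULE -- report how long "import MODULE" takes in the
# kernel's interpreter; prints "<module>: unavailable" if it cannot
# be imported at all.
time_import() {
    python3 - "$1" <<'EOF'
import importlib, sys, time

mod = sys.argv[1]
t0 = time.monotonic()
try:
    importlib.import_module(mod)
except ImportError:
    print(f"{mod}: unavailable")
else:
    print(f"{mod}: imported in {time.monotonic() - t0:.1f}s")
EOF
}

# The kernel process mostly pays for importing ipykernel and its
# dependencies at startup, so this roughly approximates kernel
# startup cost without involving nbconvert at all:
time_import ipykernel
```

If this alone takes tens of seconds, the startup timeout is being eaten by slow filesystem/network I/O rather than by anything nbconvert does.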
1. Can you help me understand a bit more about the RuntimeError: Kernel didn't respond in 60 seconds, and what can be done to extend this timeout period?
2. Is there any indirect solution that would help the kernel launch? Based on the stack trace above, any idea where the wait time is coming from? Could it just be bad network connectivity?
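For context on what I have found so far: the 60 seconds appear to come from `ExecutePreprocessor.startup_timeout` (default 60), which governs kernel startup and is separate from the per-cell `timeout`. Both can be raised from the command line, and combined with a simple retry wrapper this might ride out transient slowness. A sketch of what I am considering (the 300/600-second values, the 3 retries, and the `run_with_retries` helper name are all arbitrary choices of mine):

```shell
#!/usr/bin/env bash
# run_with_retries ATTEMPTS CMD... -- run CMD, retrying up to ATTEMPTS
# times with a pause between failed attempts.
run_with_retries() {
    local attempts=$1; shift
    local delay=${RETRY_DELAY:-10}   # seconds between attempts
    local i
    for ((i = 1; i <= attempts; i++)); do
        "$@" && return 0             # success: stop retrying
        if (( i < attempts )); then
            echo "Attempt $i/$attempts failed; retrying in ${delay}s..." >&2
            sleep "$delay"
        fi
    done
    return 1                         # every attempt failed
}

# Raise both the kernel-startup timeout (default 60s) and the per-cell
# timeout, and retry the whole conversion a few times:
if [ -n "${file_path:-}" ]; then
    run_with_retries 3 jupyter nbconvert --execute --to notebook --inplace \
        --ExecutePreprocessor.startup_timeout=300 \
        --ExecutePreprocessor.timeout=600 \
        "$file_path"
fi
```

I have not yet confirmed whether raising `startup_timeout` is enough on its own or whether the retry is also needed, hence question 2.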
Your support / discussion is much appreciated as always.