Hello,
I am currently facing an issue that I cannot reproduce consistently: sometimes it happens, other times it does not.
I am issuing the following command in bash:
jupyter nbconvert --execute --to notebook --inplace $file_path
The expected result is simply the target notebook being executed in place, but most of the time the kernel never gets a chance to launch because of a timeout; stack trace below:
[NbConvertApp] Converting notebook src/utils/output_reporter.ipynb to notebook
[NbConvertApp] ERROR | Error occurred while starting new kernel client for kernel 6d97e36a-1568-404d-b1c4-a1dfbb693381: Kernel didn't respond in 60 seconds
Traceback (most recent call last):
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/bin/jupyter-nbconvert", line 10, in <module>
sys.exit(main())
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/jupyter_core/application.py", line 280, in launch_instance
super().launch_instance(argv=argv, **kwargs)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/traitlets/config/application.py", line 1043, in launch_instance
app.start()
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/nbconvertapp.py", line 412, in start
self.convert_notebooks()
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/nbconvertapp.py", line 590, in convert_notebooks
self.convert_single_notebook(notebook_filename)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/nbconvertapp.py", line 556, in convert_single_notebook
output, resources = self.export_single_notebook(
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/nbconvertapp.py", line 479, in export_single_notebook
output, resources = self.exporter.from_filename(
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/exporters/exporter.py", line 201, in from_filename
return self.from_file(f, resources=resources, **kw)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/exporters/exporter.py", line 220, in from_file
return self.from_notebook_node(
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/exporters/notebook.py", line 36, in from_notebook_node
nb_copy, resources = super().from_notebook_node(nb, resources, **kw)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/exporters/exporter.py", line 154, in from_notebook_node
nb_copy, resources = self._preprocess(nb_copy, resources)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/exporters/exporter.py", line 352, in _preprocess
nbc, resc = preprocessor(nbc, resc)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/preprocessors/base.py", line 48, in __call__
return self.preprocess(nb, resources)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbconvert/preprocessors/execute.py", line 96, in preprocess
with self.setup_kernel():
File "/anaconda/envs/azureml_py38/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbclient/client.py", line 603, in setup_kernel
self.start_new_kernel_client()
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/jupyter_core/utils/__init__.py", line 173, in wrapped
return loop.run_until_complete(inner)
File "/anaconda/envs/azureml_py38/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/nbclient/client.py", line 566, in async_start_new_kernel_client
await ensure_async(self.kc.wait_for_ready(timeout=self.startup_timeout))
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/jupyter_core/utils/__init__.py", line 189, in ensure_async
result = await obj
File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/st9jrm-cpu/code/Users/jose.ruiz/azureml_predictive_modelling_pipeline/lib/python3.8/site-packages/jupyter_client/client.py", line 207, in _async_wait_for_ready
raise RuntimeError("Kernel didn't respond in %d seconds" % timeout)
RuntimeError: Kernel didn't respond in 60 seconds
When I open the notebook manually in Jupyter (instead of running it programmatically from bash), the kernel does typically take unusually long to launch, but it eventually starts, and the notebook then runs to completion without any exceptions being raised. So I am reasonably confident there is no inherent bug or dependency problem here.
I operate on a compute instance spun up by Azure ML Studio in a corporate cloud environment and, as a matter of fact, network connectivity does not seem to be the best: terminals intermittently disconnect and reconnect.
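One detail that may be relevant: judging by the paths in the traceback, both the nbconvert install and the kernel's packages live under a network mount (/mnt/batch/...), so slow I/O on that mount could make the kernel's own imports exceed the 60-second budget. A small diagnostic I put together to measure that in isolation (this is my own sketch, not part of the pipeline; it assumes `python3` on PATH is the same interpreter the kernel uses):

```shell
#!/usr/bin/env bash
# time_import MODULE -- report how long "import MODULE" takes in the
# kernel's interpreter; prints "<module>: unavailable" if it cannot
# be imported at all.
time_import() {
    python3 - "$1" <<'EOF'
import importlib, sys, time

mod = sys.argv[1]
t0 = time.monotonic()
try:
    importlib.import_module(mod)
except ImportError:
    print(f"{mod}: unavailable")
else:
    print(f"{mod}: imported in {time.monotonic() - t0:.1f}s")
EOF
}

# The kernel process mostly pays for importing ipykernel and its
# dependencies at startup, so this roughly approximates kernel
# startup cost without involving nbconvert at all:
time_import ipykernel
```

If this alone takes tens of seconds, the startup timeout is being eaten by slow filesystem/network I/O rather than by anything nbconvert does.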
1. Can you help me understand a bit more about the RuntimeError: Kernel didn't respond in 60 seconds, and what can be done to extend this timeout period?
2. Is there any indirect solution that would help the kernel launch? Based on the stack trace above, any idea where the wait time is coming from? Could it just be bad network connectivity?
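For context on what I have found so far: the 60 seconds appear to come from `ExecutePreprocessor.startup_timeout` (default 60), which governs kernel startup and is separate from the per-cell `timeout`. Both can be raised from the command line, and combined with a simple retry wrapper this might ride out transient slowness. A sketch of what I am considering (the 300/600-second values, the 3 retries, and the `run_with_retries` helper name are all arbitrary choices of mine):

```shell
#!/usr/bin/env bash
# run_with_retries ATTEMPTS CMD... -- run CMD, retrying up to ATTEMPTS
# times with a pause between failed attempts.
run_with_retries() {
    local attempts=$1; shift
    local delay=${RETRY_DELAY:-10}   # seconds between attempts
    local i
    for ((i = 1; i <= attempts; i++)); do
        "$@" && return 0             # success: stop retrying
        if (( i < attempts )); then
            echo "Attempt $i/$attempts failed; retrying in ${delay}s..." >&2
            sleep "$delay"
        fi
    done
    return 1                         # every attempt failed
}

# Raise both the kernel-startup timeout (default 60s) and the per-cell
# timeout, and retry the whole conversion a few times:
if [ -n "${file_path:-}" ]; then
    run_with_retries 3 jupyter nbconvert --execute --to notebook --inplace \
        --ExecutePreprocessor.startup_timeout=300 \
        --ExecutePreprocessor.timeout=600 \
        "$file_path"
fi
```

I have not yet confirmed whether raising `startup_timeout` is enough on its own or whether the retry is also needed, hence question 2.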
Your support / discussion is much appreciated as always.