Notebook failed to connect to Enterprise Gateway kernel using WebSocket: exception in KernelGatewayWSClient._connection_done

#1

I am working with Jupyter Enterprise Gateway (jupyter_enterprise_gateway_kernelspecs-2.0.0rc1.tar.gz). I have set up Jupyter Enterprise Gateway and the "spark_python_yarn_client" kernel on the same machine, and Notebook and Spark on another system. I am able to connect to Enterprise Gateway from the remote Notebook, and the kernel is also able to connect to Spark: it generates an application ID that shows as running in the Spark UI. However, I am getting the error log below in Notebook.

[I 13:21:00.410 NotebookApp] Creating new notebook in
[I 13:21:08.021 NotebookApp] Request start kernel: kernel_id=None, path=""
[I 13:21:08.135 NotebookApp] Kernel started: 9fd2feb9-f9df-48a9-a775-f6c82044c1a6
[W 13:21:08.153 NotebookApp] Kernelspec resource 'logo-64x64.png' for 'spark_python_yarn_client' not found. Gateway may not support resource serving.
[I 13:21:08.614 NotebookApp] Connecting to ws://cto:8888/api/kernels/9fd2feb9-f9df-48a9-a775-f6c82044c1a6/channels
Exception in callback KernelGatewayWSClient._connection_done()
handle: )>
Traceback (most recent call last):
  File "/home/arindam/anaconda3/lib/python3.6/asyncio/events.py", line 145, in _run
    self._callback(*self._args)
  File "/home/arindam/anaconda3/lib/python3.6/site-packages/nb2kg/handlers.py", line 173, in _connection_done
    self.ws = fut.result()
tornado.simple_httpclient.HTTPTimeoutError: Timeout during request
[E 13:21:28.658 NotebookApp] Exception reading message from websocket: 'NoneType' object has no attribute 'read_message'
[E 13:21:28.659 NotebookApp] Exception writing message to websocket: 'NoneType' object has no attribute 'write_message'
[E 13:21:28.660 NotebookApp] Exception writing message to websocket: 'NoneType' object has no attribute 'write_message'
[E 13:22:41.512 NotebookApp] Exception writing message to websocket: 'NoneType' object has no attribute 'write_message'
[E 13:22:59.856 NotebookApp] Exception writing message to websocket: 'NoneType' object has no attribute 'write_message'

Could anyone help with this?

#2

Thank you for opening the discussion.

I suspect the request timed out despite the kernel's successful startup. Could you please provide the complete EG log (or console output) that corresponds to the request? We can then determine whether server-side issues are occurring.
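As a quick check on that suspicion using only the Notebook log already posted, the gap between the `Connecting to ws://…` line and the first websocket error can be computed directly. This small sketch parses the two log prefixes and prints about 20.04 seconds, which looks like a fixed timeout expiring rather than the kernel itself failing. The helper names `timestamp` and `seconds_until_first_error` are illustrative only; they are not part of Notebook or NB2KG.

```python
import re
from datetime import datetime

# Two lines copied verbatim from the Notebook log in the original post.
LOG = """\
[I 13:21:08.614 NotebookApp] Connecting to ws://cto:8888/api/kernels/9fd2feb9-f9df-48a9-a775-f6c82044c1a6/channels
[E 13:21:28.658 NotebookApp] Exception reading message from websocket: 'NoneType' object has no attribute 'read_message'
"""

# Matches the "[<level> HH:MM:SS.mmm NotebookApp]" prefix of a log line.
PREFIX = re.compile(r"^\[(?P<level>[A-Z]) (?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) NotebookApp\]")


def timestamp(line: str) -> datetime:
    """Return the time-of-day stamp of one Notebook log line.

    The log carries no date, so this assumes both lines fall on the same day.
    """
    match = PREFIX.match(line)
    if match is None:
        raise ValueError(f"not a NotebookApp log line: {line!r}")
    return datetime.strptime(match.group("time"), "%H:%M:%S.%f")


def seconds_until_first_error(log: str) -> float:
    """Seconds between the websocket 'Connecting to' line and the first [E ...] line."""
    lines = log.splitlines()
    start = next(timestamp(line) for line in lines if "Connecting to ws://" in line)
    first_error = next(timestamp(line) for line in lines if line.startswith("[E "))
    return (first_error - start).total_seconds()


if __name__ == "__main__":
    print(f"{seconds_until_first_error(LOG):.3f} s")  # prints "20.044 s"
```

Running the same check against the full EG log for this request would show whether the gateway side was still busy starting the kernel during that window.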

FWIW, we find YARN cluster mode easier to use (and better, in that it lets the YARN Resource Manager determine where the kernel will run), since it doesn't require configuring password-less SSH across the YARN cluster or duplicating the kernelspec hierarchy. Nevertheless, let's take a look at your case, which uses DistributedProcessProxy to SSH to the various nodes.
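To confirm which mode a kernelspec actually uses, one can read the process-proxy class declared in its kernel.json. This sketch assumes EG 2.x kernelspecs declare it under `metadata.process_proxy.class_name`; the class path in the sample is from memory of the `spark_python_yarn_client` spec, not copied from this installation, so compare it with your own kernel.json. `process_proxy_class` is a hypothetical helper.

```python
import json
from typing import Optional

# Assumed excerpt of an EG kernel.json; verify the class path against the installed kernelspec.
KERNEL_JSON = """
{
  "language": "python",
  "metadata": {
    "process_proxy": {
      "class_name": "enterprise_gateway.services.processproxies.distributed.DistributedProcessProxy"
    }
  }
}
"""


def process_proxy_class(kernel_json: str) -> Optional[str]:
    """Return the short process-proxy class name declared in a kernel.json, or None."""
    spec = json.loads(kernel_json)
    class_name = spec.get("metadata", {}).get("process_proxy", {}).get("class_name")
    return class_name.rsplit(".", 1)[-1] if class_name else None


if __name__ == "__main__":
    # DistributedProcessProxy: EG uses SSH to launch the kernel on configured nodes (client mode).
    # YarnClusterProcessProxy: the YARN Resource Manager decides where the kernel runs.
    print(process_proxy_class(KERNEL_JSON))  # prints "DistributedProcessProxy"
```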

If this is indeed a timeout issue, the environment variable KG_REQUEST_TIMEOUT should be increased to something like 40 or 60 seconds in the environment of the Notebook server, where NB2KG runs. The default is 20 seconds, but some configurations take longer to service the initial kernel startup. In fact, I created an NB2KG PR yesterday to increase the defaults.
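For reference, here is a minimal Python sketch of raising the timeout before starting the Notebook server. It assumes, as the NB2KG traceback suggests, that the variable is read by the Notebook process rather than by EG, and that an unset variable falls back to the 20-second default. `notebook_env`, `effective_request_timeout` and `launch_notebook` are hypothetical helpers, not NB2KG APIs.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional

DEFAULT_REQUEST_TIMEOUT_S = 20.0  # default quoted in the reply


def effective_request_timeout(env: Mapping[str, str]) -> float:
    """Timeout in seconds a consumer would see, assuming it falls back to 20 s when unset."""
    return float(env.get("KG_REQUEST_TIMEOUT", DEFAULT_REQUEST_TIMEOUT_S))


def notebook_env(request_timeout_s: float, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment (os.environ by default) with KG_REQUEST_TIMEOUT raised."""
    env = dict(os.environ if base is None else base)
    env["KG_REQUEST_TIMEOUT"] = str(request_timeout_s)
    return env


def launch_notebook(request_timeout_s: float = 60.0) -> None:
    """Start `jupyter notebook` with the larger timeout; blocks until the server exits."""
    subprocess.run(["jupyter", "notebook"], env=notebook_env(request_timeout_s), check=True)


if __name__ == "__main__":
    env = notebook_env(60.0, base={})
    print(effective_request_timeout({}), "->", effective_request_timeout(env))  # prints "20.0 -> 60.0"
```

Calling `launch_notebook(60.0)` on the Notebook host would start the server with the larger timeout; exporting the variable in the shell before launching `jupyter notebook` has the same effect.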