A few months ago, I set up JupyterHub on Google Cloud by following the excellent Z2JH tutorial. My Hub uses a customised Docker container derived from the Jupyter Docker Stacks.
Everything was working nicely, but I’ve just upgraded my Docker container to use JupyterLab 1.0 and Python 3.7 (instead of JupyterLab 0.35 and Python 3.6). Now, whenever I run a notebook or restart a kernel, files named e.g. `core.ZMQbg!3.xx.xxxxxxx` appear in the JupyterHub file browser. Everything still runs OK, but the file browser quickly becomes cluttered.
This doesn’t happen when running my upgraded container locally via Docker, so I assume it’s something to do with my JupyterHub configuration. Could it be that I need to upgrade my JupyterHub and Helm chart too? I’m currently running version 0.8.0 of the Helm chart and JupyterHub 0.9.4 on Google Cloud, whereas my updated container has JupyterHub 1.0.
The latest version of the Z2JH tutorial uses version 0.9-dev of the Helm chart. However, I’m a bit nervous about upgrading to this, as the changelog doesn’t yet specify what I need to check before upgrading, as recommended here.
Can anyone advise me whether:

1. Upgrading my Helm chart is likely to solve the problem of the weird `core.ZMQbg!3` files, and
2. Upgrading from Helm chart version 0.8.0 to 0.9-dev is likely to be as straightforward as

   ```shell
   helm upgrade <YOUR-HELM-RELEASE-NAME> jupyterhub/jupyterhub --version=v0.9-dev -f config.yaml
   ```

   or whether there are “breaking changes” I need to watch out for?
Thank you very much!
A brief follow-up on this…
This morning I backed up my users’ data and upgraded the Helm chart to v0.9-61cf357. This worked fine, but it hasn’t solved the problem of the `core.ZMQbg!3` files.
I guess it’s a good thing that my Hub and container are now running the same version of JupyterHub, but if anyone has suggestions for what’s causing the core files, please let me know.
The only files I know of with names like this are created when a process exits with a “core dump”: https://en.wikipedia.org/wiki/Core_dump
From the filename it seems like it is a process related to ZeroMQ (0MQ), the networking library used by kernels and frontends to talk to each other.
Why this happens I have no idea, unfortunately. I’d try to narrow it down to which kernel you are running when this happens and how to reliably trigger it. Then maybe try to find out how to look at the contents of the core files. I don’t know how to offhand, but the whole point of them is debugging, so there must be useful information in them.
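For what it’s worth, something like this might be a starting point for peeking inside one of the dumps. This is only a sketch: it assumes `gdb` is installed in the single-user container and that the dump came from the kernel’s `python` process (swap in a real file name from your server):

```shell
# List the dumps that have accumulated; the name pattern is the one
# described above
ls core.*

# Ask gdb for a backtrace from one of them. "python" is an assumption
# that the kernel process produced the dump; the file name is the
# placeholder pattern from the original post.
gdb --batch -ex "bt" python 'core.ZMQbg!3.xx.xxxxxxx'
```

The backtrace should at least show which shared library the crashing thread was executing in, which would confirm (or rule out) ZeroMQ.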
Great points @betatim.
You may also want to check the pod logs corresponding to the Notebook server started by KubeSpawner, if you haven’t already. There’s a chance the kernel is posting something to stderr that might provide some clues.
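If it helps, a minimal sketch of pulling those logs — this assumes the namespace name `jhub` from the Z2JH tutorial and KubeSpawner’s usual `jupyter-<username>` pod naming, so adjust both for your deployment:

```shell
# Find the single-user server pod for the affected user
kubectl get pods -n jhub

# Tail its logs and look for kernel tracebacks or ZMQ errors on stderr
kubectl logs -n jhub jupyter-<username> --tail=100
```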
Have you modified the notebook image in any way? If so, I would also check the versions of the kernel (and its dependencies) relative to notebook, since an unexpected version of ZMQ (or one of its dependencies) may be getting picked up, and a version mismatch like that can trigger a segmentation violation or null-pointer dereference — which typically produce core files.
Thanks @betatim. I must admit to being a bit out of my depth here, but the problem seems to be a version conflict between my installations of tornado, zeromq and/or pyzmq.
The core dump files are generated whenever I start or restart a Python kernel. Running locally in Docker, I don’t get any dump files, but I do see the following message from my notebook server in PowerShell:
```
[I 13:52:59.626 LabApp] Kernel restarted: 54576ebf-e3e1-45e5-8457-4eb48d3f5c89
zmq.eventloop.minitornado is deprecated in pyzmq 14.0 and will be removed.
    Install tornado itself to use zmq with the tornado IOLoop.
```
It seems that on JupyterHub this same error leads to the core dump files being generated. In my Dockerfile, I’m installing packages using Conda, and the output of `conda list` gives:
```
# Name    Version   Build            Channel
tornado   6.0.3     py37h7b6447c_0   defaults
zeromq    4.2.5     hf484d3e_1       defaults
pyzmq     17.1.2    py37h14c3975_0   defaults
```
I initially thought this looked OK, since my old container also used tornado version 6. However, I’ve just downgraded my new container to tornado 5.1.1, and the problem seems to have gone away.
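In case it’s useful to anyone else, the workaround amounts to pinning the working combination in the image build. This is just a sketch assuming a conda-based Dockerfile like the Jupyter Docker Stacks ones — the pyzmq and zeromq pins are simply the versions from my `conda list` output above, and these exact pins are only what happened to work for me:

```shell
# Inside the Dockerfile (e.g. in a RUN instruction): pin tornado back
# to the 5.x series alongside the existing pyzmq/zeromq versions
conda install --yes 'tornado=5.1.1' 'pyzmq=17.1.2' 'zeromq=4.2.5'
```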
Thanks @kevin-bates - I think we posted almost simultaneously!
I guess I must have done something to end up with incompatible versions of tornado and zmq/pyzmq, although I’m a bit puzzled as to why I’m now using an older version of tornado than I had before.
I’ll keep looking to try to understand what I did wrong, and I’ll post here if I figure it out. But first I’ll celebrate.
Thanks for the tips regarding checking the pod logs - I’m new to debugging on kubernetes, and it takes a bit of getting used to!