Identifying the cause of file saving issues

Hi folks. We serve JupyterLab on JupyterHub and have done so without too many problems for the last couple of years, but since upgrading we have encountered issues with both manual and automatic saving of files.

The behaviour, as far as I can tell, only seems to impact users who used JupyterLab prior to the upgrade, but first-time users also tend to have quite light usage, so they may simply not be hitting the same trigger. If it really is only existing users who are impacted, that suggests some sort of stale cache or config might be the cause of the problem.

What we’ve seen is that on opening a file, the user can save once without an issue, but after that, both manual and automatic saving stop working. The user’s session is still valid, as they’re able to open a new file, work with it and save it, but they can’t save the existing file. This results in work being lost, because the user doesn’t notice that the file modification date has stopped updating. Some users have reported that they’re unable to export the non-saving files to PDF (or, I’m guessing, to any other format). With multiple actions impacted, that might suggest some sort of loss of connection to the affected kernel(s).
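Since the stale modification date is the only visible symptom, something like the following could flag affected files before work is lost. This is only a rough sketch: the `stale_files` helper, the list of paths and the age threshold are all made up for illustration, and it assumes you already know which files a user has open.

```python
import os
import time


def stale_files(paths, max_age_seconds, now=None):
    """Return the paths whose modification time is older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for path in paths:
        # st_mtime is the last modification time in seconds since the epoch.
        age = now - os.stat(path).st_mtime
        if age > max_age_seconds:
            stale.append(path)
    return stale
```

With autosave enabled, a file that is being actively edited but shows up here with an age well beyond the autosave interval would be a candidate for the problem described above.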

Looking at the logs, with debug logging enabled, I can see that the “Saving …” entry no longer appears for the file after the initial save, but there’s nothing else server-side to suggest there’s a problem.

As the work is quite critical, I haven’t had the opportunity to sit with an affected user and troubleshoot properly, so as a quick fix I’m clearing out a bunch of cache and configuration directories, mainly ones tied to Jupyter:

~/.cache/{jedi,matplotlib,yarn}
~/.config/matplotlib
~/.local/share/jupyter
~/.ipython
~/.jupyter
~/.virtual_documents
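
One way to narrow this down, rather than deleting everything at once, would be to move the directories aside so they can be restored one at a time. The sketch below is only an illustration of that idea: `move_aside` and the backup location are hypothetical names, and it assumes the user’s server is stopped while it runs.

```python
import shutil
from pathlib import Path

# The directories being cleared, relative to the user's home directory.
CANDIDATES = [
    ".cache/jedi",
    ".cache/matplotlib",
    ".cache/yarn",
    ".config/matplotlib",
    ".local/share/jupyter",
    ".ipython",
    ".jupyter",
    ".virtual_documents",
]


def move_aside(home, backup_root, candidates=CANDIDATES):
    """Move each existing candidate under backup_root, keeping its relative path."""
    home = Path(home)
    backup_root = Path(backup_root)
    moved = []
    for rel in candidates:
        src = home / rel
        if not src.exists():
            continue  # nothing to move for this user
        dest = backup_root / rel
        if dest.exists():
            # Refuse to merge into an earlier backup; use a fresh backup_root.
            raise FileExistsError(f"backup already exists: {dest}")
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(rel)
    return moved
```

If the problem goes away after moving everything, restoring the directories one at a time from the backup and retesting should show which one brings it back.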

Having the users log back in afterwards seems to resolve the issue, but I’m not entirely comfortable with that being the solution without knowing exactly why it works.

The user accounts are all pre-existing, not managed by Jupyter, and home directories are mounted over NFS shares. We’ve tried to identify whether NFS might be the problem, but haven’t found anything at that level, and the fact that only specific users are impacted points towards this not being a file share issue.

The environment currently looks like:

jupyter-client                    7.3.4
jupyter-core                      4.11.1
jupyter-lsp                       1.5.1
jupyter-server                    1.18.1
jupyter-server-mathjax            0.2.6
jupyter-telemetry                 0.1.0
jupyterhub                        2.3.1
jupyterhub-idle-culler            1.2.2.dev1
jupyterhub-systemdspawner         0.16
jupyterlab                        3.4.8
jupyterlab-git                    0.37.1
jupyterlab-link-share             0.2.4
jupyterlab-lsp                    3.10.1
jupyterlab-pygments               0.2.2
jupyterlab-server                 2.15.0

Previously we had something closer to the below, but I don’t have the exact versions prior to the upgrade, just those that were noted at initial install:

jupyter                1.0.0
jupyter-client         7.1.2
jupyter-console        6.4.0
jupyter-core           4.9.1
jupyter-lsp            0.9.1
jupyter-telemetry      0.1.0
jupyterhub             1.1.0
jupyterhub-idle-culler 1.2.2.dev1
jupyterlab             2.2.10
jupyterlab-git         0.20.0
jupyterlab-pygments    0.1.2
jupyterlab-server      1.2.0
jupyterlab-widgets     1.0.2
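
To make the size of the jump easier to see, the two listings can be compared mechanically. This is just a sketch over the versions quoted above; `parse`, `major` and `compare` are throwaway helpers, and the "previous" side is only as accurate as the install-time notes it came from.

```python
PREVIOUS = """
jupyter 1.0.0
jupyter-client 7.1.2
jupyter-console 6.4.0
jupyter-core 4.9.1
jupyter-lsp 0.9.1
jupyter-telemetry 0.1.0
jupyterhub 1.1.0
jupyterhub-idle-culler 1.2.2.dev1
jupyterlab 2.2.10
jupyterlab-git 0.20.0
jupyterlab-pygments 0.1.2
jupyterlab-server 1.2.0
jupyterlab-widgets 1.0.2
"""

CURRENT = """
jupyter-client 7.3.4
jupyter-core 4.11.1
jupyter-lsp 1.5.1
jupyter-server 1.18.1
jupyter-server-mathjax 0.2.6
jupyter-telemetry 0.1.0
jupyterhub 2.3.1
jupyterhub-idle-culler 1.2.2.dev1
jupyterhub-systemdspawner 0.16
jupyterlab 3.4.8
jupyterlab-git 0.37.1
jupyterlab-link-share 0.2.4
jupyterlab-lsp 3.10.1
jupyterlab-pygments 0.2.2
jupyterlab-server 2.15.0
"""


def parse(listing):
    """Turn 'name version' lines into a {name: version} dict."""
    versions = {}
    for line in listing.strip().splitlines():
        name, version = line.split()
        versions[name] = version
    return versions


def major(version):
    """Return the leading numeric component of a version string."""
    return int(version.split(".")[0])


def compare(previous, current):
    """Return (only in current, only in previous, packages with a major bump)."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    bumped = sorted(
        name
        for name in set(previous) & set(current)
        if major(current[name]) > major(previous[name])
    )
    return added, removed, bumped


added, removed, bumped = compare(parse(PREVIOUS), parse(CURRENT))
print("only in current listing:", added)
print("only in previous listing:", removed)
print("major version bump:", bumped)
```

On these listings the major version bumps are jupyter-lsp, jupyterhub, jupyterlab and jupyterlab-server, and jupyter-server appears only in the current listing, so any per-user state written by the 2.x-era stack is a plausible suspect.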

We’ve switched from sudospawner to systemdspawner to implement resource restrictions on the shared infrastructure. As far as I can tell, this shouldn’t cause a problem with saving, although a kernel that hits the quota limit will restart (annoyingly, I can’t find the documentation that states this at the moment, but I have seen it quoted).

I’ve not been able to find similar cases of this - is anyone able to advise what this could be or the best steps to follow to identify what the actual cause is?

Thanks!