User pages suddenly returning 404

All of a sudden I can’t access any of the /user/ pages on my TLJH. The only thing I changed was trying to install jupyter-ai. I uninstalled it, but the problem persists. I also restarted the server.

Files are still in place, and everything else seems to be working fine (hub admin interface, create user functionality).

I’d really appreciate any help on how to debug/fix this issue.
Thanks a lot!

source /opt/tljh/user/bin/activate

pip freeze 

advertools==0.14.2
adviz==0.0.15
aiofiles==22.1.0
aiohttp==3.9.5
aiosignal==1.3.1
aiosqlite==0.19.0
alembic==1.11.1
annotated-types==0.7.0
ansi2html==1.8.0
anyio==3.7.0
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
async-generator==1.10
async-lru==2.0.4
async-timeout==4.0.3
attrs==23.1.0
Automat==22.10.0
Babel==2.12.1
backcall==0.2.0
beautifulsoup4==4.12.2
bleach==6.0.0
brotlipy==0.7.0
certifi==2023.5.7
certipy==0.1.3
cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1631636250774/work
chardet @ file:///home/conda/feedstock_root/build_artifacts/chardet_1610093492116/work
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1626371162869/work
click==8.1.3
cloudpickle==3.0.0
colorama @ file:///home/conda/feedstock_root/build_artifacts/colorama_1602866480661/work
comm==0.1.3
conda==4.10.3
conda-package-handling @ file:///home/conda/feedstock_root/build_artifacts/conda-package-handling_1618231390031/work
constantly==15.1.0
contourpy==1.1.0
cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography_1633983255347/work
cssselect==1.2.0
cycler==0.11.0
dash==2.11.0
dash-bootstrap-components==1.4.1
dash-bootstrap-templates==1.1.2
dash-core-components==2.0.0
dash-html-components==2.0.0
dash-table==5.0.0
dask==2024.5.2
dataclasses-json==0.6.6
debugpy==1.6.7
decorator==5.1.1
deepmerge==1.1.1
defusedxml==0.7.1
distributed==2024.5.2
entrypoints==0.4
exceptiongroup==1.1.1
executing==1.2.0
faiss-cpu==1.8.0
fastjsonschema==2.17.1
filelock==3.12.2
Flask==2.2.5
fonttools==4.40.0
fqdn==1.5.1
frozenlist==1.4.1
fsspec==2024.6.0
greenlet==2.0.2
h11==0.14.0
httpcore==1.0.5
httpx==0.27.0
hyperlink==21.0.0
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1609836280497/work
importlib-metadata==6.7.0
importlib-resources==5.12.0
incremental==22.10.0
ipykernel==6.23.2
ipython==8.14.0
ipython-genutils==0.2.0
ipywidgets==7.7.5
isoduration==20.11.0
itemadapter==0.8.0
itemloaders==1.1.0
itsdangerous==2.1.2
jedi==0.18.2
Jinja2==3.1.2
jmespath==1.0.1
json5==0.9.14
jsonpatch==1.33
jsonpath-ng==1.6.1
jsonpointer==2.4
jsonschema==4.22.0
jsonschema-specifications==2023.12.1
jupyter-events==0.10.0
jupyter-lsp==2.2.5
jupyter-resource-usage==0.6.4
jupyter-telemetry==0.1.0
jupyter-ydoc==0.2.4
jupyter_client==8.2.0
jupyter_core==5.3.1
jupyter_server==2.14.1
jupyter_server_fileid==0.9.0
jupyter_server_terminals==0.5.3
jupyter_server_ydoc==0.8.0
jupyterhub==1.5.1
jupyterlab==4.2.1
jupyterlab-pygments==0.2.2
jupyterlab-widgets==1.1.4
jupyterlab_server==2.27.2
kaleido==0.2.1
kiwisolver==1.4.4
langchain==0.1.20
langchain-community==0.0.38
langchain-core==0.1.52
langchain-text-splitters==0.0.2
langsmith==0.1.75
locket==1.0.0
lxml==4.9.2
Mako==1.2.4
mamba @ file:///home/conda/feedstock_root/build_artifacts/mamba_1632770295204/work
MarkupSafe==2.1.3
marshmallow==3.21.3
matplotlib==3.7.1
matplotlib-inline==0.1.6
mistune==3.0.1
msgpack==1.0.8
multidict==6.0.5
mypy-extensions==1.0.0
nbclassic==1.0.0
nbclient==0.8.0
nbconvert==7.6.0
nbformat==5.9.0
nbgitpuller==1.1.1
nest-asyncio==1.5.6
notebook==6.5.4
notebook_shim==0.2.3
nteract-on-jupyter==2.1.3
numpy==1.25.0
oauthlib==3.2.2
orjson==3.10.3
overrides==7.7.0
packaging==23.2
pamela==1.1.0
pandas==2.2.1
pandocfilters==1.5.0
parsel==1.8.1
parso==0.8.3
partd==1.4.2
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.5.0
platformdirs==3.6.0
plotly==5.19.0
ply==3.11
prometheus-client==0.17.0
prompt-toolkit==3.0.38
Protego==0.2.1
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==12.0.1
pyasn1==0.5.0
pyasn1-modules==0.3.0
pycosat @ file:///home/conda/feedstock_root/build_artifacts/pycosat_1610094799048/work
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1593275161868/work
pydantic==2.7.3
pydantic_core==2.18.4
PyDispatcher==2.0.7
Pygments==2.15.1
pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1633192417276/work
pyparsing==3.1.0
pyrsistent==0.19.3
PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1610291451001/work
python-dateutil==2.8.2
python-json-logger==2.0.7
pytz==2023.3
PyYAML==6.0
pyzmq==25.1.0
queuelib==1.6.2
referencing==0.35.1
requests==2.31.0
requests-file==1.5.1
requests-oauthlib==1.3.1
retrying==1.3.4
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.18.1
ruamel-yaml-conda @ file:///home/conda/feedstock_root/build_artifacts/ruamel_yaml_1611943432947/work
ruamel.yaml==0.17.32
ruamel.yaml.clib==0.2.7
scipy==1.11.4
Scrapy==2.9.0
Send2Trash==1.8.2
service-identity==23.1.0
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
sniffio==1.3.0
sortedcontainers==2.4.0
soupsieve==2.4.1
SQLAlchemy==1.4.48
stack-data==0.6.2
tblib==3.0.0
tenacity==8.2.2
terminado==0.17.1
tinycss2==1.2.1
tldextract==3.4.4
tomli==2.0.1
toolz==0.12.1
tornado==6.3.2
tqdm @ file:///home/conda/feedstock_root/build_artifacts/tqdm_1632160078689/work
traitlets==5.9.0
Twisted==22.10.0
twython==3.9.1
typing-inspect==0.9.0
typing_extensions==4.6.3
tzdata==2023.3
uri-template==1.2.0
urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1632350318291/work
w3lib==2.1.1
wcwidth==0.2.6
webcolors==1.13
webencodings==0.5.1
websocket-client==1.8.0
Werkzeug==2.2.3
widgetsnbextension==3.6.4
y-py==0.5.9
yarl==1.9.4
ypy-websocket==0.8.2
zict==3.0.0
zipp==3.15.0
zope.interface==6.0

I think installing jupyter-ai will have upgraded some of your other dependencies, leaving you with incompatible versions of some components. These won’t have been downgraded when you uninstalled jupyter-ai.

For example, jupyterhub==1.5.1 is a very old package (probably dating to when you originally installed TLJH), but other packages are much more recent. You could try recreating your user environment, though given that JupyterHub 1.5.1 is no longer supported it may be better to upgrade everything.
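
To see the mismatch at a glance, it can help to pull just the core Jupyter packages out of the pip freeze output. A minimal sketch: the function name show_core_versions and the choice of packages are picked for illustration only and are not something TLJH provides.

```shell
# Filter `pip freeze` output (read from stdin) down to the core Jupyter packages.
# The package list is an illustrative guess at what matters for serving /user/ pages.
show_core_versions() {
    grep -iE '^(jupyterhub|jupyterlab|jupyter[_-]server|notebook|tornado|traitlets)=='
}

# Example:
#   /opt/tljh/user/bin/pip freeze | show_core_versions
```

Running /opt/tljh/user/bin/pip check afterwards should also list any installed packages whose declared requirements are not satisfied.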


Thanks a lot @manics !

Yes, it was installed last year before JHub v4 was supported.

Would you suggest running the installer script again for the upgrade, or would it be safer to uninstall and reinstall all packages in the current environment?
Just don’t want to make another mess.
Thanks again for your help.

I don’t think there’s any harm in running the installer script again. The script should attempt to upgrade the core components in the user environment.

If that doesn’t work, or if you start running into other problems when using JupyterLab/Notebook, it might be worth recreating your user environment. If you delete or rename it (mv /opt/tljh/user /opt/tljh/user.old) and run the tljh script again this will create a new user environment.
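
For the rename step, a small guard avoids clobbering an earlier backup. This is only a sketch: move_aside is a made-up helper name, and it would need to run as root to touch /opt/tljh/user.

```shell
# Rename a directory to "<name>.old", refusing to overwrite an existing backup.
move_aside() {
    src="${1%/}"    # drop a trailing slash, if any
    if [ ! -d "$src" ]; then
        echo "not a directory: $src" >&2
        return 1
    fi
    if [ -e "$src.old" ]; then
        echo "backup already exists: $src.old" >&2
        return 1
    fi
    mv "$src" "$src.old" && echo "moved $src to $src.old"
}

# Example (as root):
#   move_aside /opt/tljh/user
```

After that, rerunning the TLJH installer the same way as for the original install should build a fresh /opt/tljh/user, while the old environment stays in /opt/tljh/user.old in case anything needs to be checked or copied back.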

Thanks again @manics

Recreating the user environment (moving /opt/tljh/user to /opt/tljh/user.old) seems to have worked, but I’m now having issues with the SSL certificate (HSTS).

I created the certificate again using tljh-config but it’s still not working.

Any idea what I might have done wrong, or what the best practice is for making sure the certificate works?

This does not show the location of the certificate as is shown in the docs

All unique error messages from the traefik logs since I made the change:

msg="Error occurred during watcher callback: toml: cannot load TOML value of type map[string]interface {} into a Go slice"
msg="Error while Peeking first byte: read tcp 172.104.249.145:443->5.52.111.82:40817: read: connection timed out"
msg="Error while Peeking first byte: read tcp 172.104.249.145:443->79.175.138.68:49994: read: connection timed out"
msg="Error while Peeking first byte: read tcp 172.104.249.145:443->89.199.7.229:42496: read: connection timed out"
msg="Error while creating certificate store: unable to find certificate for domains \"advertools.app,advertools.app\": falling back to the internal generated certificate" tlsStoreName=default
msg="Error while creating certificate store: unable to find certificate for domains \"advertools.app\": falling back to the internal generated certificate" tlsStoreName=default
msg="Error while starting server: accept tcp 127.0.0.1:8099: use of closed network connection" entryPointName=auth_api
msg="Error while starting server: accept tcp [::]:443: use of closed network connection" entryPointName=https
msg="Error while starting server: accept tcp [::]:80: use of closed network connection" entryPointName=http
msg="The ACME resolver \"letsencrypt\" is skipped from the resolvers list because: unable to get ACME account: json: cannot unmarshal array into Go value of type acme.StoredData"
msg="accept tcp 127.0.0.1:8099: use of closed network connection" entryPointName=auth_api
msg="accept tcp [::]:443: use of closed network connection" entryPointName=https
msg="accept tcp [::]:80: use of closed network connection" entryPointName=http
msg="close tcp 127.0.0.1:8099: use of closed network connection" entryPointName=auth_api
msg="close tcp [::]:443: use of closed network connection" entryPointName=https
msg="close tcp [::]:80: use of closed network connection" entryPointName=http
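
For reference, a deduplicated list like the one above can be produced with a short pipeline. A sketch, assuming the journal lines carry level=error and msg="..." fields as in the excerpts in this thread; unique_error_msgs is just an illustrative name.

```shell
# Read journal lines on stdin and print each distinct error message once.
# The timestamp and process prefix are dropped so that repeats collapse.
unique_error_msgs() {
    grep 'level=error' | grep -o 'msg=".*' | sort -u
}

# Example:
#   sudo journalctl -u traefik --since "5 days ago" | unique_error_msgs
```

Messages that embed client addresses (such as the connection timeouts) still differ per line, so they are not collapsed.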

Error messages that contain “letsencrypt”:

Jun 18 15:34:39 localhost traefik[202826]: time="2024-06-18T15:34:39Z" level=error msg="The ACME resolver \"letsencrypt\" is skipped from the resolvers list because: unable to get ACME account: json: cannot unmarshal array into Go value of type acme.StoredData"
Jun 18 15:38:53 localhost traefik[202941]: time="2024-06-18T15:38:53Z" level=error msg="The ACME resolver \"letsencrypt\" is skipped from the resolvers list because: unable to get ACME account: json: cannot unmarshal array into Go value of type acme.StoredData"
Jun 18 20:16:33 localhost traefik[204809]: time="2024-06-18T20:16:33Z" level=error msg="The ACME resolver \"letsencrypt\" is skipped from the resolvers list because: unable to get ACME account: json: cannot unmarshal array into Go value of type acme.StoredData"
Jun 18 20:17:48 localhost traefik[204845]: time="2024-06-18T20:17:48Z" level=error msg="The ACME resolver \"letsencrypt\" is skipped from the resolvers list because: unable to get ACME account: json: cannot unmarshal array into Go value of type acme.StoredData"

Any idea what might have gone wrong?

Thanks again!

I don’t know, maybe something is missing or corrupted? What do you have under /opt/tljh/state?

Thank you @manics

ls -l /opt/tljh/state

-rw------- 1 root root  13205 Jun 14 12:57 acme.json
-rw-r--r-- 1 root root 139264 Jun 26 14:55 jupyterhub.sqlite
-rw-r--r-- 1 root root 135168 Jun 18 15:34 jupyterhub.sqlite.2024-06-18-153440
-rw------- 1 root root     65 Jun 19  2023 jupyterhub_cookie_secret
-rw------- 1 root root  16384 Jun 19  2023 passwords.dbm
drwx------ 2 root root   4096 Jun 18 20:17 rules
-rw-r--r-- 1 root root     64 Jun 19  2023 traefik-api.secret
-rw------- 1 root root    885 Jun 18 20:17 traefik.toml

Tree:

/opt/tljh/state
├── acme.json
├── jupyterhub.sqlite
├── jupyterhub.sqlite.2024-06-18-153440
├── jupyterhub_cookie_secret
├── passwords.dbm
├── rules
│   ├── dynamic.toml
│   └── rules.toml
├── traefik-api.secret
└── traefik.toml

Anything that can help here?

acme.json stores the Let’s Encrypt state.

Can you try renaming it, and restarting traefik to see if it’s recreated?
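
Before renaming it, it may be worth looking at how the file is laid out. The error says Traefik hit a JSON array where it expected an acme.StoredData object, which could mean the file is in a layout this Traefik version does not expect (for example, one written by an older release). That is only a guess, not something the logs confirm. The sketch below prints just the top-level keys and the JSON type of each value, so no private keys end up in the terminal. json_key_types is a made-up name, and the sketch assumes python3 is on the PATH.

```shell
# List each top-level key of a JSON file with the JSON type of its value.
# Values themselves are never printed (acme.json holds private keys).
json_key_types() {
    python3 - "$1" <<'PY'
import json
import sys

NAMES = {dict: "object", list: "array", str: "string",
         bool: "boolean", type(None): "null"}

def kind(value):
    # Anything not in NAMES is an int or float, i.e. a JSON number.
    return NAMES.get(type(value), "number")

with open(sys.argv[1]) as f:
    data = json.load(f)

if isinstance(data, dict):
    for key, value in data.items():
        print(f"{key}: {kind(value)}")
else:
    print(f"(top level): {kind(data)}")
PY
}

# Example (as root, since the file is only readable by root):
#   json_key_types /opt/tljh/state/acme.json
```

A key whose value shows up as an array where Traefik wants an object would be consistent with the error message.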

Sorry for the delay, and thanks for your patience.

I ran through the steps:


sudo mv  /opt/tljh/state/acme.json /opt/tljh/state/acme.json.old

sudo systemctl restart traefik

sudo cat  /opt/tljh/state/acme.json
cat: /opt/tljh/state/acme.json: No such file or directory

The service is enabled, but it fails to start:

sudo systemctl status  traefik
× traefik.service
     Loaded: loaded (/etc/systemd/system/traefik.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Tue 2024-07-02 14:23:37 UTC; 2min 44s ago
   Duration: 5ms
    Process: 462978 ExecStart=/opt/tljh/hub/bin/traefik -c /opt/tljh/state/traefik.toml (code=exited, status=226/NAMESPACE)
   Main PID: 462978 (code=exited, status=226/NAMESPACE)
        CPU: 4ms

Jul 02 14:23:37 localhost systemd[1]: traefik.service: Scheduled restart job, restart counter is at 5.
Jul 02 14:23:37 localhost systemd[1]: Stopped traefik.service.
Jul 02 14:23:37 localhost systemd[1]: traefik.service: Start request repeated too quickly.
Jul 02 14:23:37 localhost systemd[1]: traefik.service: Failed with result 'exit-code'.
Jul 02 14:23:37 localhost systemd[1]: Failed to start traefik.service.

Not sure if there’s a hint there.
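
On the hint: systemd documents exit status 226/NAMESPACE as a failure to set up the service's namespacing, and points at settings such as ReadOnlyPaths= and ReadWritePaths=. One thing that can trigger it is a unit listing a path that no longer exists. Whether traefik.service lists acme.json there is an assumption to verify, not something shown in this thread. A sketch for checking (missing_paths is an illustrative name):

```shell
# Print every given path that does not exist on disk.
missing_paths() {
    for p in "$@"; do
        case "$p" in
            -*) continue ;;   # a leading "-" marks the path as optional for systemd
        esac
        [ -e "$p" ] || echo "missing: $p"
    done
}

# Example (assumes the unit is named traefik, as in the status output above):
#   missing_paths $(systemctl show traefik -p ReadWritePaths --value)
```

If acme.json turns up as missing, recreating it as an empty file with mode 600 (touch, then chmod 600) may be enough for the unit to start again, but that is untested here.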

What’s the safest way to delete the certificate and start over with a new one for the domain?

Thanks again. Appreciate your help.

Can you try turning on debug logging in the Traefik config file /opt/tljh/state/traefik.toml :

[log]
level = "DEBUG"

And show us your logs?

sudo journalctl -u traefik

Sure,

I just did that, made a few requests, and exported the last five days of the logs:

Thanks again!

I’m sorry, but I’m afraid I’m out of ideas… the certificate request is obviously failing but I don’t know why.

Jul 05 15:09:09 localhost traefik[509211]: time="2024-07-05T15:09:09Z" level=error msg="The ACME resolver \"letsencrypt\" is skipped from the resolvers list because: unable to get ACME account: json: cannot unmarshal array into Go value of type acme.StoredData"

...

Jul 05 15:09:09 localhost traefik[509211]: time="2024-07-05T15:09:09Z" level=error msg="Error while creating certificate store: unable to find certificate for domains \"advertools.app,advertools.app\": falling back to the internal generated certificate" tlsStoreName=default