tornado.web.HTTPError: HTTP 500: Internal Server Error (Permission failure checking authorization, I may need a new token)

deej · January 30, 2023, 7:06pm

Hi all,
I’m hoping to get some input on a problem that has cropped up recently.

We have JupyterHub running on Rocky Linux 8, using Batchspawner to run notebooks on a Slurm cluster (also running Rocky Linux 8). This has been working great for many months.

Now whenever trying to spawn a notebook, the web interface gives:
“Spawn failed: sbatch: error: Batch job submission failed: Socket timed out on send/recv operation”

On the Cluster node where the job was trying to spawn, there are files like /tmp/jupyterhub-31717.error that contain:

I don't have permission to check authorization with JupyterHub, my auth token may have expired: [403] Forbidden
{"status": 403, "message": "Forbidden"}
Traceback (most recent call last):
  File "/mnt/local/python3.9/bin/batchspawner-singleuser", line 8, in <module>
    sys.exit(main())
  File "/mnt/local/python3.9/lib/python3.9/site-packages/batchspawner/singleuser.py", line 17, in main
    hub_auth._api_request(
  File "/mnt/local/python3.9/lib/python3.9/site-packages/jupyterhub/services/auth.py", line 436, in _api_request
    raise HTTPError(
tornado.web.HTTPError: HTTP 500: Internal Server Error (Permission failure checking authorization, I may need a new token)

So far the only similar issue I have found searching the forums is a reference to a netrc or .netrc file in the user's home directory causing the problem, but no such file exists in the user's home directory, /etc, or any other obvious place.
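For anyone repeating the netrc check, here is a minimal sketch of how it could be scripted. It assumes the HTTP client in use looks at the NETRC environment variable and at .netrc or _netrc in the home directory, which is how the Python requests library behaves as far as I know; the function name find_netrc_candidates is made up for this example. It is probably best run as the affected user on the compute node, since the environment of a batch job can differ from a login shell.

```python
import os
from pathlib import Path


def find_netrc_candidates(home, environ):
    """Return existing netrc-style files that an HTTP client might read."""
    candidates = []
    # A file named by the NETRC environment variable, if that variable is set.
    if environ.get("NETRC"):
        candidates.append(Path(environ["NETRC"]))
    # Conventional per-user file names in the home directory.
    for name in (".netrc", "_netrc"):
        candidates.append(Path(home) / name)
    # Keep only the paths that actually exist as regular files.
    return [path for path in candidates if path.is_file()]


if __name__ == "__main__":
    found = find_netrc_candidates(Path.home(), os.environ)
    if found:
        for path in found:
            print(f"netrc file present: {path}")
    else:
        print("no netrc files found for this user")
```

An empty result only rules out these particular locations; it does not prove that credentials are not being injected some other way.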

Anyone have any thoughts to share on what might be causing this, or how to troubleshoot?

Thanks,

-Dj

deej · January 30, 2023, 9:30pm

I believe I have traced the problem to slow communication between sssd on the Linux systems and the campus Active Directory service.

In the sssd.log I noticed lots of messages “sssd ('default':'%BE_default') was terminated by own WATCHDOG”.
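To get a rough idea of how often this happens, something like the following sketch could tally the WATCHDOG messages. The regular expression only matches the fragment quoted above with plain ASCII quotes; the rest of the log line format and the default path /var/log/sssd/sssd.log are assumptions, and count_watchdog_terminations is a made-up name.

```python
import re
import sys
from collections import Counter

# Matches only the quoted fragment; the rest of each log line is ignored.
WATCHDOG_RE = re.compile(r"\('([^']*)':'([^']*)'\) was terminated by own WATCHDOG")


def count_watchdog_terminations(lines):
    """Count WATCHDOG terminations per quoted ('first':'second') pair."""
    counts = Counter()
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Assumed default location; pass the real log path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/sssd/sssd.log"
    with open(path, errors="replace") as handle:
        for (first, second), n in count_watchdog_terminations(handle).items():
            print(f"{first} {second}: {n}")
```

Running it before and after a configuration change gives a simple way to confirm whether the terminations have actually stopped.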

As a workaround, I added “timeout = 45” into the domain section for the AD server in sssd.conf, and so far that seems to have helped. No more WATCHDOG termination messages and the jobs are now running.
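In case it helps someone check their own setup, here is a small sketch that reads an sssd.conf-style file and reports the timeout set in each domain section. The excerpt is hypothetical: the domain name example.edu is a placeholder, a real file would carry many more options, and domain_timeouts is a made-up helper. It also assumes the file is plain enough for Python's configparser to read.

```python
import configparser

# Hypothetical sssd.conf excerpt; the domain name is a placeholder and the
# other options a real file would contain are omitted.
EXAMPLE_CONF = """
[sssd]
domains = example.edu

[domain/example.edu]
id_provider = ad
timeout = 45
"""


def domain_timeouts(conf_text):
    """Map each [domain/...] section to its timeout value, or None if unset."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    return {
        section: parser.getint(section, "timeout", fallback=None)
        for section in parser.sections()
        if section.startswith("domain/")
    }


if __name__ == "__main__":
    for section, timeout in domain_timeouts(EXAMPLE_CONF).items():
        shown = timeout if timeout is not None else "unset (sssd default applies)"
        print(section, shown)
```

With the excerpt above this reports 45 for the one domain section; a section without the option shows up as unset.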

I still haven’t figured out why the communication to the AD service is taking a lot longer than it used to, but at least this allows us to function while we troubleshoot.

-Dj
