JupyterHub won't use keytabs

Hello,
I’ve been fighting this issue for months now, and my first few pages of Google all contain purple links.

I have an old HDFS/Spark cluster, and a new HDFS/Spark cluster, set up nearly identically, both with kerberos authentication enabled.
The old cluster runs CentOS 7, and the new cluster runs RHEL 8.4. JupyterHub is installed on an edge node of each cluster, and the JupyterHub config files are nearly identical; the only difference is that they point to slightly different folders because the paths contain different software versions.

On the old cluster, using Python in Jupyter, we use os.system('kinit -kt user.keytab') to authenticate to HDFS, and from there we can use os.system to run HDFS commands to view, move, or write files within HDFS.
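For reference, the pattern on the old cluster looks roughly like this (the keytab and HDFS paths are placeholders, not our real values):

import os

# Get a Kerberos ticket from the keytab, then run HDFS commands
# as child processes of the notebook kernel.
os.system("kinit -kt /path/to/user.keytab")
os.system("hdfs dfs -ls /user/someuser")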

On the new cluster, we can successfully invoke a keytab using os.system. I can run klist to verify my keytab is being seen, but when I try to interact with HDFS (and Spark, for that matter), I get an error indicating my keytab isn't being accepted:

WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
ls: DestHost:destPort namenode.mylocation.local:8020 , LocalHost:localPort edgenode.mylocation.local/10.200.216.101:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]

These are HDFS commands that work fine from the terminal with the same keytab.

I get the same results if I use ! in JupyterHub.
HOWEVER: I can run the same code successfully as plain Python from the terminal on the new cluster. Running the code in a notebook through JupyterHub has given mixed results: some things work, but for others I have to put the command string in a bash script; the bash script works, but invoking the notebook directly does not.
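To show what I'm comparing, this is roughly the check I run in a notebook cell and again in a terminal session (just a diagnostic sketch):

import subprocess

# Compare which credential cache the notebook kernel sees vs. a terminal login.
print(subprocess.run(["klist"], capture_output=True, text=True).stdout)

# The same HDFS command that works from the terminal fails here.
result = subprocess.run(["hdfs", "dfs", "-ls", "/"], capture_output=True, text=True)
print(result.stdout or result.stderr)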

I have tried so many different approaches. I have made sure every directory in the full path to the keytab is readable. I have made sure all the environment variables are similar (again, different versions and slightly different locations for some things because of the OS). I have made sure the krb5 packages are installed on the new system. All the relevant Python packages are installed on the new system.
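For what it's worth, this is roughly how I compared the environments between the notebook kernel and a terminal login (the variable list is just what seemed relevant, not exhaustive):

import os

# Print the Kerberos/Hadoop-related variables as the notebook kernel sees them,
# to diff against `env` output from a normal terminal login.
for var in ("KRB5CCNAME", "KRB5_CONFIG", "HADOOP_CONF_DIR", "JAVA_HOME", "PATH"):
    print(var, "=", os.environ.get(var))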

I could write about three more pages on what I've tried that also didn't work. Any suggestions for a solution would be greatly appreciated.
Thanks,
Eric

What Spawner and Authenticator are you using, and what version of JupyterHub?

This sounds like you may be relying on some environment that is set up in shell profiles. Often, the best way to make user environments look like a regular login is to invoke a login shell as part of the startup.

This often looks like:

#!/bin/bash -l
# ^ `-l` means login shell - source profile files, etc.
exec jupyterhub-singleuser "$@"

Put that script somewhere, e.g. /usr/local/bin/jupyterhub-singleuser-login

and then tell JupyterHub to launch that instead:

c.Spawner.cmd = "/usr/local/bin/jupyterhub-singleuser-login"
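If a particular variable such as KRB5CCNAME turns out to be the missing piece, you can also set it explicitly on the Spawner rather than relying on profile files. A minimal sketch, with a placeholder value:

# in jupyterhub_config.py
# Pass Kerberos-related environment into the spawned single-user server explicitly.
c.Spawner.environment = {
    "KRB5CCNAME": "/tmp/krb5cc_notebook",  # placeholder; adjust to your cache layout
}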

But if this has to do with things like opening PAM sessions, it’s not as easily tractable. You can try c.PAMAuthenticator.open_sessions = True but this often doesn’t work in nontrivial cases.


Thanks for the suggestions. Unfortunately these approaches didn’t work.

I did find a workaround, and in the process I think I found another important clue, which may or may not be Jupyter related.

The solution was to use krbcontext in Python and do the HDFS work within that context.
When declaring the krbcontext, it wasn't enough to define a Kerberos principal and keytab; I also had to define a Kerberos credential cache file. On our old cluster, a cache file would be generated automatically in /tmp when we invoked keytabs from Jupyter. On the new cluster, the cache files do get generated when I work from the terminal (klist shows them), but it appears something is keeping JupyterHub from generating them, and that in turn keeps code in Jupyter from working with kerberized resources.

I'm adding my code example here in case anyone else is looking for a similar solution:

from krbcontext import krbcontext

# Principal, keytab, and credential cache file all have to be given explicitly;
# without ccache_file, the notebook never picked up the credentials.
with krbcontext(using_keytab=True,
                principal='ericbrow@MYLOCATION.LOCAL',
                keytab_file='/home/ericbrow@mylocation.local/ericbrow.keytab',
                ccache_file='/tmp/krb5cc_juptest'):
    pass  # HDFS/Spark work goes inside this block
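
For example, the same HDFS listing that fails from a plain notebook cell works when it runs inside the context. The HDFS path below is just a placeholder, and the explicit KRB5CCNAME export may be redundant if krbcontext already sets it, but it doesn't hurt:

import os
from krbcontext import krbcontext

with krbcontext(using_keytab=True,
                principal='ericbrow@MYLOCATION.LOCAL',
                keytab_file='/home/ericbrow@mylocation.local/ericbrow.keytab',
                ccache_file='/tmp/krb5cc_juptest'):
    # Point child processes at the same cache file the context just populated.
    os.environ['KRB5CCNAME'] = '/tmp/krb5cc_juptest'
    os.system('hdfs dfs -ls /user/ericbrow')  # placeholder HDFS path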