How do I properly protect my data access passwords (not jupyter tokens/passwords) on 3rd party jupyter hub services?

I have a security question: how do I protect my data access passwords in jupyterhub? Here is my scenario:

  1. I’m using a third-party jupyterhub service. Maybe this is through my University. Maybe this is through a 3rd party company. Who knows. The jupyterhub is running on someone else’s servers, but I can assume they’ve done reasonable due diligence with respect to security in their setup. I cannot be sure that other people using the service (non-admins) are trustworthy.
  2. I need to use a username and password in my notebook to access data from somewhere.

There are a couple of ways I could approach this:

  1. I could put my username and password in the notebook itself as variables. This is not great for a couple of reasons I can think of:
    1.1 It would be ridiculously easy to accidentally save my notebook somewhere with the username and password.
    1.2 The connection between the notebook and the kernel is not encrypted (https://jupyterhub.readthedocs.io/en/stable/reference/websecurity.html), so I’d run the risk of other users sniffing my traffic and seeing my password.

  2. I could create an unencrypted .env file or .netrc file on the server running my notebook and put the information in there. This means I’m less likely to accidentally save the notebook with my password, but it’s still not great:
    2.1 I’d be saving my password in plain text on a 3rd party server.
    2.2 My password would once again be sent unencrypted to the kernel, because the only ways I can think of to create the file are the terminal program or the text editor in jupyter.

So what should I do?


While the connection between notebook and kernel used for execution in a notebook is not encrypted, saving a file in the text editor does not use that connection - it’s a standard HTTP(S) request, followed by writing directly to the filesystem. If you trust the deployment to secure the connection down to the notebook server (either with JupyterHub’s own internal_ssl options or other cluster-level network security for inter-process communication), then saving a file should be safe.

With that in mind, I’d probably keep an encrypted-at-rest credentials file and use getpass() to enter the decryption key. Fernet is a convenient encryption tool because everything’s urlsafe base64, making it easy to work with in text environments. I would:

  1. generate and save a Fernet key: key = cryptography.fernet.Fernet.generate_key()

  2. build your credentials and store them in a file (local, not on the remote host):

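    import json
    
    from cryptography.fernet import Fernet
    
    # `key` is the Fernet key generated in step 1.
    # `creds` is whatever you need to store; the values here are placeholders.
    creds = {"username": "...", "password": "..."}
    
    encoder = Fernet(key)
    
    # Fernet works on bytes, so serialize to JSON and encode first
    encrypted_creds = encoder.encrypt(json.dumps(creds).encode("utf8"))
    
    with open("mycreds.enc", "wb") as f:
        f.write(encrypted_creds)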
    
  3. send the encrypted file to the cluster via file upload / text editor

  4. in your notebooks, get the key via getpass() and decrypt the creds:

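    import json
    from getpass import getpass
    
    from cryptography.fernet import Fernet
    
    # prompt for the key so it never appears in the notebook source
    key = getpass("Credentials key: ")
    
    decoder = Fernet(key)
    
    # read the encrypted file and decode it back into the original object
    with open("mycreds.enc", "rb") as f:
        creds = json.loads(decoder.decrypt(f.read()).decode("utf8"))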
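
A mistyped key is the most likely failure in step 4. As a rough sketch (not part of the steps above), the decryption could be wrapped so that a bad key produces a clear error; load_creds is a hypothetical helper name, and it assumes the file was written as in step 2:

```python
import json

from cryptography.fernet import Fernet, InvalidToken


def load_creds(path, key):
    """Decrypt a Fernet-encrypted JSON file and return the decoded object."""
    try:
        decoder = Fernet(key)
    except ValueError as exc:
        # the key is not 32 url-safe base64-encoded bytes
        raise ValueError("malformed Fernet key") from exc
    with open(path, "rb") as f:
        token = f.read()
    try:
        return json.loads(decoder.decrypt(token).decode("utf8"))
    except InvalidToken as exc:
        # raised when the key is wrong or the file has been modified
        raise ValueError("wrong key or corrupted credentials file") from exc


# In a notebook cell (interactive, so left commented out here):
# from getpass import getpass
# creds = load_creds("mycreds.enc", getpass("Credentials key: "))
```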
    
    

In this case, things aren’t perfect because:

  • the credentials are stored at rest, but encrypted. The file travels over the network (presumably HTTPS), not over the notebook->kernel TCP connection
  • the encryption key is sent unencrypted over the notebook->kernel connection, but as the reply to an input request (the stdin channel), which is less snoopable than the iopub channel, where a normal execution would be echoed.

So an attacker would need both access to the filesystem and the ability to sniff the network to get your credentials. Still, these are long-lived, so if someone ever manages to get both, they have your credentials.

You could also just send the credentials themselves with getpass(). This isn’t significantly different, but might be more tedious.
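
A minimal sketch of that variant, for comparison. The function name prompt_credentials and its ask/ask_secret parameters are hypothetical (the parameters exist only so the prompts can be swapped out when testing); nothing is written to disk or to the notebook source:

```python
from getpass import getpass


def prompt_credentials(ask=input, ask_secret=getpass):
    """Prompt for a username and password and return them as a dict."""
    username = ask("Username: ")
    # getpass does not echo what is typed
    password = ask_secret("Password: ")
    return {"username": username, "password": password}


# In a notebook cell (interactive, so left commented out here):
# creds = prompt_credentials()
```

You would have to retype both values every time the kernel restarts, which is where the tedium comes from.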

Depending on the deployment, the notebook server->kernel connection, even if unencrypted, should be relatively secure if it’s e.g. using localhost in a container (the most common default). I think you need pretty high privileges to snoop that from another container.

If it’s a shared host, the deployment can use the ipc transport mentioned here to avoid using the possibly more sniffable tcp between notebook->kernel. You can even do this yourself with local configuration files.
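
As a sketch of the do-it-yourself route, a local configuration file could look like the following. It assumes your single-user server reads ~/.jupyter/jupyter_server_config.py (the classic notebook server reads jupyter_notebook_config.py instead), that the deployment does not override the setting, and that the host is Unix-like, since ipc relies on Unix sockets:

```python
# ~/.jupyter/jupyter_server_config.py
# Assumption: this file is picked up by your single-user server at startup.

c = get_config()  # noqa: F821 (provided by Jupyter when it loads this file)

# Use ipc (Unix socket) kernel connections instead of the default tcp ones,
# so ordinary file permissions restrict who can reach the kernel's sockets.
c.KernelManager.transport = "ipc"
```

The setting only affects kernels started after the server has been restarted with this file in place.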

The only way I can see to do significantly better than this is for the input_request to add support for end-to-end encryption of the message. That’s tricky across languages, but not insurmountable. In that case, the unencrypted value and the decryption key could live only in memory. You could implement this yourself with public/private keys in memory, but doing so would probably be super tedious.
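
To make the public/private key idea concrete, here is a rough sketch using RSA-OAEP from the cryptography package. The algorithm choice and the three function names are my assumptions for illustration, not an existing Jupyter feature. The kernel generates a key pair and shows only the public key, you encrypt the secret on your own machine, and you paste the resulting ciphertext into the notebook:

```python
import base64

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# OAEP with SHA-256; with a 2048-bit key this limits the secret to 190 bytes
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def make_keypair():
    """Run in the kernel: the private key stays in kernel memory."""
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_pem = private_key.public_key().public_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PublicFormat.SubjectPublicKeyInfo,
    )
    return private_key, public_pem.decode("ascii")


def encrypt_locally(public_pem, secret):
    """Run on your own machine with the public key copied from the notebook."""
    public_key = serialization.load_pem_public_key(public_pem.encode("ascii"))
    ciphertext = public_key.encrypt(secret.encode("utf8"), OAEP)
    # base64 so the ciphertext can be copied and pasted as text
    return base64.urlsafe_b64encode(ciphertext).decode("ascii")


def decrypt_in_kernel(private_key, token):
    """Run in the kernel on the pasted ciphertext."""
    ciphertext = base64.urlsafe_b64decode(token.encode("ascii"))
    return private_key.decrypt(ciphertext, OAEP).decode("utf8")
```

Only the public key and the ciphertext cross the network. The sketch does not authenticate the public key, so it protects against someone listening to the traffic but not against someone able to modify it.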


Because I was curious, I implemented an example of end-to-end encrypted communication with a Kernel, assuming you trust nothing other than access to the Kernel’s memory (which includes sending execution requests to the kernel): https://gist.github.com/776ccc423e2c6131ec6f088ef247c9c3

It is pretty tedious, but you can probably make it less terrible with some utility functions aimed at the copy/paste on two computers execution model.


Thank you! This is awesome. Exactly the information I was looking for.