How do I properly protect my data access passwords (not Jupyter tokens/passwords) on third-party JupyterHub services?

While the connection between the notebook server and the kernel (used to execute notebook code) is not encrypted, saving a file in the text editor does not use that connection: it’s a standard HTTP(S) request to the notebook server, which then writes directly to the filesystem. If you trust the deployment to secure the connection down to the notebook server (either with JupyterHub’s own internal_ssl options or with other cluster-level network security for inter-process communication), then saving a file should be safe.
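
To make "a standard HTTP(S) request" concrete, here is a minimal sketch of what a save looks like at the protocol level: a PUT to the Jupyter Contents REST API (`/api/contents/<path>`). The hub URL, user name, and token below are hypothetical placeholders, and the function name `build_save_request` is made up for illustration; the request is only built here, not sent.

```python
import base64
import json
import urllib.request


def build_save_request(server_url: str, path: str, data: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) a Contents API request that saves `data` to `path`."""
    model = {
        "type": "file",
        "format": "base64",  # binary-safe encoding of the file body
        "content": base64.b64encode(data).decode("ascii"),
    }
    return urllib.request.Request(
        f"{server_url.rstrip('/')}/api/contents/{path}",
        data=json.dumps(model).encode("utf8"),
        headers={
            "Authorization": f"token {token}",  # hypothetical API token
            "Content-Type": "application/json",
        },
        method="PUT",
    )


# Hypothetical single-user server URL on a JupyterHub deployment:
req = build_save_request("https://hub.example.com/user/alice", "mycreds.enc", b"gAAAA...", "my-token")
# urllib.request.urlopen(req) would actually send it over HTTPS
```

The point is that the file body rides inside an ordinary HTTPS request to the server, not over the kernel's ZeroMQ connection.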

With that in mind, I’d probably keep an encrypted-at-rest credentials file and use getpass() to enter the decryption key. Fernet (from the cryptography package) is a convenient encryption tool here because its keys and encrypted tokens are URL-safe base64, which makes them easy to work with in text environments. Concretely, I would:

  1. generate a Fernet key and save it somewhere safe on your local machine:

    from cryptography.fernet import Fernet

    key = Fernet.generate_key()

  2. build your credentials and encrypt them into a file (locally, not on the remote host):

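    import json

    from cryptography.fernet import Fernet

    # `key` is the Fernet key from step 1; `creds` is any JSON-serializable dict,
    # e.g. {"username": "...", "password": "..."}
    encoder = Fernet(key)

    # serialize to JSON, then encrypt; the result is a URL-safe base64 token (bytes)
    encrypted_creds = encoder.encrypt(json.dumps(creds).encode("utf8"))

    with open("mycreds.enc", "wb") as f:
        f.write(encrypted_creds)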
    
  3. send the encrypted file to the hub via file upload or the text editor (the encrypted token is plain base64 text, so pasting it works)

  4. in your notebooks, get the key via getpass() and decrypt the creds:

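    import json
    from getpass import getpass

    from cryptography.fernet import Fernet

    # prompt for the key without echoing it into the notebook output
    key = getpass("Credentials key: ")

    # Fernet accepts the key as a str; decrypt() raises InvalidToken if the key is wrong
    decoder = Fernet(key)

    with open("mycreds.enc", "rb") as f:
        creds = json.loads(decoder.decrypt(f.read()).decode("utf8"))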

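The steps above can be sanity-checked end to end in a single process before anything touches the hub. This is a minimal sketch assuming the cryptography package is installed; the helper names `encrypt_creds`/`decrypt_creds` and the example credential values are hypothetical.

```python
import json

from cryptography.fernet import Fernet, InvalidToken


def encrypt_creds(creds: dict, key: bytes) -> bytes:
    """Local side (step 2): serialize the credentials to JSON and encrypt them."""
    return Fernet(key).encrypt(json.dumps(creds).encode("utf8"))


def decrypt_creds(token: bytes, key) -> dict:
    """Notebook side (step 4): decrypt and parse; raises InvalidToken on a wrong key."""
    return json.loads(Fernet(key).decrypt(token).decode("utf8"))


key = Fernet.generate_key()                          # step 1: keep this local
creds = {"username": "alice", "password": "s3cret"}  # hypothetical example values
token = encrypt_creds(creds, key)                    # this is what goes into mycreds.enc

# getpass() returns a str, so check that the str form of the key works too
assert decrypt_creds(token, key.decode("ascii")) == creds

# a wrong key fails loudly instead of returning garbage
try:
    decrypt_creds(token, Fernet.generate_key())
except InvalidToken:
    print("wrong key rejected")
```

Because Fernet authenticates the token, a mistyped key raises `InvalidToken` rather than yielding corrupted credentials.
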
In this case, things aren’t perfect because:

  • the credentials are stored at rest, but encrypted. The encrypted file travels over the network (presumably over HTTPS), but not over the notebook->kernel TCP connection.
  • the decryption key is sent unencrypted over the notebook->kernel connection, but as the reply to an input request (on the stdin channel), which is less snoopable than the IOPub channel that normal execution input is echoed on.

So an attacker would need both access to the filesystem and the ability to sniff the network to get your credentials. Still, the credentials are long-lived, so anyone who manages to obtain both of these at any point has your credentials.

You could also just enter the credentials themselves with getpass(). This isn’t significantly different, but it might be more tedious.
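
A minimal sketch of that variant, where nothing is written to disk and each secret is typed into a non-echoing prompt. The field names and the `prompt_credentials` helper are hypothetical; the `prompt` parameter exists only so the function can be exercised without a terminal.

```python
from getpass import getpass


def prompt_credentials(fields=("username", "password"), prompt=getpass) -> dict:
    """Ask for each credential with a non-echoing prompt; nothing touches the filesystem."""
    return {name: prompt(f"{name}: ") for name in fields}


# In a notebook cell:
# creds = prompt_credentials()
```

The trade-off is the one described above: every secret has to be retyped each session, instead of a single decryption key.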

Depending on the deployment, the notebook server->kernel connection, even if unencrypted, should be relatively secure if, e.g., it uses localhost inside a container (the most common default). I think you need pretty high privileges to snoop on that from another container.
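
To see what your own deployment does, you can inspect the running kernel's connection file, whose `transport` and `ip` fields describe how its ZeroMQ sockets are bound. This is a sketch assuming an ipykernel-based Python kernel; the helper names and the wording of the summaries are hypothetical, and "loopback" only tells you the sockets are not bound to an external interface.

```python
import json

LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def describe_kernel_transport(info: dict) -> str:
    """Summarize how a kernel's sockets are exposed, given its connection-file contents."""
    transport = info.get("transport", "tcp")
    ip = info.get("ip", "")
    if transport == "ipc":
        return "ipc: Unix domain socket files, guarded by filesystem permissions"
    if ip in LOOPBACK_HOSTS:
        return "tcp on loopback: reachable only from the same host or container network namespace"
    return f"tcp on {ip}: potentially reachable from other machines on the network"


def current_kernel_info() -> dict:
    """Load the running kernel's connection file (works only inside an ipykernel kernel)."""
    from ipykernel import get_connection_file  # lazy import: only needed inside a kernel

    with open(get_connection_file()) as f:
        return json.load(f)


# In a notebook cell (don't print the whole file: its "key" field signs kernel messages):
# print(describe_kernel_transport(current_kernel_info()))
```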

If it’s a shared host, the deployment can use the ipc transport (Unix domain sockets) for the notebook->kernel connection to avoid the possibly more sniffable TCP traffic. You can even do this yourself with local configuration files.
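
As a sketch of doing it yourself, a user-level Jupyter config file can switch kernels to ZeroMQ’s ipc transport via the `KernelManager.transport` option. This assumes a jupyter_server-based deployment (for the classic notebook server the file is `jupyter_notebook_config.py` instead) and that the hub doesn’t override your config; afterwards, check that a new kernel’s connection file shows `"transport": "ipc"`.

```python
# ~/.jupyter/jupyter_server_config.py
c = get_config()  # noqa: F821 (get_config is provided by Jupyter when it loads this file)

# Bind kernel sockets as Unix domain sockets instead of TCP ports.
c.KernelManager.transport = "ipc"
```

This only changes how the server and kernel talk to each other; the browser->server leg is still whatever HTTP(S) the hub provides.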

The only way I can see to do significantly better than this is for input_request to support end-to-end encryption of the reply message. That’s tricky across languages, but not insurmountable. In that case, the unencrypted value and the decryption key could live only in memory. You could implement this yourself with a public/private key pair held in memory, but doing so would probably be super tedious.
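
A rough sketch of that do-it-yourself variant, assuming the cryptography package on both sides and using RSA-OAEP as the scheme (my choice for illustration, not something Jupyter provides). The kernel generates a key pair in memory and shows only the public key; you encrypt the secret (e.g. the Fernet key) locally and paste the base64 ciphertext into getpass(). The function names are hypothetical.

```python
import base64

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa

OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# --- kernel side: the private key exists only in this process's memory ---
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_pem = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)
# print(public_pem.decode()) and copy the public key to your local machine


# --- local side: encrypt the secret to that public key ---
def encrypt_for_kernel(pem: bytes, secret: bytes) -> str:
    public_key = serialization.load_pem_public_key(pem)
    return base64.b64encode(public_key.encrypt(secret, OAEP)).decode("ascii")


# --- kernel side again: decrypt the pasted ciphertext, e.g. from getpass() ---
def decrypt_from_user(pasted: str) -> bytes:
    return private_key.decrypt(base64.b64decode(pasted), OAEP)
```

This only defends against passive sniffing: someone who can alter the kernel’s output could substitute their own public key. RSA-OAEP with a 2048-bit key and SHA-256 also caps the plaintext at 190 bytes, which is plenty for a 44-character Fernet key but not for arbitrary credential files.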
