How do I properly protect my data access passwords (not jupyter tokens/passwords) on 3rd party jupyter hub services?

minrk · May 27, 2021, 7:29am

While the connection between notebook and kernel used for execution in a notebook is not encrypted, saving a file in the text editor does not use that connection - it’s a standard HTTP(S) request, followed by writing directly to the filesystem. If you trust the deployment to secure the connection down to the notebook server (either with JupyterHub’s own internal_ssl options or other cluster-level network security for inter-process communication), then saving a file should be safe.

With that in mind, I’d probably have an encrypted-at-rest credentials file, and use getpass() to enter the decryption key. Fernet is a convenient encryption tool because everything’s urlsafe base64, making it easy to work with in text environments. I would:

generate and save a fernet key:

from cryptography.fernet import Fernet

key = Fernet.generate_key()

build your credentials and store them in a file (local, not on the remote host):

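import json

from cryptography.fernet import Fernet

# key is the fernet key generated above;
# creds is whatever JSON-serializable structure holds your credentials
encoder = Fernet(key)
encrypted_creds = encoder.encrypt(json.dumps(creds).encode("utf8"))

with open("mycreds.enc", "wb") as f:
    f.write(encrypted_creds)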

send the encrypted file to the cluster via file upload / text editor

in your notebooks, get the key via getpass() and decrypt the creds:

import json
from getpass import getpass
from cryptography.fernet import Fernet


key = getpass("Credentials key: ")

decoder = Fernet(key)

with open("mycreds.enc", "rb") as f:
    creds = json.loads(decoder.decrypt(f.read()).decode("utf8"))
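The steps above can be sketched end to end as a single round trip. This is a minimal sketch rather than a definitive implementation: the helper names `encrypt_creds` and `decrypt_creds` and the example credential values are made up for illustration, and in a real notebook the key would come from getpass() instead of a variable.

```python
import json

from cryptography.fernet import Fernet, InvalidToken


def encrypt_creds(creds, key):
    """Local side: serialize the credentials to JSON and encrypt them."""
    return Fernet(key).encrypt(json.dumps(creds).encode("utf8"))


def decrypt_creds(token, key):
    """Notebook side: decrypt the uploaded blob and parse the JSON."""
    return json.loads(Fernet(key).decrypt(token).decode("utf8"))


# Placeholder values, for illustration only.
key = Fernet.generate_key()
creds = {"username": "example-user", "password": "example-password"}

token = encrypt_creds(creds, key)

# getpass() returns a str; Fernet accepts the key as str or bytes.
restored = decrypt_creds(token, key.decode("ascii"))
```

Decrypting with the wrong key raises `cryptography.fernet.InvalidToken`, so a mistyped key fails loudly instead of producing garbage.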

In this case, things aren’t perfect because:

the credentials are stored at rest, but encrypted. The encrypted file travels over the network (presumably HTTPS), but not over the notebook->kernel tcp connection.
the encryption key is sent unencrypted over the network, but it travels in an input request, which is less snoopable than the iopub channel, where a normal execution would be echoed.

So an attacker would need both access to the filesystem and the ability to sniff the network to get your credentials. Still, these are long-lived, so if someone ever manages to get both of these things, they have your credentials.

You could also just send the credentials themselves with getpass(). This isn’t significantly different, but might be more tedious.
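A minimal sketch of that alternative, assuming the credentials are a handful of named string fields. The helper name `prompt_creds` and the field names are hypothetical; the `prompt` parameter exists only so the function can be exercised without an interactive session.

```python
from getpass import getpass


def prompt_creds(fields, prompt=getpass):
    """Ask for each named field without echoing it; nothing is written to disk."""
    return {name: prompt(f"{name}: ") for name in fields}


# In a notebook cell (this blocks waiting for input):
# creds = prompt_creds(["username", "password"])
```

Nothing is stored at rest this way, at the cost of retyping every value in each new kernel session.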

Depending on the deployment, the notebook server->kernel connection, even if unencrypted, should be relatively secure if it’s e.g. using localhost in a container (the most common default). I think you need pretty high privileges to snoop that from another container.

If it’s a shared host, the deployment can use the ipc transport mentioned here to avoid using the possibly more sniffable tcp between notebook->kernel. You can even do this yourself with local configuration files.
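As a rough illustration of the local-configuration route, the fragment below assumes a Python config file that your server actually loads (for example `jupyter_server_config.py` or `jupyter_notebook_config.py`, depending on the deployment), where the `c` config object is provided by Jupyter. `KernelManager.transport` is the jupyter_client option that selects between `tcp` and `ipc`; whether a given hosted service honors user-level config is something to check with the operator.

```python
# Config fragment, not a standalone script: `c` is injected by Jupyter
# when it loads this file.
# Use ipc (unix-socket style) transport instead of tcp between the
# notebook server and its kernels.
c.KernelManager.transport = "ipc"
```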

The only way I can see to do significantly better than this is for the input_request to add support for end-to-end encryption of the message. That’s tricky across languages, but not insurmountable. In that case, the unencrypted value and the decryption key could live only in memory. You could implement this yourself with public/private keys in memory, but doing so would probably be super tedious.
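To make the "public/private keys in memory" idea concrete, here is a rough sketch using RSA-OAEP from the cryptography package. It is not part of any Jupyter protocol: the function names `encrypt_locally` and `decrypt_in_kernel` are hypothetical, the public key would have to be copied from the kernel to your local machine by hand, and nothing here authenticates that public key. RSA-OAEP with a 2048-bit key and SHA-256 only fits secrets up to 190 bytes.

```python
import base64

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa

OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# In the kernel: the private key lives only in memory.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_pem = private_key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)


def encrypt_locally(public_pem, secret):
    """On your own machine: encrypt the secret with the kernel's public key."""
    public_key = serialization.load_pem_public_key(public_pem)
    ciphertext = public_key.encrypt(secret.encode("utf8"), OAEP)
    return base64.urlsafe_b64encode(ciphertext).decode("ascii")


def decrypt_in_kernel(private_key, pasted):
    """In the kernel: decrypt the text pasted into getpass()."""
    ciphertext = base64.urlsafe_b64decode(pasted)
    return private_key.decrypt(ciphertext, OAEP).decode("utf8")
```

With this arrangement only ciphertext crosses the notebook->kernel connection, and a fresh key pair per kernel session means a captured ciphertext is useless once the kernel exits.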
