When running JupyterHubs, storage is often a bigger driver of cost than compute. Compute scales with how many users are running at any given time, while storage scales with how many users have ever used the hub. So storage cost just grows and grows.
A simple way to make this cheaper is to not provide a regular POSIX filesystem (a traditional disk drive) to users, and instead only provide object storage, which is much cheaper. However, a lot of code relies on access to a POSIX filesystem (git, for example), and most users don’t want to port their code to just use object storage.
A decent compromise is to move a user’s home directory to object storage when the user hasn’t logged in for a while, and fetch it transparently back to a POSIX filesystem when the user logs in again. This lets us treat the POSIX filesystem (exposed over NFS, maybe) almost as a ‘hot cache’, and operate with a smaller POSIX filesystem than we otherwise would. I think this is reasonably easy to build, and can be built to be fairly generic and safe.
We would need code to do two things.
The archiving process
This should be a background job that runs in a loop and finds user home directories that have been unmodified for a while and are not currently in use. It archives these into tarballs, puts them in object storage, removes them from the POSIX filesystem, and makes a note somewhere (a database?) that this user’s home directory is in object storage now. This is a background job because there is no reliable way to run something whenever a user’s pod stops.
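To make that a bit more concrete, here is a rough sketch of a single pass of such an archiver in Python. Everything in it is an assumption for illustration: the function names, the `archived` SQLite table, and especially `store_dir`, which is a plain local directory standing in for a real object storage bucket. How you find out which users are active (`active_users`) is also left to the caller.

```python
import os
import shutil
import sqlite3
import tarfile
import time
from pathlib import Path


def newest_mtime(home):
    """Return the most recent modification time of anything under home."""
    newest = os.lstat(home).st_mtime
    for root, dirs, files in os.walk(home):
        for name in dirs + files:
            # lstat, so a dangling symlink does not raise an error
            newest = max(newest, os.lstat(os.path.join(root, name)).st_mtime)
    return newest


def archive_idle_homes(homes_root, store_dir, db, active_users,
                       max_idle_seconds, now=None):
    """Archive every home directory that is idle and not in use.

    store_dir is a local directory used as a stand-in for object storage
    (an assumption of this sketch). Returns the usernames archived.
    """
    now = time.time() if now is None else now
    db.execute(
        "CREATE TABLE IF NOT EXISTS archived "
        "(username TEXT PRIMARY KEY, object_key TEXT)"
    )
    archived = []
    for home in sorted(Path(homes_root).iterdir()):
        if not home.is_dir() or home.name in active_users:
            continue  # skip stray files and users who are running right now
        if now - newest_mtime(home) < max_idle_seconds:
            continue  # modified too recently
        key = f"{home.name}.tar.gz"
        # 1. Write the tarball (a real version would upload it instead).
        with tarfile.open(str(Path(store_dir) / key), "w:gz") as tar:
            tar.add(str(home), arcname=".")
        # 2. Record that this home now lives in object storage.
        db.execute(
            "INSERT OR REPLACE INTO archived VALUES (?, ?)", (home.name, key)
        )
        db.commit()
        # 3. Only then free the space on the POSIX filesystem.
        shutil.rmtree(home)
        archived.append(home.name)
    return archived
```

The background job would call `archive_idle_homes` in a loop with a sleep in between. The ordering inside the loop body is deliberate: the local copy is deleted only after the tarball is written and the database knows about it, so a crash midway leaves the home directory intact rather than lost.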
The unarchiving process
When a user logs in, we check the database (or wherever the archiver is keeping state) to see if they have a home directory in need of extraction. This is done as an init container in z2jh, so the user’s notebook container does not start until this process is completed. A small script in the init container should somehow fetch the tarball from object storage, extract it to the POSIX filesystem, and then exit so the user’s server can start. The user should notice no difference, other than a delay while extraction happens. Some authentication needs to happen to make sure that users can’t access others’ home directories.
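The script in the init container could look roughly like the sketch below. As before, the names are hypothetical: `restore_home_if_archived`, the `archived` table, and `store_dir` (a local directory standing in for object storage) are all assumptions of this example, not an existing API. It also assumes the `username` is handed to the init container by the hub rather than chosen by the user, which is one part of keeping users out of each other’s home directories.

```python
import sqlite3
import tarfile
from pathlib import Path


def restore_home_if_archived(username, homes_root, store_dir, db):
    """Extract username's archived home directory, if there is one.

    Returns True if a tarball was extracted, False if nothing was archived.
    store_dir is a local directory standing in for object storage.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS archived "
        "(username TEXT PRIMARY KEY, object_key TEXT)"
    )
    row = db.execute(
        "SELECT object_key FROM archived WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False  # home directory is already on the POSIX filesystem

    home = Path(homes_root) / username
    home.mkdir(parents=True, exist_ok=True)

    # Use tarfile's "data" extraction filter where this Python has it, so
    # archive members cannot escape the home directory.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(str(Path(store_dir) / row[0]), "r:gz") as tar:
        tar.extractall(str(home), **kwargs)

    # Forget the archive only after extraction succeeded, so a failed
    # extraction is retried on the next login.
    db.execute("DELETE FROM archived WHERE username = ?", (username,))
    db.commit()
    return True
```

If the function raises, the init container exits non-zero and the user’s server does not start, which seems like the right failure mode: better a visible error than an empty home directory.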
Would love for someone to build this! <3