Implementing Persistent Overlays for per-user Python packages

Hi All,

This is my first post here, and I should have joined sooner. My team runs a couple of JupyterHub instances as part of the Digital Earth Australia and Digital Earth Africa projects, so that users can easily access Analysis Ready Data stored in S3 and indexed in Aurora PostgreSQL.

Australia JupyterHub: https://app.sandbox.dea.ga.gov.au/

We run a custom user-pod image built from a repository on GitHub: dea-sandbox

However, as the number of packages in this image increases, they interact in unpredictable and fragile ways, and creating stable environments that meet all user needs is becoming more and more difficult. We would like to thin the packages in the base image down to a stable subset (GIS packages are notorious for ABI incompatibility issues) and encourage users to overlay their own persistent set of packages, stored in their own PVCs (we use KubeSpawner for user pods on EKS). In case a user manages to break their Python environment, we would keep a safe-mode base image that they can go back to and rebuild from.
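To make the idea concrete, here is a rough sketch of one way the per-user layer could be wired up. It is not an overlayfs mount: it relies on Python's user site-packages mechanism (`PYTHONUSERBASE`, `PYTHONNOUSERSITE`) and pip's `PIP_USER` setting, pointed at a directory on the user's PVC. The helper name `overlay_environment`, the home path `/home/jovyan`, and the directory name `.pkg-overlay` are all assumptions for illustration, not something we have deployed.

```python
from pathlib import PurePosixPath


def overlay_environment(home="/home/jovyan", overlay_dir=".pkg-overlay", safe_mode=False):
    """Build env vars that point Python's user site at a directory on the user's PVC.

    `home` and `overlay_dir` are hypothetical defaults; the home directory is
    assumed to be the mount point of the user's persistent volume.
    """
    if safe_mode:
        # Safe mode: tell Python to skip the user site entirely, so only the
        # packages baked into the base image are importable.
        return {"PYTHONNOUSERSITE": "1"}

    base = PurePosixPath(home) / overlay_dir
    return {
        # Python derives the user site-packages directory from this base path.
        "PYTHONUSERBASE": str(base),
        # Makes a plain `pip install` behave like `pip install --user`.
        "PIP_USER": "1",
    }


# In jupyterhub_config.py this could be applied roughly as (assumed usage):
#   c.KubeSpawner.environment = overlay_environment()
print(overlay_environment())
print(overlay_environment(safe_mode=True))
```

With something like this, packages a user installs would land on their PVC and survive pod restarts, while the safe-mode variant ignores them without deleting anything. Console scripts would end up in a `bin` directory under the overlay, so `PATH` would still need extending in the image; whether this approach copes with compiled GIS packages is exactly the kind of thing I am unsure about.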

I would like some help getting the persistent overlay set up. So far I have found this article: https://itnext.io/using-overlay-mounts-with-kubernetes-960375c05959 , which does not quite meet our needs.

Regards,

Tisham.
