What’s the best way to have a scheduled job run on a user pod for Z2JH?
The example use-case would be syncing user directories based on user groups.
This is currently handled by a Kubernetes CronJob, which uses the Hub REST API and an external API to sync internal user groups with JupyterHub groups. It runs successfully, and runs on schedule as expected.
Additionally, a shared directory is mounted to all user pods under /mnt/Shared/*, with each subdirectory's name matching a Jupyter group name. A sync.sh script runs on pod start, iterates through the user's groups, and links each one:
ln /mnt/Shared/$i /home/[username]/$i. This runs successfully, and syncs user directories on first login as expected.
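A hypothetical sketch of what that sync.sh loop might look like (the function name, environment-variable overrides, and paths are assumptions, not the original script). One detail worth noting: a plain `ln` cannot hard-link a directory, so `-s` is needed, and `-fn` makes repeated runs idempotent:

```shell
#!/bin/bash
# Hypothetical sketch of sync.sh: symlink each of the user's shared
# group directories into their home directory.

sync_groups() {
  # $@ : the user's Jupyter group names
  local shared="${SHARED_ROOT:-/mnt/Shared}"   # assumed default path
  local home="${HOME_DIR:-$HOME}"
  for grp in "$@"; do
    # -s: symlink (directories can't be hard-linked)
    # -fn: replace a stale link safely when the script re-runs
    if [ -d "$shared/$grp" ]; then
      ln -sfn "$shared/$grp" "$home/$grp"
    fi
  done
}
```

Because re-runs are idempotent, the same function could be invoked on a schedule without special-casing already-linked directories.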
Ideally, I’d like to run this same script on a schedule.
I’ve tried to add the below to the z2jh config:
- */10 * * * * root /usr/local/etc/jupyter/sync.sh >> /var/log/sync.log 2>&1
Just adding this didn't seem to work as expected. After I
kubectl exec -it into a user pod, I found that the cron package wasn't included in the image. I added it, but I'm still not getting any results/job runs.
singleuser.extraContainers may be a possible solution, but I'm not so sure: because it uses its own image, it would have a separate filesystem, which I don't believe can manage the user pod's filesystem.
Is there a better way to run these jobs inside of a user pod on a schedule? Or is there a secret to getting cron to work normally in an image?
Containers don't usually contain a full operating system, so system services like cron aren't set up; there's no init system to start the cron daemon, which is why installing the package alone isn't enough.
It’s possible to achieve a “VM-like” container, but you’ll need to build it yourself to include JupyterLab/notebook:
I think extraContainers should work if you’re using volumes, since volumes can be shared across all containers in the pod.
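A minimal sketch of that sidecar approach might look like the following (the container name, image, mount path, and volume name are all assumptions; the volume name must match the user-storage volume in the generated pod spec, which you can check with kubectl get pod -o yaml):

```yaml
# Hypothetical sketch: a sidecar sharing the user's home volume.
singleuser:
  extraContainers:
    - name: group-sync                  # assumed name
      image: my-registry/sync:latest   # assumed image that bundles sync.sh
      # Re-run the sync every 10 minutes instead of relying on cron
      command: ["/bin/sh", "-c", "while true; do /usr/local/etc/jupyter/sync.sh; sleep 600; done"]
      volumeMounts:
        - name: home        # assumption: must match the user-storage volume name
          mountPath: /home/jovyan
```

The /mnt/Shared volume would also need to be mounted into the sidecar for the script to see the group directories.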
This should work - as I’m already using a custom image (not z2jh images).
This suggestion is either "VM-like" or an extra container, not both (i.e. not a "VM-like" extra container), correct?
Are there any other services or routes that are JupyterHub/Jupyter-native for running schedules on user servers?
Yes. In the first case you're running multiple services inside a single container, mimicking a VM. In the second you're running separate processes/services in separate containers (but in the same K8s pod), with a shared volume.
Not in JupyterHub. However, JupyterHub and Jupyter server/lab/notebook are highly extensible, so for example you could write a Jupyter server extension that runs scheduled jobs. This is for scheduling notebooks rather than arbitrary jobs, but it illustrates what's possible:
For any future visitors, I’ve found what I think may be an easier way (if you already create your own singleuser image) - closer to the ‘“VM-like” container’ solution. This uses supercronic.
- Add the “stanzas” from the release build, as described in their install instructions.
- Add the execute command to singleuser.lifecycleHooks.postStart.
** I add the & to run supercronic in the background, so it doesn't hang the loading of the Jupyter server container. Note that everything after -c must be a single string; if /etc/crontab and & are passed as separate list items, bash treats them as positional parameters and ignores them:
command: ["/bin/bash", "-c", "/usr/local/bin/supercronic /etc/crontab &"]
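In context, the relevant chart values might look something like this (the image name and crontab path are assumptions; the crontab itself is baked into the custom image):

```yaml
# Hypothetical z2jh values sketch for the supercronic approach.
singleuser:
  image:
    name: my-registry/custom-singleuser   # assumed: custom image with supercronic installed
    tag: latest
  lifecycleHooks:
    postStart:
      exec:
        # Single string after -c; the trailing & backgrounds supercronic
        # so the postStart hook returns and the notebook server can start.
        command: ["/bin/bash", "-c", "/usr/local/bin/supercronic /etc/crontab &"]
```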