How to copy specified folders for two users in the z2jh

Hello everyone, thank you for providing this channel for user inquiries. This is a fantastic project. I am relatively new to Kubernetes (k8s).

  1. I would like to know how to copy specified folders between two users in the z2jh (Zero to JupyterHub) project. I’ve come across the kubespawner project, but I’m not sure whether I should create a Kubernetes Job from it that mounts the volumes and performs the copy.
  2. If debugging is needed, do I have to build a hub image locally, install the corresponding modified pip packages into the image, and then update the image on the hub? Are there other ways to quickly validate changes directly within the internal pod?

If any of you have ideas, I would greatly appreciate some guidance. Thank you once again.

It depends: where are the two folders? Do they belong to different users, or to the same user? Are they on the same or different file storage? Here’s one option

but we’ll need more information if you want something else.

You can build and run the container image locally, you don’t need to deploy it to the hub for testing.

Hi friend, thanks very much for replying.
The two folders belong to two different users; data needs to be copied from one pod’s volume to another.
The custom user environment seems to do its copying in a start-container hook, but I need to run the copy while the container is running, e.g. triggered through a REST API. Maybe I’ve misunderstood the custom user env’s workflow; please give more guidance, thank you very much.
The Kubernetes Job manifest is as follows:

apiVersion: batch/v1
kind: Job
metadata:
  name: copy-files-2024-01-18-11
  namespace: jhub
spec:
  template:
    spec:
      volumes:
      - name: source-pvc
        persistentVolumeClaim:
          claimName: claim-test
      - name: destination-pvc
        persistentVolumeClaim:
          claimName: claim-spinq
      containers:
      - name: task
        image: busybox
        command: ["/bin/sh", "-c"]
        args: ["adduser -D -u 1000 jovyan && cp -R /src-dir/test /dst-dir/ && chown -R jovyan /dst-dir"]
        volumeMounts:
        - name: source-pvc
          mountPath: /src-dir
        - name: destination-pvc
          mountPath: /dst-dir
      restartPolicy: Never

Thank you once again.

This is outside the scope of JupyterHub since it involves copying files in the backend K8s storage. If you want a rest API you’ll need to write your own service that has access to all volumes and can copy files between them.

The alternative is to mount both volumes in the user’s pod and tell the user to run a script.


Thanks a lot for your answer!

In the end, I found a solution. It’s pretty simple and doesn’t consider isolation or anything like that, but hopefully it can help anyone who needs it.

  1. Following kubespawner’s approach, I found that a copy task can be launched from inside the hub’s pod by creating a Kubernetes Job:
import asyncio
from kubernetes_asyncio import client, config
from kubernetes_asyncio.client import Configuration
import os
import time

async def create_job():
    # check env
    all_variables = os.environ
    print("all_variables",all_variables)
    config.load_incluster_config()
    global_conf = Configuration.get_default_copy()
    print("global_conf",vars(global_conf))
    k8s_batch_client = client.BatchV1Api()
    print("k8s batch client",k8s_batch_client)
    source = "spinq-5fsw-5f001sw001firstserver"
    destination = "mila-2duatmilafirstserver"
    file_path = "sharefolder"
    job = client.V1Job(
        api_version="batch/v1",
        kind="Job",
        metadata=client.V1ObjectMeta(
            # set job name
            name="copy-files-" + time.strftime("%Y%m%d%H%M%S") , 
            namespace="jhub"
        ),
        spec=client.V1JobSpec(
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    volumes=[
                        client.V1Volume(
                            name="source-pvc",
                            persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                                claim_name="claim-"+source
                            )
                        ),
                        client.V1Volume(
                            name="destination-pvc",
                            persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                                claim_name="claim-"+destination
                            )
                        ),
                    ],
                    containers=[
                        client.V1Container(
                            name="task",
                            image="busybox",
                            command=["/bin/sh", "-c"],
                            args=[f"adduser -D -u 1000 jovyan && cp -R /src-dir/{file_path} /dst-dir/shared-workspace/ && chown -R jovyan /dst-dir"],
                            volume_mounts=[
                                client.V1VolumeMount(
                                    name="source-pvc",
                                    mount_path="/src-dir"
                                ),
                                client.V1VolumeMount(
                                    name="destination-pvc",
                                    mount_path="/dst-dir"
                                ),
                            ],
                        ),
                    ],
                    restart_policy="Never",
                ),
            ),
        ),
    )
    print("job",job)
    
    await k8s_batch_client.create_namespaced_job(namespace="jhub", body=job)
    await k8s_batch_client.api_client.close()

# run (asyncio.run replaces the deprecated get_event_loop pattern)
asyncio.run(create_job())
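One detail worth hardening in a script like this: Kubernetes object names must be valid DNS-1123 labels (lower-case alphanumerics and `-`, at most 63 characters), and a pure timestamp collides if two copies are requested within the same second. A small hypothetical helper (the names here are my own, not from kubespawner) could generate safer Job names:

```python
import re
import time
import uuid

# DNS-1123 label: lower-case alphanumerics and '-', must start/end alphanumeric
DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]{0,61}[a-z0-9])?$")

def copy_job_name(prefix: str = "copy-files") -> str:
    """Build a unique Job name that stays a valid DNS-1123 label."""
    # timestamp for readability, short uuid suffix so same-second requests don't collide
    name = f"{prefix}-{time.strftime('%Y%m%d%H%M%S')}-{uuid.uuid4().hex[:6]}"
    return name[:63].rstrip("-")  # k8s object names are capped at 63 characters

print(copy_job_name())
```

The result can be passed straight to `V1ObjectMeta(name=...)` in place of the `"copy-files-" + time.strftime(...)` expression above.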
  2. However, there was a problem: the hub’s service account does not have permission to create Jobs. So I added Job-creation permission for the hub in the chart’s RBAC template (templates/hub/rbac.yaml):
{{- if .Values.rbac.create -}}
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: {{ include "jupyterhub.hub.fullname" . }}
  labels:
    {{- include "jupyterhub.labels" . | nindent 4 }}
rules:
  - apiGroups: [""]       # "" indicates the core API group
    resources: ["pods", "persistentvolumeclaims", "secrets", "services"]
    verbs: ["get", "watch", "list", "create", "delete"]
  - apiGroups: [""]       # "" indicates the core API group
    resources: ["events"]
    verbs: ["get", "watch", "list"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: {{ include "jupyterhub.hub.fullname" . }}
  labels:
    {{- include "jupyterhub.labels" . | nindent 4 }}
subjects:
  - kind: ServiceAccount
    name: {{ include "jupyterhub.hub-serviceaccount.fullname" . }}
    namespace: "{{ .Release.Namespace }}"
roleRef:
  kind: Role
  name: {{ include "jupyterhub.hub.fullname" . }}
  apiGroup: rbac.authorization.k8s.io
{{- end }}

  3. Finally, I added a file-transfer interface to the hub.
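The post doesn’t show that interface, but as a rough illustration, a minimal REST endpoint could look like the sketch below. It uses only the Python standard library for brevity (a real JupyterHub service would typically be a Tornado app and must authenticate requests, e.g. with hub API tokens, since copying between users is sensitive); `launch_copy_job` is a hypothetical stand-in for the Job-creation code earlier in the thread:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def launch_copy_job(source: str, destination: str, path: str = "") -> str:
    """Hypothetical stand-in: the real version would create the Job via BatchV1Api."""
    job_name = f"copy-files-{source}-to-{destination}"
    # ... create the Kubernetes Job here ...
    return job_name

class CopyHandler(BaseHTTPRequestHandler):
    def log_message(self, *args):  # keep request logging quiet in this sketch
        pass

    def do_POST(self):
        if self.path != "/copy":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            req = json.loads(self.rfile.read(length))
            job = launch_copy_job(req["source"], req["destination"], req.get("path", ""))
        except (ValueError, KeyError):
            self.send_error(400, "expected JSON body with 'source' and 'destination'")
            return
        body = json.dumps({"job": job}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To run standalone:
# HTTPServer(("127.0.0.1", 8081), CopyHandler).serve_forever()
```

A client would then trigger a copy with a POST such as `curl -X POST http://hub:8081/copy -d '{"source": "alice", "destination": "bob"}'`; the endpoint names and port are assumptions, not part of JupyterHub’s own API.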