Best practices for uploading large files to the home directory (SFTP server?)

Some of my end users have large files on their local machines that they would like to transfer to their home directories in JupyterHub on GKE. In some cases they have been able to tar/gzip them and upload them over HTTPS, but I would like to know if anyone has figured out a better way to transfer files to the home directories.
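For reference, the tar-and-upload route looks roughly like this (compress locally, upload through the JupyterLab/notebook file browser, then unpack in a terminal inside the hub):

# On the user's local machine:
tar czf project.tar.gz project/
# ...upload project.tar.gz through the file browser...
# Then, in a terminal inside JupyterHub:
tar xzf project.tar.gz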

I was considering whether there would be a way to set up an SFTP server that would mount the appropriate home directory and then provide some temporary SFTP credentials that a user could use.
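As a rough sketch of what I had in mind (the atmoz/sftp image, the claim name claim-alice, and the temporary password are all placeholders, not a tested setup):

# Throwaway SFTP pod that mounts the same PVC JupyterHub uses for the
# user's home directory (KubeSpawner typically names these "claim-<username>").
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: sftp-alice
spec:
  containers:
  - name: sftp
    image: atmoz/sftp:latest
    args: ["alice:temporary-password:1000"]   # user:password:uid
    volumeMounts:
    - name: home
      mountPath: /home/alice/upload           # atmoz/sftp chroots users to /home/<user>
  volumes:
  - name: home
    persistentVolumeClaim:
      claimName: claim-alice
EOF

# Reach it from an admin machine without exposing it publicly:
kubectl port-forward pod/sftp-alice 2222:22
# The user then connects with:  sftp -P 2222 alice@localhost

One caveat: if the home PVC is ReadWriteOnce and the user's server is running, the SFTP pod may only be able to mount it from the same node.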

Has anyone worked out a way to do large file transfers to their Persistent Volume home directories?

I have a faculty member planning the same thing. Since we are autoscaling, something like direct SSH/WinSCP is out. I hadn't thought of spinning up another container just for file transfers; something like this may work. I wish they supported GitHub OAuth for uniformity.

I’ve experimented with Syncthing to do this on a Kubernetes-based JupyterHub.

It kind of works, but Syncthing seems to crash or try to upgrade itself after about 60s :frowning:

I have done it, but it can be tricky. The trick is to connect SSH to the bash shell of a container image with the user's persistent home mounted. That is, the user's SSH account has a special shell that sets up the bash connection.
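For example, the account's login shell can be pointed at a wrapper script (the path /usr/local/sbin/transfer-shell is just an assumed name here):

# Give the user's SSH account a wrapper script as its login shell.
# sshd will then invoke:  /usr/local/sbin/transfer-shell -c '<remote command>'
# and the wrapper (like the excerpt further down) can inspect "$2" and hand
# sftp/scp/rsync over to the user's container.
usermod --shell /usr/local/sbin/transfer-shell alice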

NOTE: this can only be a local container; it can NOT be a running swarm service, unless that service happens to be running locally and you locate its actual container to use for the docker command.

If there is no running local container available to connect to, you need to start one… (see below)

Further, the image needs a link from /usr/libexec/openssh/sftp-server, which is where the host RedHat system thinks the sftp server is located, to where it is really located in the running Docker container. In my case it was installed via Debian ‘apt’, which put it at /usr/lib/sftp-server.
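Something along these lines, run while building the Debian-based image (paths as in my case above):

# Create the path the RedHat host's sshd hands to the shell, pointing at
# the location where the Debian package actually installs the binary:
mkdir -p /usr/libexec/openssh
ln -s /usr/lib/sftp-server /usr/libexec/openssh/sftp-server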

NOTE: this setup also works for rsync, but the special user shell will likely need to recognise that it is an sftp or rsync connection.

if [[ "$1" = '-c' ]]; then
  case "$2" in
      # ....
  /*/sftp-server|\
  'scp '*-[tf]' '*|\
  /*/'rsync --server '*|\
  'rsync --server '*)      # Known file transfer command...
    echo "Known file transfer command..." >> "$debuglog"
    if [[ "$container" ]]; then      # User is running an environment, and it is running on this
      # node of the swarm.  So lets connect the file transfer directly
      # to their service.
      echo "Direct connect to running environment..." >> "$debuglog"
      docker exec \
        --user     user \
        --workdir  '/home/user' \
        --env      'HOME=/home/user' \
        --interactive \
        "$container" \
        /bin/bash "$@"
    else
      # Users environment is not in an accessible container
      # Create a temporary "file-transfer" container for the transfers
      #
      # Uses the login PID to ensure the service name is unique so that
      # multiple file transfer containers can be used.  Mostly this is
      # because  "Filezillia" uses a separate connection, to do its
      # transfers in the background.   Arrgghhh...
      echo "Connect to a file-transfer container..." >> "$debuglog"
      mkdir -p "$mount"    # Ensure at least an empty directory exists
      docker run --rm \
        --name     "$user-ft-$$" \
        --workdir  '/home/user' \
        --env      'HOME=/home/user' \
        --volume   "$mount:/home/user" \
        --interactive \
        'file-transfer-image:latest' \
        /bin/bash "$@"
    fi
    echo "Finished: $(date +'%F_%T') exit: $?" >> "$debuglog"
    exit $?
    ;;
   # ....
  esac

else  # Arguments given, but not the expected remote SSH command!
  log "ERROR: Non-standard shell argument!"
  echo "Non-standard shell argument!  This should not happen!" >> "$debuglog"
  exit 1
fi

Note there is a LOT missing from the above, like how to find the user's running container and the location of their mounted home, etc., but the essence is there.
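As a very rough sketch of that part (the label filter and the in-container home path are assumptions about how the user environments were launched):

# Locate the user's running container on this node, if any...
container=$(docker ps \
  --filter "label=jupyterhub.user=$user" \
  --format '{{.ID}}' | head -n 1)

# ...and, if found, the host-side source of the volume mounted at /home/user:
if [[ "$container" ]]; then
  mount=$(docker inspect \
    --format '{{range .Mounts}}{{if eq .Destination "/home/user"}}{{.Source}}{{end}}{{end}}' \
    "$container")
fi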