Is it possible to create a shared folder which is visible to existing users?

I am running JupyterHub through AWS EMR. I have used the API to create users and user groups, and I now need to create some shared folders for the groups. I know this is possible with TLJH (Share data with your users — The Littlest JupyterHub v0.1 documentation) but I can’t seem to find the equivalent method for JupyterHub.

Could anyone direct me towards some information on how to create shared folders within a JupyterHub environment hosted on AWS EMR?

These are the docs provided by AWS:

You’ll need to work out how EMR is configuring storage with JupyterHub, as it’s not clear from those docs. If no-one here knows the answer you could try contacting AWS support. If you get a reply from them please post the answer here; it might help someone else in the future!

Thanks for the prompt response.

From what I can see, JupyterHub is run through a Docker container. The container is configured “to store notebooks and files in the file system on the master node”. There’s no more information than that, so I’ve also contacted support to see if they can provide some more context.
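
In case it’s useful to anyone else digging into this, one way to check where the container actually keeps files is to look at its mounts from the master node. This assumes SSH access to the master with sudo, and that the container is named jupyterhub (worth confirming with docker ps first):

++++++++++
# On the EMR master node; the container name "jupyterhub" is an assumption
sudo docker ps                                          # confirm the container name
sudo docker inspect -f '{{ json .Mounts }}' jupyterhub  # host paths mounted into the container
++++++++++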

Here’s the response from AWS support. In short, they suggest that a shared folder can be achieved by setting up persistence; it’s then possible to use the persistent storage in S3 as a shared space.

They are still looking into a local shared space. Personally, I think that would be preferable, as it would allow me to leverage the user and group controls afforded by JupyterHub.

AWS support response:

I understand you are running JupyterHub through AWS EMR and need to create some shared folders for the groups. Please correct me if I’ve misunderstood.

I first checked the link you provided. According to their code, they achieve this by creating a directory that is accessible to all group users.

++++++++++++
https://github.com/kafonek/tljh-shared-directory/blob/8f82b4338e60808c05cfc0798fe1929c3b9c6e6b/tljh_shared_directory.py
++++++++++++
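
To give a rough idea of what that script is doing, it boils down to something like the following shell commands (the directory name, group name, and permissions here are examples; the linked script has the exact values):

++++++++++
# Sketch of the TLJH shared-directory idea (names and permissions are examples)
sudo mkdir -p /srv/scratch
sudo chown root:jupyterhub-users /srv/scratch   # group that all hub users belong to
sudo chmod 2775 /srv/scratch                    # setgid so new files keep the group
sudo ln -s /srv/scratch /etc/skel/scratch       # new users get a link in their home directory
++++++++++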

The first thing we need to establish is whether you are storing the notebooks in S3 or in a local directory on the notebook server.
User notebooks and files are saved to the file system on the master node. This is ephemeral storage that does not persist through cluster termination. We can instead store these files in S3 by configuring persistence for notebooks in your cluster with JupyterHub [1].
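
For reference, that persistence is turned on with an EMR configuration classification along these lines (the bucket name is a placeholder; see [1] for the exact options):

++++++++++
# configurations.json with the jupyter-s3-conf classification (bucket name is a placeholder)
cat > configurations.json <<'EOF'
[
  {
    "Classification": "jupyter-s3-conf",
    "Properties": {
      "s3.persistence.enabled": "true",
      "s3.persistence.bucket": "mybucket"
    }
  }
]
EOF

# supplied when creating the cluster (all other required flags omitted here)
aws emr create-cluster --applications Name=JupyterHub \
  --configurations file://configurations.json ...
++++++++++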

If you store files in S3, what we can do is create a shared folder in S3 under the “jupyter” folder, or in any other S3 bucket, and use this location to keep the shared files. For example, you can run the command below to copy a notebook from the shared location to devuser1’s home directory:
++++++++++
aws s3 cp s3://mybucket/jupyter/shared/note.ipynb s3://mybucket/jupyter/devuser1/
++++++++++

Please note that the IAM role attached to the master instance group is used by the JupyterHub users to access these S3 objects. It is recommended to allow only the read and write permissions that are needed on the S3 buckets. Please refer to [2] under references for more information.
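
As a rough sketch, scoping the role down could look something like this inline policy on the instance profile role (role name, policy name, and bucket are examples only):

++++++++++
# policy.json - example limiting access to the jupyter prefix of one bucket
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::mybucket"]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": ["arn:aws:s3:::mybucket/jupyter/*"]
    }
  ]
}
EOF

# attach it to the EMR EC2 instance profile role (role and policy names are examples)
aws iam put-role-policy --role-name EMR_EC2_DefaultRole \
  --policy-name jupyterhub-s3-access --policy-document file://policy.json
++++++++++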

If you save data in a local directory and you want to create a shared folder locally, then according to the steps from TLJH, they create a shared folder that all users can use. I am not familiar with this, so please allow some time for testing this approach.

References:
[1] Configuring persistence for notebooks in Amazon S3 https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-jupyterhub-s3.html

[2] Configure roles in EMR cluster https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-roles.html

As a side question, any ideas on how to execute root commands in JupyterHub on AWS EMR? I wanted to try creating the shared folder locally as I did with TLJH, but it looks like the admin doesn’t have the permissions to do so and I’m unsure how to access the root account.

That’s also an EMR issue, since the user in question is inside the container. If EMR doesn’t let you execute commands as root (whether inside the container, or using docker exec ... outside), then I think you’re stuck.
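
For what it’s worth, if SSH to the master node with sudo is available, a root shell inside the container is usually reachable with docker exec, and the TLJH-style folder can be created from there (the container name jupyterhub and the group/paths below are assumptions, not something the EMR docs spell out for this case):

++++++++++
# On the EMR master node; the container name "jupyterhub" is an assumption
sudo docker exec -it jupyterhub bash

# then, inside the container, something like the TLJH recipe (names are examples):
mkdir -p /srv/shared
chgrp devgroup /srv/shared    # an existing group the relevant users belong to
chmod 2775 /srv/shared        # setgid so new files inherit the group
++++++++++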