How to convert JupyterLab S3 Contents Manager to use a custom API instead?

Question:

I’m working with a JupyterLab extension that currently uses AWS S3 for file storage via the AWS SDK. For security reasons, I want to replace the direct S3 access with API calls to my backend server, which will handle the S3 operations server-side.

Here’s my current approach:

In index.ts, I’ve modified the authFileBrowser plugin to use placeholder credentials:

const authFileBrowser: JupyterFrontEndPlugin<IS3Auth> = {
  id: 'jupydrive-s3:auth-file-browser',
  description: 'The default file browser auth/credentials provider',
  provides: IS3Auth,
  activate: (): IS3Auth => {
    return {
      factory: async () => {
        console.log('Setting up S3/R2 proxy via server API...');
        
        // Since we're using API endpoints for all S3 operations,
        // we just need a minimal configuration with the bucket name
        const config = {
          bucket: 'my-bucket', // This is just for display purposes
          root: '',
          config: {
            forcePathStyle: true,
            // These are placeholder values since actual S3 operations
            // will be handled by the server
            endpoint: 'https://api-proxy',
            region: 'auto',
            credentials: {
              accessKeyId: 'proxy-auth',
              secretAccessKey: 'proxy-auth'
            }
          }
        };
        
        console.log('S3/R2 proxy setup complete');
        return config;
      }
    };
  }
};

And in s3contents.ts, I’m replacing the S3 operations with API calls:

async get(
  path: string,
  options?: Contents.IFetchOptions
): Promise<Contents.IModel> {
  path = path.replace(this._name + '/', '');

  // format root the first time contents are retrieved
  if (!this._isRootFormatted) {
    this._root = await this.formatRoot(this._root ?? '');
    this._isRootFormatted = true;
  }

  try {
    // Use API endpoint instead of direct S3 access
    const response = await fetch(`/api/s3/contents?path=${encodeURIComponent(path)}&root=${encodeURIComponent(this._root)}`, {
      method: 'GET'
    });

    if (!response.ok) {
      throw new Error(`Failed to fetch contents: ${response.statusText}`);
    }

    const data = await response.json();
    Contents.validateContentsModel(data);
    return data;
  } catch (error) {
    console.error('Error fetching contents:', error);
    throw error;
  }
}

// Similar changes for other methods like save, delete, rename, etc.
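
For context, here is a rough sketch of what I have in mind for save, following the same pattern (the PUT semantics of /api/s3/contents are just an assumption about my own backend, not something jupydrive-s3 provides):

async save(
  path: string,
  options: Partial<Contents.IModel> = {}
): Promise<Contents.IModel> {
  path = path.replace(this._name + '/', '');

  try {
    // Send the model to the backend, which performs the S3 write server-side
    const response = await fetch(
      `/api/s3/contents?path=${encodeURIComponent(path)}&root=${encodeURIComponent(this._root)}`,
      {
        method: 'PUT',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(options)
      }
    );

    if (!response.ok) {
      throw new Error(`Failed to save contents: ${response.statusText}`);
    }

    // Optionally validate the returned model the same way as in get()
    const data = await response.json();
    return data;
  } catch (error) {
    console.error('Error saving contents:', error);
    throw error;
  }
}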

My questions are:

  1. Is this the correct approach to replace S3 SDK operations with API calls?

  2. What specific API endpoints do I need to implement on my backend to fully support JupyterLab’s contents manager functionality?

  3. Are there any special considerations for handling binary files, large files, or streaming content?

  4. How should I handle authentication and authorization for these API calls? (One option I'm considering is sketched after this list.)

  5. Are there any examples or reference implementations of a custom API-based contents manager for JupyterLab?
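
For question 4, one option I'm considering (sketched below, assuming the /api/s3/contents proxy is implemented as a Jupyter Server extension; fetchContents is just an illustrative helper, not part of jupydrive-s3) is to route the requests through ServerConnection from @jupyterlab/services, so the usual Jupyter token/XSRF handling is applied automatically:

import { URLExt } from '@jupyterlab/coreutils';
import { ServerConnection } from '@jupyterlab/services';

// Sketch: if the proxy lives on the Jupyter server, ServerConnection adds the
// token/XSRF headers for us instead of relying on a bare fetch().
async function fetchContents(path: string, root: string): Promise<any> {
  const settings = ServerConnection.makeSettings();
  const url =
    URLExt.join(settings.baseUrl, 'api/s3/contents') +
    `?path=${encodeURIComponent(path)}&root=${encodeURIComponent(root)}`;

  const response = await ServerConnection.makeRequest(url, { method: 'GET' }, settings);
  if (!response.ok) {
    throw new ServerConnection.ResponseError(response);
  }
  return response.json();
}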

I’m trying to maintain all the functionality of the S3 contents manager but with improved security by keeping S3 credentials on the server side.


It would be great to get an answer to this. @fomightez @jtp, any thoughts on this? It's based on jupydrive-s3/src/index.ts at 7cb48f727d7a47ac5cca01df021a1c6b84465cf7 · QuantStack/jupydrive-s3 · GitHub. Help is most appreciated.

In this case, you may want to have a look at GitHub - QuantStack/jupyter-drives: Jupyter Server supporting JupyterLab IDrive, which can be configured on the backend.
