Question:
I’m working with a JupyterLab extension that currently uses AWS S3 for file storage via the AWS SDK. For security reasons, I want to replace the direct S3 access with API calls to my backend server, which will handle the S3 operations server-side.
Here’s my current approach:
In index.ts, I’ve modified the authFileBrowser plugin to use placeholder credentials:
const authFileBrowser: JupyterFrontEndPlugin<IS3Auth> = {
  id: 'jupydrive-s3:auth-file-browser',
  description: 'The default file browser auth/credentials provider',
  provides: IS3Auth,
  activate: (): IS3Auth => {
    return {
      factory: async () => {
        console.log('Setting up S3/R2 proxy via server API...');
        // Since we're using API endpoints for all S3 operations,
        // we just need a minimal configuration with the bucket name
        const config = {
          bucket: 'my-bucket', // This is just for display purposes
          root: '',
          config: {
            forcePathStyle: true,
            // These are placeholder values since actual S3 operations
            // will be handled by the server
            endpoint: 'https://api-proxy',
            region: 'auto',
            credentials: {
              accessKeyId: 'proxy-auth',
              secretAccessKey: 'proxy-auth'
            }
          }
        };
        console.log('S3/R2 proxy setup complete');
        return config;
      }
    };
  }
};
And in s3contents.ts, I’m replacing the S3 operations with API calls:
async get(
  path: string,
  options?: Contents.IFetchOptions
): Promise<Contents.IModel> {
  path = path.replace(this._name + '/', '');

  // format root the first time contents are retrieved
  if (!this._isRootFormatted) {
    this._root = await this.formatRoot(this._root ?? '');
    this._isRootFormatted = true;
  }

  try {
    // Use API endpoint instead of direct S3 access
    const response = await fetch(
      `/api/s3/contents?path=${encodeURIComponent(path)}&root=${encodeURIComponent(this._root)}`,
      { method: 'GET' }
    );
    if (!response.ok) {
      throw new Error(`Failed to fetch contents: ${response.statusText}`);
    }
    const data = await response.json();
    Contents.validateContentsModel(data);
    return data;
  } catch (error) {
    console.error('Error fetching contents:', error);
    throw error;
  }
}
I'm making similar changes to the other methods (save, delete, rename, and so on).
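For example, here's roughly what save() will look like, following the same pattern as get() above. This is only a sketch: the /api/s3/contents endpoint and its POST payload shape are my own design, not part of jupydrive-s3.

async save(
  path: string,
  options: Partial<Contents.IModel> = {}
): Promise<Contents.IModel> {
  path = path.replace(this._name + '/', '');

  try {
    // POST the contents model to the same proxy endpoint; the server
    // performs the actual PutObject call with its own credentials
    const response = await fetch(
      `/api/s3/contents?path=${encodeURIComponent(path)}&root=${encodeURIComponent(this._root)}`,
      {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(options)
      }
    );
    if (!response.ok) {
      throw new Error(`Failed to save contents: ${response.statusText}`);
    }
    const data = await response.json();
    Contents.validateContentsModel(data);
    return data;
  } catch (error) {
    console.error('Error saving contents:', error);
    throw error;
  }
}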
My questions are:
- Is this the correct approach to replace S3 SDK operations with API calls?
- What specific API endpoints do I need to implement on my backend to fully support JupyterLab's contents manager functionality?
- Are there any special considerations for handling binary files, large files, or streaming content?
- How should I handle authentication and authorization for these API calls?
- Are there any examples or reference implementations of a custom API-based contents manager for JupyterLab?
I’m trying to maintain all the functionality of the S3 contents manager but with improved security by keeping S3 credentials on the server side.
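For context, this is the rough shape of the backend handler I have in mind. It's a sketch only, assuming an Express server and @aws-sdk/client-s3; the route, environment variables, and key layout are placeholders, and a complete handler would also need to branch on directories (ListObjectsV2Command) and cover the write operations.

import express from 'express';
import { S3Client, GetObjectCommand } from '@aws-sdk/client-s3';

const app = express();

// S3 credentials come from the server environment (env vars, IAM role, etc.)
// and never reach the browser.
const s3 = new S3Client({
  region: 'auto',
  endpoint: process.env.S3_ENDPOINT,
  forcePathStyle: true
});

app.get('/api/s3/contents', async (req, res) => {
  const path = String(req.query.path ?? '');
  const root = String(req.query.root ?? '');
  const key = root ? `${root}/${path}` : path;

  try {
    // The server performs the actual S3 call on the client's behalf
    const obj = await s3.send(
      new GetObjectCommand({ Bucket: process.env.S3_BUCKET, Key: key })
    );
    const content = await obj.Body!.transformToString();

    // Shape the response like a Contents.IModel so the frontend's
    // validateContentsModel check passes
    res.json({
      name: path.split('/').pop() ?? '',
      path,
      type: 'file',
      format: 'text',
      mimetype: obj.ContentType ?? 'text/plain',
      content,
      created: obj.LastModified?.toISOString() ?? '',
      last_modified: obj.LastModified?.toISOString() ?? '',
      writable: true,
      size: obj.ContentLength ?? null
    });
  } catch (err) {
    res.status(500).json({ message: String(err) });
  }
});

app.listen(8787);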