Possible topics for discussion:
TOPIC: How much security complexity can JupyterLite eliminate by moving computation into a tab in the client's browser? What about remote data?
Methods for remotely accessing/paging data in from a client when a complete download of the dataset is unnecessary:
- Query e.g. parquet on e.g. GitHub with DuckDB: https://github.com/duckdb/duckdb/blob/6c7c9805fdf1604039ebed47d233ea55cabb4b2c/test/sql/copy/parquet/test_parquet_remote.test#L28
- Query sqlite on e.g. GitHub with SQLite: "Hosting SQLite databases on Github Pages - (or IPFS or any static file hoster)", phiresky's blog. From that post:
  "The above query should do 10-20 GET requests, fetching a total of 130 - 270KiB, depending on if you ran the above demos as well. Note that it only has to do 20 requests and not 270 (as would be expected when fetching 270 KiB with 1 KiB at a time). That's because I implemented a pre-fetching system that tries to detect access patterns through three separate virtual read heads and exponentially increases the request size for sequential reads. This means that index scans or table scans reading more than a few KiB of data will only cause a number of requests that is logarithmic in the total byte length of the scan. You can see the effect of this by looking at the "Access pattern" column in the page read log above."
- bittorrent/sqltorrent (GitHub)
- File System Access API: "The File System Access API: simplifying access to local files" (Capabilities, Chrome for Developers). The File System Access API allows web apps to read or save changes directly to files and folders on the user's device.
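The logarithmic request count described in the quoted phiresky post can be illustrated with a small simulation. This is only a sketch of the doubling idea for a single, purely sequential scan; the function names and the 1 KiB page size are assumptions for illustration, and the real implementation (three virtual read heads, access-pattern detection) is more involved, so the numbers here are not expected to match the 10-20 requests quoted above.

```python
def naive_request_count(total_bytes, page_size=1024):
    """Requests needed when every GET fetches exactly one page."""
    return -(-total_bytes // page_size)  # ceiling division


def doubling_request_count(total_bytes, page_size=1024):
    """Requests needed when each sequential GET doubles the previous size.

    Hypothetical model: the first request fetches 1 page, the next 2, then 4, ...
    """
    requests = 0
    fetched = 0
    pages = 1
    while fetched < total_bytes:
        fetched += pages * page_size
        pages *= 2
        requests += 1
    return requests


scan_bytes = 270 * 1024  # a 270 KiB sequential scan
print(naive_request_count(scan_bytes))     # 270
print(doubling_request_count(scan_bytes))  # 9
```

After k doubling requests the client has fetched (2^k - 1) pages, which is why the request count grows with the logarithm of the scan length rather than linearly.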
TOPIC: Launching remote notebooks within my org’s Jupyter resources
- Should it be easy?
- Should there be a warning about untrusted code?
- Jupyter-book has buttons to launch a remote instance with the current content and/or make code cells live:
  - Launch into interactive computing interfaces
  - Make your code cells executable
  - Could this default to JupyterLite?
- Would something like the ideas proposed in nbhandler (for launching remote repos locally with repo2docker instead of in a free cloud instance) be a security regression or an enabler for science? See Issue #1 in westurner/nbhandler on GitHub.
- Q: What repo2docker command should I run locally to do the same thing as mybinder.org?
repo2docker https://github.com/repo/example
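As a rough illustration of the mapping asked about above, a mybinder.org launch URL for a GitHub repo (of the form https://mybinder.org/v2/gh/<org>/<repo>/<ref>) could be turned into a local repo2docker invocation. This is a sketch only: the helper name `binder_url_to_repo2docker` is hypothetical, it handles only the `gh` provider, and a local run is only approximately "the same thing", since mybinder.org also builds, stores, and launches the image through BinderHub and JupyterHub.

```python
from urllib.parse import urlparse


def binder_url_to_repo2docker(binder_url):
    """Translate a mybinder.org GitHub launch URL into a repo2docker argv list.

    Hypothetical helper; assumes the path looks like /v2/gh/<org>/<repo>/<ref>.
    """
    parts = urlparse(binder_url).path.strip("/").split("/")
    if len(parts) < 5 or parts[0] != "v2" or parts[1] != "gh":
        raise ValueError(f"not a mybinder.org GitHub launch URL: {binder_url}")
    org, repo = parts[2], parts[3]
    ref = "/".join(parts[4:])  # branch names may themselves contain slashes
    # --ref selects the branch, tag, or commit that repo2docker checks out
    return ["repo2docker", "--ref", ref, f"https://github.com/{org}/{repo}"]


command = binder_url_to_repo2docker("https://mybinder.org/v2/gh/repo/example/main")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine with Docker and repo2docker installed; the warning about untrusted code applies equally there, since the repo's build scripts run locally.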
- FWIU, JupyterLite bundles Jupyter extensions into the static archive build. How is this best done with repo2docker? Will repo2docker always install the latest jupyterlab and dependencies (in a container layer) after the user installs whichever jupyter extensions are specified in e.g. a REES-compatible repo with a requirements.txt, environment.yml, and/or postBuild? Should there be a warning when things are out of date, like the one pip shows when pip itself is out of date?
If I deploy notebooks and their dependencies to WASM with JupyterLite like this, how will people then open this repo outside of a browser tab? With repo2docker locally? With a binderhub and/or a jupyterhub and/or locally (possibly with e.g. nbhandler)? With a Rocket Ship launch icon like jupyter-book? With a 'launch in notebook platform _____' badge? With a button on {github, gitlab, } that lets users select from various hosted notebook platforms? And does that trusted code then run in a cloud instance, in a browser tab, or locally as a local user, with or without monitoring, logging, and [per-opcode] accounting?
pip install --pre jupyterlite
jupyter lite init
jupyter lite build
jupyter lite archive
A GitHub Action for JupyterLite could simply build archives on GitHub's resources, using your GitHub Actions user/org quotas, just as jupyterhub/repo2docker-action ("A GitHub action to build data science environment images with repo2docker and push them to registries") builds containers on resource-constrained cloud VM/container instances.
TOPIC: Realtime collaboration and Jupyter Security
- https://github.com/jupyterlab/rtc
- https://jupyterlite.readthedocs.io/en/latest/rtc/index.html#enabling-rtc-in-jupyterlite
- Is the entire replayable journal persisted in the .ipynb?
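One way to start investigating the persistence question is to inspect a saved notebook directly and see whether it holds anything beyond the standard notebook fields. The sketch below assumes the four top-level keys of the nbformat 4 schema (cells, metadata, nbformat, nbformat_minor); the helper name and the sample notebook are made up for illustration, and it does not by itself show where an RTC backend keeps its edit history.

```python
import json

# Top-level keys defined by the nbformat 4 notebook schema.
NBFORMAT4_TOP_LEVEL_KEYS = {"cells", "metadata", "nbformat", "nbformat_minor"}


def unexpected_notebook_keys(ipynb_text):
    """Return (extra top-level keys, metadata keys) found in a notebook's JSON.

    Hypothetical helper: anything journal-like saved into the file would have
    to appear as an extra top-level key or under "metadata".
    """
    notebook = json.loads(ipynb_text)
    extra = sorted(set(notebook) - NBFORMAT4_TOP_LEVEL_KEYS)
    metadata_keys = sorted(notebook.get("metadata", {}))
    return extra, metadata_keys


# A made-up minimal notebook standing in for a file saved after an RTC session.
sample = json.dumps({
    "cells": [],
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "nbformat": 4,
    "nbformat_minor": 5,
})
extra, metadata_keys = unexpected_notebook_keys(sample)
print(extra, metadata_keys)
```

Running this against a real notebook (`unexpected_notebook_keys(open("example.ipynb").read())`) before and after a collaborative session would show whether the saved file grows any history-bearing fields.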
TOPIC: Jupyter, Capabilities, and free VMs and/or Containers
E.g. WASM in a browser tab (and thus JupyterLite) has no raw socket network access (though the browser does provide WebSockets and WebRTC). Hosted Jupyter solutions have various policies for free resource quotas and maybe network access. Which of the behaviors that Falco checks for (listed below) are realistic needs for Jupyter containers?
What does Falco check for?
Falco ships with a default set of rules that check the kernel for unusual behavior such as:
- Privilege escalation using privileged containers
- Namespace changes using tools like setns
- Read/Writes to well-known directories such as /etc, /usr/bin, /usr/sbin, etc
- Creating symlinks
- Ownership and Mode changes
- Unexpected network connections or socket mutations
- Spawned processes using execve
- Executing shell binaries such as sh, bash, csh, zsh, etc
- Executing SSH binaries such as ssh, scp, sftp, etc
- Mutating Linux coreutils executables
- Mutating login binaries
- Mutating shadowutil or passwd executables such as shadowconfig, pwck, chpasswd, getpasswd, change, useradd, etc, and others.