The FrictionlessData specification for data

choldgraf · May 8, 2019, 3:50pm

I’m here at CSVConf and just heard about an interesting project sponsored by the Open Knowledge Foundation. It’s called FrictionlessData, and it seems to be an attempt at defining specifications for different kinds of data, with the goal of making data easier to share, discover, and ingest.

It seems like it might be an interesting avenue to pursue if we wanted to extend the repo2docker spec to include data that doesn’t live within the repo itself.

I’ll try to keep looking into it while I’m here, but flagging it here in case anybody else has worked with it before.

Docs here: https://frictionlessdata.io/docs/
Specs page here: https://frictionlessdata.io/specs/
For example, tabular data spec here: https://frictionlessdata.io/specs/tabular-data-resource/

betatim · May 9, 2019, 10:12am

I had a look around https://github.com/frictionlessdata/datapackage-py as a tool that could fetch data (it wasn’t quite clear which of the packages I should look at). However, I couldn’t find an example that showed this off.

Is your idea that if there is a (say) data.json file in the repository, repo2docker would automate the step of fetching the datasets referenced in that file so that they’d be baked into the image?
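
To make the question concrete, here is a minimal sketch of what that fetching step could look like. It is not how repo2docker or datapackage-py works; it uses only the Python standard library and assumes the file follows the Data Package layout of a top-level `resources` list whose entries carry a `name` and a `path`. The function names `remote_resources` and `fetch_all`, and the `data.json` file name, are hypothetical.

```python
import json
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def remote_resources(descriptor):
    """Return (name, url) pairs for resources whose path is an http(s) URL."""
    found = []
    for resource in descriptor.get("resources", []):
        paths = resource.get("path", [])
        # "path" may be a single string or a list of strings.
        if isinstance(paths, str):
            paths = [paths]
        for url in paths:
            # Relative paths already live in the repo, so only URLs are kept.
            if urlparse(url).scheme in ("http", "https"):
                found.append((resource.get("name", ""), url))
    return found


def fetch_all(descriptor_file, target_dir, download=urllib.request.urlretrieve):
    """Download every remote resource listed in descriptor_file into target_dir.

    `download` is injectable (hypothetical design choice) so the logic can be
    exercised without network access.
    """
    descriptor = json.loads(Path(descriptor_file).read_text())
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, url in remote_resources(descriptor):
        # Use the last URL path segment as the file name, falling back to
        # the resource name if the URL has no file component.
        filename = Path(urlparse(url).path).name or name or "resource"
        destination = target / filename
        download(url, str(destination))
        saved.append(destination)
    return saved
```

Something like `fetch_all("data.json", "data/")` would then be the step that runs at image build time, which is where the datasets would get baked into the image.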

choldgraf · May 9, 2019, 2:37pm

Something like that, yeah. I feel like we’ve discussed the general idea of how repo2docker could handle data in a structured way, but we’ve never had any specific ideas come out of it. I only heard about this a couple of days ago and wanted to share it in case it’s worth investigating further.
