The FrictionlessData specification for data

I’m here at CSVConf and just heard about an interesting project that’s sponsored by the Open Knowledge Foundation. It’s called FrictionlessData, and it seems to be an attempt at defining specifications for data of varying kinds, with the goal of making that data easier to share, discover, and ingest.

It seems like it might be an interesting avenue to pursue if we wanted to extend the repo2docker spec to include data that doesn’t live within the repo itself.

I’ll try to keep looking into it while I’m here, but flagging it here in case anybody else has worked with it before.

Docs here: https://frictionlessdata.io/docs/
Specs page here: https://frictionlessdata.io/specs/
For example, tabular data spec here: https://frictionlessdata.io/specs/tabular-data-resource/

I had a look around https://github.com/frictionlessdata/datapackage-py as a tool that could fetch data (it wasn’t quite clear which of the packages I should look at). However, I couldn’t find an example that showed this off :frowning:

Is your idea that if there is, say, a data.json file in the repository, then repo2docker would automate the step of fetching the datasets referenced in that file so that they’d be baked into the image?

Something like that, yeah. I feel like we’ve discussed the general idea of how repo2docker could handle data in a structured way, but we’ve never had any specific ideas come out of it. I only heard about this a couple of days ago and wanted to share it in case it’s worth investigating further :slight_smile:
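
To make the "fetch what the descriptor references" idea a bit more concrete, here is a rough sketch using only the Python standard library. It assumes the file follows the Data Package layout, where a top-level `resources` list holds entries whose `path` is either a string or a list of strings, and where a path may be a URL. The function names `remote_paths` and `fetch_all`, the `data` target directory, and the idea of running this as a build step are all hypothetical; repo2docker has no such hook today, and this does not use datapackage-py.

```python
import json
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def remote_paths(descriptor):
    """Yield (resource_name, url) for every http(s) path in a descriptor dict."""
    for resource in descriptor.get("resources", []):
        paths = resource.get("path", [])
        # The spec allows "path" to be a single string or a list of strings.
        if isinstance(paths, str):
            paths = [paths]
        for path in paths:
            # Relative paths already live in the repo, so only URLs need fetching.
            if urlparse(path).scheme in ("http", "https"):
                yield resource.get("name", "unnamed"), path


def fetch_all(descriptor_file, target_dir="data"):
    """Hypothetical build step: download each remote resource into target_dir."""
    descriptor = json.loads(Path(descriptor_file).read_text())
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for name, url in remote_paths(descriptor):
        # Fall back to the resource name if the URL has no file name component.
        filename = Path(urlparse(url).path).name or name
        urlretrieve(url, str(target / filename))
```

Calling `fetch_all("data.json")` at image build time would then place the referenced files under `data/`. A real implementation would probably also want checksums and size limits, which this sketch ignores.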