As a data.world user, imagine that you want to supplement a dataset with results from an online survey. How would you keep the dataset up to date with the latest responses?

Could you upload files manually to data.world as many times as necessary for the duration of the survey? You could, but that’s not practical.

A much better alternative would be to upload a file via URL, pulling the content directly from the source. A file linked to a URL is a more practical option because it can be used to update a dataset with a one-click action or, even better, via API and automation.

Linked files are easy to set up

@haileypate has an excellent dataset about bicycle crash incidents in Austin. She is curious about where cyclists are at risk and what factors increase their likelihood of getting into an accident.

Let’s help @haileypate and find out how people actually feel on the streets, with a survey. Google Forms stores survey responses in Google Sheets. Our spreadsheet can be found here.

Awesome hack #1

A Google Sheets link can be used as a download link too. Simply replace “edit?usp=sharing” with “export?format=xlsx” in the URL.
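The substitution can also be scripted. The following is a minimal sketch, assuming the sharing link ends in `edit?usp=sharing` as described above; the variable names `SHARE_URL` and `EXPORT_URL` are illustrative and not part of any data.world or Google tooling, and the sharing link is reconstructed from the spreadsheet ID used in this post.

```shell
# Sharing link as copied from Google Sheets (assumed form, built from this post's spreadsheet ID).
SHARE_URL="https://docs.google.com/spreadsheets/d/1yUVBUEuf5C07CK0fUEdQDmwOBPwmgx_RPoXOFpf9kRg/edit?usp=sharing"

# Swap the trailing "edit?usp=sharing" for "export?format=xlsx".
# In sed's basic regular expressions, "?" is a literal character.
EXPORT_URL=$(printf '%s\n' "$SHARE_URL" | sed 's/edit?usp=sharing$/export?format=xlsx/')

# Print the resulting download link.
echo "$EXPORT_URL"
```

If the link does not end in `edit?usp=sharing`, the command leaves it unchanged, so it is worth checking the printed URL before using it.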

With a good URL for our spreadsheet, let’s add it to the dataset.

Via API

Using cURL we can invoke the POST:/datasets/{owner}/{id}/files endpoint:

$ export DW_API_TOKEN=<your token goes here>
$ curl -H "Authorization: Bearer ${DW_API_TOKEN}" -H "Content-Type: application/json" -d @request.json https://api.data.world/v0/datasets/rflprr/austin-cycling-survey/files

Note that in the command above, @request.json is a reference to a file stored locally with the JSON content of the request we are making, as seen below:

{
  "files": [
    {
      "name": "survey-results.xlsx",
      "source": {
        "url": "https://docs.google.com/spreadsheets/d/1yUVBUEuf5C07CK0fUEdQDmwOBPwmgx_RPoXOFpf9kRg/export?format=xlsx"
      }
    }
  ]
}

And this is the final result:

Adding a file via URL using data.world’s API
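The request.json file used above can be written by hand, or generated from the shell. The following is a rough sketch of the second option; the variable names `EXPORT_URL` and `FILE_NAME` are assumptions made for illustration, while the file name and URL values are the ones from the request shown earlier.

```shell
# Download link and target file name (values taken from the request body in this post).
EXPORT_URL="https://docs.google.com/spreadsheets/d/1yUVBUEuf5C07CK0fUEdQDmwOBPwmgx_RPoXOFpf9kRg/export?format=xlsx"
FILE_NAME="survey-results.xlsx"

# Write the request body. The unquoted heredoc delimiter lets the shell expand the variables.
cat > request.json <<EOF
{
  "files": [
    {
      "name": "${FILE_NAME}",
      "source": {
        "url": "${EXPORT_URL}"
      }
    }
  ]
}
EOF

# Show the generated body before sending it with cURL.
cat request.json
```

This plain interpolation does no JSON escaping, so it only suits values without quotes or backslashes, which is the case for the URL and file name here.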

Via web

On data.world, we just look for the option “Add file from URL”. The process is very intuitive and does not require further explanation.

Adding files via URL using data.world’s website

File updates can be automated

I don’t personally know @haileypate, but because she loves open data (also, rollerblading and queso), she will be so excited about this new dataset that she will want to visit it frequently and study the results of our survey. What can we do to guarantee that she will not be disappointed by a dataset that is never up to date?

Awesome hack #2

You can use a Google Apps Script function to invoke a URL every time a new response is submitted via Google Forms.

We will use Google Apps Script to invoke the POST:/datasets/{owner}/{id}/sync endpoint. This endpoint triggers the process that fetches the latest content for files added via URL. This is what that script looks like:

To complete the setup, we need to add this function as an installable trigger to our survey. You can learn more about how that is done here.
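Before relying on the trigger, it can help to call the sync endpoint by hand and confirm that it responds as expected. The following is a minimal cURL sketch, not the Apps Script function itself; the helper names `sync_url` and `sync_dataset` are hypothetical, and the base URL is assumed to match the one used for the files endpoint earlier in this post.

```shell
# Hypothetical helper: build the sync URL for a given owner and dataset id.
sync_url() {
  printf 'https://api.data.world/v0/datasets/%s/%s/sync\n' "$1" "$2"
}

# Hypothetical helper: send the POST request that asks data.world to re-fetch linked files.
# Expects the API token in the DW_API_TOKEN environment variable.
sync_dataset() {
  curl --fail --silent --show-error -X POST \
    -H "Authorization: Bearer ${DW_API_TOKEN}" \
    "$(sync_url "$1" "$2")"
}

# Print the URL that would be called for the dataset used in this post.
sync_url rflprr austin-cycling-survey
```

Running `sync_dataset rflprr austin-cycling-survey` sends a real request, so it needs a valid token in `DW_API_TOKEN` and write access to the dataset.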

Now, every time someone submits a survey response, @haileypate will be able to see it right on data.world.

Voilà!

Endless possibilities

File addition via URL is a powerful and versatile tool. Here we demonstrated how it can be used to pull data from a live survey into a dataset, but you can pull virtually any data that can be retrieved via an HTTP URL.

Possibilities include:

  • Cloud storage services (e.g. Google Drive and Dropbox)
  • Source control platforms (e.g. GitHub and Bitbucket)
  • Storage solutions (e.g. AWS S3)
  • Productivity tools (e.g. Google Docs and Google Sheets)
  • Open data portals (e.g. data.gov)
  • HTTP APIs (e.g. Twitter and Facebook)

Learn more about this feature and companion APIs here.

Now it’s your turn. Get creative, make awesome “hot linked” datasets and tell us all about them.
