The response to the introduction of SQL at data.world has been really amazing. In a short time, SQL has become one of the primary ways that users interact with data, whether through the query UI, or through our API integrations (Python, R, Java).
In response, we continue to invest in SQL as a core technology in our data catalog. While we remain focused on supporting the SQL standard, we’re also extending the standard to support the unique needs of our users. Additionally, we continue to improve our query editor to make writing queries easier.
A very common pattern that we’ve observed is a single conceptual table being split into different files or sheets (often based on time). Here’s a simple example which splits purchase data by month into separate sheets of a Google Spreadsheet.
The same basic data has been split across a number of sheets with each containing the same columns, but different data. We’ve implemented a custom extension that allows you to treat these tables as one.
As an added bonus, you can future-safe your queries with a regular expression.
Aggregations, filtering, ordering and other standard SQL features will continue to work with this multi-table syntax.
Oftentimes users want to join the data in their local dataset against another dataset in data.world. You can do this using what’s called a federated query.
Each of the following datasets contain a column containing a two-letter state abbreviation. I would like to join these datasets together so that I can do some further analysis on the relationship between them.
The fully qualified path to each table can be specified with a ‘dotted’ notation. Tables may then be joined using standard SQL
Union, Intersect, Minus
In addition to custom SQL extensions, we also continue to support the SQL standard.
MINUS have been added recently and are useful when working with multiple tables. This is similar to the multi-table syntax, but more flexible.
Dataset Table Information
There are a couple of special schema tables for each dataset:
This is useful information, though somewhat hidden. We have therefore included the schema in a new pane that’s displayed alongside the query editor. Just click to copy fully qualified column and table names.
This article is just scratching the surface of what’s possible today with SQL at data.world (filtered aggregations… anyone?). For more information, please check out the dwSQL docs.
We’re constantly adding more functionality to our cloud data catalog so please follow the Advanced SQL dataset and we’ll be sure to update when we have new things to show!