Top 10 SQL Features

by | Sep 6, 2017 | Data catalogs

A few months back, we told you about some advanced SQL features at data.world. I’m happy to say that there’s more to show. We’ve been hard at work improving every aspect of our implementation of SQL and query editor. Here’s a collection of our top 10 favorite new features.

10. WITH Clause

SQL queries are notorious for quickly becoming complex. Even a seemingly simple question can balloon into an incomprehensible query. The WITH clause can help break complex queries into logical views, making even the most complicated query simple to read and maintain.

This example first selects AllStar players from 1933, then looks for players from Chicago.

’33 Chicago Cubs

Additionally, queries in the WITH clause may cascade (or “build up”) to the final result.

’33 Chicago Cubs (named ruth)

9. Subqueries

While the WITH clause can help break up complex queries, sometimes it makes sense to just run a query directly within another. Subqueries allow you to use queries within the WHERE clause to do additional filtering.

The following example looks for players from 1933 in the American League who hit more home runs than all players from the National League. Notice that we have a subquery which selects the # of home runs for each player from the National League.

American League HR

8. Project Implicit Federation

Data projects are a great new feature of data.world. Data projects allow you to group disparate datasets and work on them together as a logical unit within the workspace. To that end, SQL now implicitly federates across datasets linked to a project. This means that these datasets effectively live in a single queryable namespace. There is no need to fully qualify paths to tables across different datasets — just link and join.

This query depends on two datasets, but the tables are accessible directly from the project. In the past, these table names would need to be fully qualified.

Median Age for Travis County

7. ROLLUP

We’ve also worked to enable more advanced aggregation features. ROLLUP is a great way for data analysts to present not only aggregated data, but also totals across aggregations.

The following query pulls Home Runs by year and league.

Home Runs 2014/2015

Note the rows that roll up (or accumulate) totals across leagues and then all years.

6. Filtered Aggregations

Have you ever wanted to aggregate some data, but tease out more than one calculation? Filtered aggregations allow you to do just that.

Here’s an example that pulls the average salary per team; including 2014/15 for comparison.

Average Salaries since 2000

5. CASE Statements

CASE statements allow for conditional logic in SQL statements (similar to if statements in other languages). They allow for alternate values depending on conditional expressions.

This example renames “AL” and “NL” abbreviations to “American” and “National” using the CASE statement.

Top Home Runs 2015

4. Statistical Correlation

An often requested aggregation function was “Pearson’s correlation coefficient” which calculates a 0–1 numeric correlation across two numeric values. This is useful when trying to understand how numberic variables might be related.

Let’s find the correlation between RBI’s and HR’s for each team in 2014 and 2015. (Surprise, there’s a high correlation!)

RBI to HR Correlation by Team

3. Date Functions (diff, part, format, etc)

Date calculations are an essential component of many types of analysis. User have repeatedly requested the ability to split dates into parts, calculate the difference between dates and format dates.

We’ve added support for the following date functions: DATE_ADD, DATE_SUB, DATE_PART, DATE_TRUNC, DATE_FORMAT, and DATE_DIFF.

2. Keyboard Shortcuts and Reformatting

Anyone who spends time working in the query editor will quickly tire of reaching for the mouse to run queries. To help save those extra few seconds whenever possible, we’ve enabled a keyboard shortcut to run queries.

Cmd-Enter on Mac (Ctrl-Enter on Windows)

Additionally, while writing SQL can become second nature to many, reading SQL can sometimes be quite difficult. Therefore, we’ve implemented an opinionated SQL formatter based one of the more readable approaches to SQL formatting.

Cmd-Opt-Lon Mac (Ctrl-Alt-L on Windows)

Let’s take one of our queries from above…

… and format it!

Keywords uppercased, indented for readability.

We’re so certain that you’ll want to use this in every query that we’ve included a “reformat and run” shortcut to simplify things even more.

Cmd-Shift-Enter on Mac (Ctrl-Shift-Enter on Windows)

1. API Access

Finally, all of this functionality is available through our growing list of integrations and via our API. SQL can be used directly in Python and R, Tableau reports, JDBC-compatible tools, Algorithmia apps and more.

data.world/integrations

For more information, check out our integrations page. There are new integrations coming online each week, so stay tuned for new ways to integrate data.world with your workflow.


Think you know advanced business SQL? Think again.

Be smarter with your business intelligence. Join us on Thursday 1/25 at 1pm CST for an interactive webinar, where Shad Reynolds and Dave Griffith from the data.world Product Engineering team will demo the easiest way to do advanced business SQL.

Want to make your data projects easier/faster/better? Streamline your data teamwork with our Modern Data Project Checklist!