A few months back, we told you about some advanced SQL features at data.world. I’m happy to say that there’s more to show. We’ve been hard at work improving every aspect of our implementation of SQL and query editor. Here’s a collection of our top 10 favorite new features.
10. WITH Clause
SQL queries are notorious for quickly becoming complex. Even a seemingly simple question can balloon into an incomprehensible query. The WITH
clause can help break complex queries into logical views, making even the most complicated query simple to read and maintain.
Additionally, queries in the WITH
clause may cascade (or “build up”) to the final result.
9. Subqueries
While the WITH
clause can help break up complex queries, sometimes it makes sense to just run a query directly within another. Subqueries allow you to use queries within the WHERE
clause to do additional filtering.
8. Project Implicit Federation
Data projects are a great new feature of data.world. Data projects allow you to group disparate datasets and work on them together as a logical unit within the workspace. To that end, SQL now implicitly federates across datasets linked to a project. This means that these datasets effectively live in a single queryable namespace. There is no need to fully qualify paths to tables across different datasets — just link and join.
This query depends on two datasets, but the tables are accessible directly from the project. In the past, these table names would need to be fully qualified.
7. ROLLUP
We’ve also worked to enable more advanced aggregation features. ROLLUP
is a great way for data analysts to present not only aggregated data, but also totals across aggregations.
6. Filtered Aggregations
Have you ever wanted to aggregate some data, but tease out more than one calculation? Filtered aggregations allow you to do just that.
5. CASE Statements
CASE
statements allow for conditional logic in SQL statements (similar to if statements in other languages). They allow for alternate values depending on conditional expressions.
4. Statistical Correlation
An often requested aggregation function was “Pearson’s correlation coefficient” which calculates a 0–1 numeric correlation across two numeric values. This is useful when trying to understand how numberic variables might be related.
3. Date Functions (diff, part, format, etc)
Date calculations are an essential component of many types of analysis. User have repeatedly requested the ability to split dates into parts, calculate the difference between dates and format dates.
We’ve added support for the following date functions: DATE_ADD
, DATE_SUB
, DATE_PART
, DATE_TRUNC
, DATE_FORMAT
, and DATE_DIFF
.
2. Keyboard Shortcuts and Reformatting
Anyone who spends time working in the query editor will quickly tire of reaching for the mouse to run queries. To help save those extra few seconds whenever possible, we’ve enabled a keyboard shortcut to run queries.
Cmd-Enter
on Mac (Ctrl-Enter
on Windows)
Additionally, while writing SQL can become second nature to many, reading SQL can sometimes be quite difficult. Therefore, we’ve implemented an opinionated SQL formatter based one of the more readable approaches to SQL formatting.
Cmd-Opt-L
on Mac (Ctrl-Alt-L
on Windows)
We’re so certain that you’ll want to use this in every query that we’ve included a “reformat and run” shortcut to simplify things even more.
Cmd-Shift-Enter
on Mac (Ctrl-Shift-Enter
on Windows)
1. API Access
Finally, all of this functionality is available through our growing list of integrations and via our API. SQL can be used directly in Python and R, Tableau reports, JDBC-compatible tools, Algorithmia apps and more.
For more information, check out our integrations page. There are new integrations coming online each week, so stay tuned for new ways to integrate data.world with your workflow.
Think you know advanced business SQL? Think again.
Be smarter with your business intelligence. Join us on Thursday 1/25 at 1pm CST for an interactive webinar, where Shad Reynolds and Dave Griffith from the data.world Product Engineering team will demo the easiest way to do advanced business SQL.