Introducing the Data Inspector

Aug 9, 2017 | Data catalogs

Be honest: how long do you spend combing for missing values, scrubbing erroneous outliers, and coercing columns into a uniform format just to get to the analysis you want to do?

There’s no question that data quality and consistency are crucial, but verifying them manually is tedious, inefficient, and often ineffective. Issues like missing or outlier values can be detected with routine diagnostics, but subtler issues like mistyped category names are difficult to weed out. Worse, these issues compound with scale, slowing or outright preventing analyses on larger and richer datasets.

At data.world, it’s our mission to remove these unnecessary barriers to data analysis and get you discovering meaningful insights faster. That’s why we’re excited to introduce a new feature: the Data Inspector.

Starting today, you’ll be able to use the Data Inspector on any dataset you own or collaborate on to rapidly diagnose issues with your data. Simply upload a file to a dataset or project and we’ll automatically identify and display potential issues.

Data Inspector from the ‘workspace’ view
Data Inspector from the ‘dataset’ view

From here you can easily download the file, run some quick fixes locally in your tool of choice, and upload a new, high-quality dataset, all without leaving the inspector.

The inspector detects a number of issues covering a variety of use cases. These include more straightforward issues such as blank cells, duplicate rows, and numeric values or string lengths that fall many standard deviations away from what is typical for their field.
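If you’d like a feel for what checks like these involve, here is a minimal sketch in Python. It is not how the Data Inspector is implemented; the function names, the row format (a list of dicts), and the z-score threshold of 3 are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def is_blank(value):
    """Treat None and whitespace-only strings as blank."""
    return value is None or str(value).strip() == ""


def find_basic_issues(rows, numeric_field, z_threshold=3.0):
    """Scan a list of dict rows for blank cells, duplicate rows, and numeric outliers.

    Hypothetical helper for illustration only; not the Data Inspector's logic.
    """
    issues = {"blank": [], "duplicate": [], "outlier": []}
    seen = set()
    values = []  # (row index, parsed number) pairs for the numeric field

    for i, row in enumerate(rows):
        # Blank cells: record the row index and field name.
        for field, value in row.items():
            if is_blank(value):
                issues["blank"].append((i, field))

        # Duplicate rows: a row whose cells all match an earlier row.
        key = tuple(sorted(row.items()))
        if key in seen:
            issues["duplicate"].append(i)
        seen.add(key)

        # Collect parseable numbers for the outlier check.
        raw = row.get(numeric_field)
        if not is_blank(raw):
            try:
                values.append((i, float(raw)))
            except (TypeError, ValueError):
                pass  # non-numeric text is simply skipped in this sketch

    # Outliers: values more than z_threshold standard deviations from the mean.
    if len(values) >= 2:
        numbers = [v for _, v in values]
        center, spread = mean(numbers), pstdev(numbers)
        if spread > 0:
            issues["outlier"] = [
                i for i, v in values if abs(v - center) / spread > z_threshold
            ]
    return issues
```

Note that in very small samples no single value can reach a z-score of 3, so a threshold like this only becomes useful once a field has a reasonable number of observations.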

Further, the inspector detects more subtle inconsistencies in your dataset, like text that appears to contain typos. For columns that appear to contain only a specific category, like states or countries, the inspector will also alert you when an observation falls outside that set.
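To make the idea concrete, here is a rough sketch of a category check that also suggests a likely intended value for near-misses. It uses Python’s standard `difflib` module; the function name, the similarity cutoff, and the idea of combining both checks in one pass are assumptions for illustration, not a description of the inspector’s internals.

```python
from difflib import get_close_matches


def check_category(values, allowed, cutoff=0.8):
    """Map each value outside `allowed` to its closest allowed entry, or None.

    Hypothetical helper: matching is case-insensitive, and `cutoff` is the
    minimum difflib similarity ratio (0 to 1) for a suggestion to be offered.
    """
    lookup = {name.lower(): name for name in allowed}
    findings = {}
    for value in values:
        key = value.strip().lower()
        if key in lookup:
            continue  # value belongs to the expected category
        match = get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        # A close match suggests a typo; no match suggests a foreign value.
        findings[value] = lookup[match[0]] if match else None
    return findings


states = ["Texas", "Ohio", "Florida"]
print(check_category(["Texas", "Flordia", "Ohio", "Atlantis"], states))
```

In this sketch a value with a suggestion reads as a probable typo, while a value mapped to `None` simply does not belong to the set.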

Additionally, the inspector detects potential security issues, such as values that look like credit card numbers, social security numbers, or phone numbers.
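As an illustration of how this kind of scan can work, the sketch below flags text that looks like a US social security number or a payment card number. The regular expressions and function names are assumptions chosen for the example, phone numbers are left out for brevity, and none of this is taken from the Data Inspector itself. Card candidates are filtered with the Luhn checksum to reduce false positives from other long digit strings.

```python
import re

# Assumed patterns for illustration: NNN-NN-NNNN, and 13 to 19 digits
# optionally separated by single spaces or hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_sensitive(text):
    """Return (kind, matched text) pairs for SSN-like and card-like values."""
    found = [("ssn", m.group()) for m in SSN_PATTERN.finditer(text)]
    for m in CARD_PATTERN.finditer(text):
        if luhn_valid(m.group()):
            found.append(("card", m.group()))
    return found
```

A match here only means a value has the shape of sensitive data, so anything flagged still deserves a human look before a dataset is shared.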

Currently, the Data Inspector detects 21 types of issues in 6 categories (structural, numeric, text, noise, security, geospatial). For a full breakdown of the issues check out our documentation.

Many of us have been working on the Data Inspector for a while, and we’re excited to finally release it to the community.

We hope this feature will help data.world continue to become not just your most abundant data source, but also your highest-quality data source.

What’s more, we believe the Data Inspector will make data.world an even more powerful collaboration tool. Because users can easily identify and fix issues in datasets, and because the inspector applies consistent standards, it is easier than ever to jump in, find reliable data, and start joining datasets.

As always, please let us know what you think. If you have any suggestions for the team, drop us an email at help@data.world or join our community Slack channel. We can’t wait to get your feedback on this exciting new tool!
