Ever wondered why your dataset or project didn’t get recognized? Eager for collaborators but can’t seem to find the right team? You may have missed the mark on data quality.

Data quality determines whether people can understand and use your data. If you’ve focused on quality from the beginning, people will find it easier to engage with your data and will ultimately be more likely to work with it or incorporate it into their existing work. With great data, the data.world community can create a flywheel effect of data enthusiasm and collaboration.

When it comes to the data.world community, we define data quality as completeness, freshness, reputation, uniqueness, and consumability. While data quality has many definitions and each depends on context and your individual requirements, these five traits are foundational to creating useful data and analysis.

So how can you make sure your data hits the mark? This guide explains how to use data.world to meet each of these five benchmarks.

 

1. Completeness

Is your dataset or project complete? You can make your data and analysis easier to find and more useful to others by documenting it. All it takes is creating metadata. It’s easier than it sounds!

Metadata provides important contextual details. Most datasets aren’t self-describing, so you need to help others understand how to use them to their full potential. These contextual details are especially valuable to other users who happen to stumble upon your work.

We designed this checklist to help you hit all the right metadata notes:

Lastly, consider whether your dataset or project will be open or private to the data.world community. As a Public Benefit Corporation, data.world thrives on collaborative data sharing, and part of our mission is the proliferation of open data. We encourage collaborative data contributions whether that's across teams in your business or as part of the world's largest collaborative data community.

 

2. Freshness

How fresh is your dataset or project? Data becomes stale when it goes out of date and no longer reflects reality (typically, “reality” is the original source of your data). You might be asking, “So what’s the big deal?” Stale data can invalidate your data work and worse, including:

To keep things fresh, make sure you update your data on a regular basis. This is easy on data.world. Use the automatic sync options so you don’t have to worry about it again. Simply add files from URLs and choose how often they sync: hourly, daily, or weekly. You can feel good about always having fresh data while spending less time on manual updates.
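If you also want to check freshness on your own machine, the idea can be sketched in a few lines of Python. This is only an illustration and does not use any data.world API: the `is_stale` helper, the `SYNC_INTERVALS` table, and the example dates are all hypothetical names and values chosen to mirror the hourly, daily, and weekly options described above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical intervals mirroring the hourly, daily, and weekly sync options.
SYNC_INTERVALS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def is_stale(last_updated, max_age, now=None):
    """Return True when the data is older than the allowed age."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated > max_age


# Example: a file last refreshed two days before the moment we check it.
last_updated = datetime(2024, 1, 1, tzinfo=timezone.utc)
check_time = datetime(2024, 1, 3, tzinfo=timezone.utc)

stale_by_schedule = {
    name: is_stale(last_updated, interval, check_time)
    for name, interval in SYNC_INTERVALS.items()
}
print(stale_by_schedule)
```

In this example the file counts as stale under an hourly or daily schedule but is still fresh under a weekly one, which is a reminder to pick a sync interval that matches how often the original source changes.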

 

3. Reputation

Can people trust your dataset or project? Like in life, your reputation is everything on data.world. Most people know that evaluating sources is an important part of the research process. If there is no clear indication that your data or analysis is trustworthy, your work will likely be ignored as unreliable.

You can establish trustworthiness by answering these questions about your dataset or project:

While they’re not all mandatory, these show your data is trustworthy. The more you have, the easier it is for others to trust your data and use it, knowing it will give them the most complete, accurate, and relevant information they need to make a decision, tell a story, or understand the business.

 

4. Uniqueness

Is your dataset or project unique? There is a lot of data on data.world, so search first to see if the same dataset has already been uploaded. There’s nothing wrong with uploading your own copy, but improving someone else's established work through collaboration or direct linking will keep that data’s “narrative” in one place and can benefit the entire community.

If you find out another user has beaten you to uploading similar data, don’t worry! There is still an opportunity to contribute. Here are two solutions:

 

5. Consumability

Is your dataset or project consumable? Will others be able to use your data? Here’s how to make sure:

Queryability

Ensure your data is queryable. Sometimes its formatting doesn’t translate the way you think it will, making querying impossible. Watch out for these potentially corrupt or incomplete parts of your data and other common gotchas.
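One way to catch formatting problems before you upload is to scan the file yourself. The sketch below is a minimal example, assuming your data is a CSV file with a header row. The `find_csv_problems` helper and the sample rows are hypothetical and are not part of data.world. It only flags two common gotchas: rows with the wrong number of columns and rows with empty cells.

```python
import csv
import io


def find_csv_problems(text):
    """Return (line_number, message) pairs for rows that may break queries."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return [(1, "file is empty")]

    problems = []
    for row in reader:
        line = reader.line_num  # line of the source text just read
        if len(row) != len(header):
            problems.append(
                (line, f"expected {len(header)} columns, found {len(row)}")
            )
        elif any(cell.strip() == "" for cell in row):
            problems.append((line, "empty cell"))
    return problems


# Made-up sample: line 3 is missing a column, line 4 has an empty cell.
sample = "name,score\nada,90\nbob\ncy,\n"
problems = find_csv_problems(sample)
for line, message in problems:
    print(f"line {line}: {message}")
```

An empty cell is not always an error, so treat the output as a list of rows to review, not rows to delete.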

Matchability

A knowledge graph powers the data.world data catalog. As a result, we can match, enhance, and understand your data.

Read our blog to learn more: what is a data catalog?

As we process your data, we look for matches with any known data types within the data.world system, which helps find relationships across data files and align all data on our platform with industry standards.

Make sure you are getting the most out of our data matching functionality. Click on those little green triangles next to column names for our matches!

File Size & Format

Is your file too big? Try compressing large files to a size that makes it easier for others to quickly work with your data. And remember: Sampling can be useful for datasets that are too large to efficiently analyze in full.
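Both ideas, compressing and sampling, can be tried locally with Python's standard library. The sketch below is only an illustration built on generated data: the `compress_text` and `sample_rows` helpers, the 10,000-row table, and the sample size of 100 are all assumptions made for the example.

```python
import gzip
import random


def compress_text(text):
    """Compress text with gzip and return the compressed bytes."""
    return gzip.compress(text.encode("utf-8"))


def sample_rows(rows, k, seed=0):
    """Return up to k rows chosen at random; the seed makes it repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


# Generated stand-in for a large CSV file.
header = "id,value"
rows = [f"{i},{i % 7}" for i in range(10_000)]
csv_text = "\n".join([header] + rows)

compressed = compress_text(csv_text)
subset = sample_rows(rows, 100)

print(len(csv_text.encode("utf-8")), "bytes before compression")
print(len(compressed), "bytes after compression")
print(len(subset), "rows in the sample")
```

A fixed seed means anyone who reruns the script gets the same sample, which helps collaborators reproduce your analysis. Simple random sampling can miss rare categories, so check that the sample still represents the full dataset.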

There are no restrictions on file types that can be uploaded or downloaded on data.world. However, certain formats, such as PDF, are not queryable, so you’ll only be able to store and view them.

One important thing to remember about file size and format is that they’re not always good indicators of quality. Bigger isn't always best for what you need, and sometimes the juiciest information is packaged up in a different format than you'd expect.

Ready to impress our open data community with your data quality skills? Jump in here!