A landmark year for the data.world community

As we approach the new year, it’s a great time to look back and take stock of the road we have traveled over the last 12 months. This year in datasets, we continued to ride the political roller coaster with everything from gerrymandering to social media botnets, saw passionate debate around climate science, and watched data science keep growing explosively, especially around topics like artificial intelligence and machine learning.

Here at data.world, in addition to our own continued growth as a company, we passed several interesting milestones, including 150,000 open datasets in our community and over 50 integrations with other tools ranging from R and Python to Tableau and Microsoft Power BI. As we looked back, we also wanted to highlight some of our team's favorite notable or interesting open datasets. These are our top ten:

 

10. Video Games Global Sales in Volume 1983-2017

As we continue to watch the growth of platforms like Twitch and see the advent of more online games and digital sales, it is interesting to watch the decline in unit sales of physical games. One community member imported data from VGChartz to look at the trend across the EU, NA, and JP sales numbers.

Visualization of sales over time from julienf on data.world

9. FIFA World Cup 2018

In the excitement leading up to the World Cup, FiveThirtyEight created a prediction tool and shared the associated data, which we then added to data.world. Did your favorite teams perform better or worse than predicted?

Final Bracket Visualization by Scuttlemonkey on data.world

8. Sports Viz Sundays 2018

A growing community on data.world, the SportsVizSunday crew are working hard to help people improve their visualization skills through the appreciation of all kinds of different sports stats. This includes everything from Formula 1 to the NBA and boxing. Drop by one of these weeks and try your hand at building a new and unique sports viz!

UEFA Champions League by jbaucke on data.world

7. Chicago Crime Dataset

While there wasn’t a ton of information around provenance or methodology, this Chicago Crime Dataset proved to be a very interesting and robust dataset to play with. Weighing in at almost 350,000 rows with tons of detail, it could be a great resource for those who want to stretch their data science chops a bit. Take a look and let the author know what you think in the comments!

6. CityLab Congressional Density Index

This is a really interesting dataset that includes not only the data, but also some informative R scripts and visualizations. CityLab classified every US Congressional district based on the density of the neighborhoods it contains. They also included their methodology if you’d like to dig even deeper.

2010 CityLab CDI Map with GOP Pickups visualization by David H. Montgomery on data.world

5. Inc. 5000 2018

Inc. magazine continues to publish tons of really interesting content. In this case they have published the data behind their list of the 5,000 fastest-growing companies. The file includes name, state, revenue, and many other salient details. It should be a great resource if you are looking for corporate data.

Inc. 5000 Full List

4. Social Media Bot Detection by Paragon Science

There has been a lot of discussion about how large groups of automated accounts (bots) on social media may have influenced current events or propagated disinformation about them. This includes everything from the 2016 US Presidential election to sentiment around the NFL. Dr. Steve Kramer has applied techniques from complexity theory, network graph analysis, and other fields to take a really detailed look at this phenomenon. For more on his methodology you can read his piece on O’Reilly.

Paragon Science Twitter Bot Virality Results by Steve Kramer, PhD.

3. NFA 2018 National Footprint Accounts

The Global Footprint Network has published its findings for the 2018 National Footprint Accounts. This data “measure[s] the ecological resource use and resource capacity of nations from 1961 to 2014. The calculations in the National Footprint Accounts are primarily based on United Nations data sets, including those published by the Food and Agriculture Organization, United Nations Commodity Trade Statistics Database, and the UN Statistics Division, as well as the International Energy Agency. The 2018 edition of the NFA features some exciting updates from last year’s 2017 edition, including data for more countries and improved data sources and methodology.” For a more detailed explanation check out their explainer video.

NFA 2018 Edition by Global Footprint Network

2. Makeover Monday Big Mac Index

Makeover Monday continues to be a very active and popular community for both data.world users and the broader data visualization community. Each week Makeover Monday publishes a dataset and associated viz for people to rework or re-envision. Their most popular week this year was week 31, which looked at the Big Mac Index. Stop by sometime soon to try your hand at a Makeover Monday viz!

Makeover Monday Big Mac Index 2018 Contribution by evansnary on data.world

1. Artificial Intelligence – Global Community Mapping

This summer, data.world was able to host a researcher from TechNation for a secondment to look at the global AI community. Henri Egle Sorotos did a fantastic job looking at this community and sharing the associated data. Take a look at his summary blog or dive into the full report!

Global AI Meetup Clusters by henritechcity on data.world

With the sheer number of open datasets and users being added every day, our team can’t wait to see what 2019 will bring. A huge thank you to the data.world community for all of your data, your research, and your openness. See you in 2019!

Want to be on next year's list?

Start contributing to our open data community here!