data.world is excited to announce the release of a valuable data source in partnership with the U.S. Census Bureau — the American Community Survey (ACS). With these 16 new Census datasets, users can enrich their analyses with the most up-to-date U.S. demographic and housing information available.
The ACS is the Census Bureau’s largest annual household survey. It is also a “super-connector” dataset used throughout the U.S. by individuals, nonprofit organizations, and businesses.
The new ACS population estimates on data.world are the first-ever ACS data distribution that lets data users run SQL queries and join Census data to their own geocoded datasets. Previously, this work had to be done offline after downloading the ACS source files separately from the American FactFinder website, FTP, or the Census API.
In addition to SQL functionality, every estimate in the ACS Summary File datasets on data.world carries multiple geographic codes to help users find the estimates and margins of error they are looking for. Aside from common geographic area names, the data contain standard state postal abbreviations; Federal Information Processing Standard (FIPS) codes for states, counties, and places; Core-Based Statistical Area (CBSA) codes for Metropolitan and Micropolitan Statistical Areas; ZIP Code Tabulation Area (ZCTA) codes, which approximate ZIP codes; and full Census GEOID codes for every geographic entity in the data.
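A full GEOID packs several of these codes into one string. As a minimal sketch, assuming the ACS Summary File GEOID layout of a 3-digit summary level (050 for counties, 860 for ZCTAs), a 2-digit geographic component, the literal "US", and then the FIPS-based or ZCTA identifier, the hypothetical helper split_geoid below pulls a GEOID apart. Keeping these codes as strings also preserves leading zeros, such as Alabama's state FIPS code 01.

```python
def split_geoid(geoid: str) -> dict:
    """Split an ACS Summary File-style GEOID such as "05000US48453".

    Assumed layout: 3-digit summary level, 2-digit geographic component,
    the literal "US", then the geographic identifier (empty for the
    nation, e.g. "01000US").
    """
    prefix, sep, ident = geoid.partition("US")
    if not sep or len(prefix) != 5 or not prefix.isdigit():
        raise ValueError(f"unexpected GEOID format: {geoid!r}")
    if ident and not ident.isdigit():
        raise ValueError(f"unexpected identifier in GEOID: {geoid!r}")
    return {
        "summary_level": prefix[:3],  # e.g. "050" county, "860" ZCTA
        "component": prefix[3:],      # "00" = the whole geographic area
        "identifier": ident,          # e.g. state FIPS "48" + county "453"
    }


county = split_geoid("05000US48453")  # Travis County, TX
print(county["summary_level"], county["identifier"][:2], county["identifier"][2:])
# prints: 050 48 453
print(split_geoid("86000US78731")["identifier"])  # prints: 78731
```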
Sounds great! Where can I find it?
Head over to the U.S. Census Bureau account on data.world. The ACS data currently stored there span 16 datasets.
More information about these datasets can be found in the README.md files stored in each dataset.
So how do I query the ACS data on data.world?
The best way to use the ACS datasets on data.world is through the query tool. Here are some sample queries to get you started:
Metadata Query 1
The first thing you’ll likely want to do in a given ACS dataset is display all the specific topics contained in the summary file. You can run this query from within any ACS dataset, for example the ACS2015_5_E_ForeignBirth dataset.
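A rough, runnable sketch of a topic listing: the Python below uses SQLite (through the standard sqlite3 module) in place of data.world's SQL engine and assumes a hypothetical metadata table, acs_columns, with one row per estimate column. The table name, its fields, and the rows are placeholders, not the ACS dataset's actual layout. SELECT DISTINCT returns each topic once.

```python
import sqlite3

# Hypothetical metadata table: one row per estimate column, tagged with a topic.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acs_columns (column_id TEXT, topic TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO acs_columns VALUES (?, ?, ?)",
    [  # placeholder rows, not real ACS metadata
        ("C001", "Citizenship status", "Native citizens"),
        ("C002", "Citizenship status", "Naturalized citizens"),
        ("C003", "Citizenship status", "Noncitizens"),
        ("C004", "Place of birth", "Born in state of residence"),
    ],
)

# List each topic once, sorted alphabetically.
topics = [row[0] for row in conn.execute(
    "SELECT DISTINCT topic FROM acs_columns ORDER BY topic"
)]
print(topics)  # ['Citizenship status', 'Place of birth']
```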
Metadata Query 2
You can display all columns and column descriptions contained in an ACS Summary File by running this query from within any ACS dataset, for example the ACS2015_5_E_ForeignBirth dataset.
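One way to sketch this offline is to pair a table's columns, read from SQLite's pragma_table_info catalog function (SQLite 3.16 or later), with descriptions from a metadata table. SQLite stands in for data.world's engine here; the TX table, the acs_columns table, and the codes C001 and C002 are hypothetical, and data.world may expose column metadata differently.

```python
import sqlite3

# Hypothetical layout: a per-state data table whose estimate columns use
# codes, plus a metadata table mapping those codes to descriptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TX (GEOID TEXT, AreaName TEXT, C001 INTEGER, C002 INTEGER)")
conn.execute("CREATE TABLE acs_columns (column_id TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO acs_columns VALUES (?, ?)",
    [("C001", "Native citizens"), ("C002", "Naturalized citizens")],
)

# Pair every column of TX, in table order, with its description
# (NULL/None where the metadata table has no entry).
rows = conn.execute(
    """
    SELECT p.name, m.description
    FROM pragma_table_info('TX') AS p
    LEFT JOIN acs_columns AS m ON m.column_id = p.name
    ORDER BY p.cid
    """
).fetchall()
for name, description in rows:
    print(name, "|", description)
```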
Select Using Geography — ZCTA/ZIP Code
This query selects a specific line item using the ‘=’ operator in the WHERE clause. Note that the ZCTA/ZIP Code is stored as a string, so the digits must be enclosed in quotation marks. The query returns the number of Native Citizens, Naturalized Citizens, and Noncitizens in data.world’s ZIP Code, 78731. It runs from the ACS2015_5_E_ForeignBirth dataset.
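A minimal sketch of this lookup, using SQLite through Python's sqlite3 module as a stand-in for data.world's query tool. The TX table, its columns, and the codes C001 to C003 (standing in for native citizens, naturalized citizens, and noncitizens) are hypothetical, and the counts are placeholders, not ACS estimates. Treating ZIP codes as text also keeps the leading zero in ZIP codes that start with 0.

```python
import sqlite3

# Toy table: ZCTA is TEXT, so the value in the WHERE clause is quoted.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TX (ZCTA TEXT, AreaName TEXT, C001 INTEGER, C002 INTEGER, C003 INTEGER)"
)
conn.executemany(
    "INSERT INTO TX VALUES (?, ?, ?, ?, ?)",
    [  # placeholder counts
        ("78731", "ZCTA5 78731", 100, 20, 10),
        ("78701", "ZCTA5 78701", 50, 5, 5),
    ],
)

# Select a single line item by its ZCTA/ZIP code string.
row = conn.execute(
    "SELECT ZCTA, C001, C002, C003 FROM TX WHERE ZCTA = '78731'"
).fetchone()
print(row)  # ('78731', 100, 20, 10)
```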
Select Using Geography — Rename Column Headers
This is the same query as before, but it uses the ‘as’ keyword to rename columns to human-readable names. It runs from the ACS2015_5_E_ForeignBirth dataset.
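A small sketch of what AS does: the aliases become the column headers of the result set, which Python's DB-API exposes through cursor.description. SQLite stands in for data.world's engine, and the table name, column codes, and values are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TX (ZCTA TEXT, C001 INTEGER, C002 INTEGER, C003 INTEGER)")
conn.execute("INSERT INTO TX VALUES ('78731', 100, 20, 10)")  # placeholder counts

# Rename coded columns to readable headers with AS.
cur = conn.execute(
    """
    SELECT ZCTA AS zip_code,
           C001 AS native_citizens,
           C002 AS naturalized_citizens,
           C003 AS noncitizens
    FROM TX
    WHERE ZCTA = '78731'
    """
)
# The first field of each description entry is the (aliased) column name.
headers = [d[0] for d in cur.description]
print(headers)
# ['zip_code', 'native_citizens', 'naturalized_citizens', 'noncitizens']
```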
Select Using Summary Level — All TX Counties
Instead of selecting a specific line item, this query selects all geographies of a given SummaryLevel within a given state. It returns the number of Native Citizens, Naturalized Citizens, and Noncitizens for every county in TX and runs from the ACS2015_5_E_ForeignBirth dataset.
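A sketch of the summary-level filter, again with SQLite standing in for data.world. It relies on the Census summary level codes 040 (state) and 050 (county); the TX table layout, the assumption that SummaryLevel is stored as text, and the counts are hypothetical.

```python
import sqlite3

# Toy TX table mixing a state row and county rows; counts are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TX (SummaryLevel TEXT, AreaName TEXT, C001 INTEGER, C002 INTEGER, C003 INTEGER)"
)
conn.executemany(
    "INSERT INTO TX VALUES (?, ?, ?, ?, ?)",
    [
        ("040", "Texas", 900, 90, 60),
        ("050", "Travis County, Texas", 300, 30, 20),
        ("050", "Harris County, Texas", 500, 50, 30),
    ],
)

# Keep only county-level rows (summary level 050).
counties = conn.execute(
    "SELECT AreaName, C001, C002, C003 FROM TX WHERE SummaryLevel = '050'"
).fetchall()
print(len(counties))  # 2
```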
Select Using Summary Level — All TX Counties w/ Order By
This is the same query as before, but it uses ORDER BY to sort the results alphabetically by AreaName. It runs from the ACS2015_5_E_ForeignBirth dataset.
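Adding ORDER BY to a toy county table makes the sort visible. SQLite stands in for data.world's engine, and the table, columns, and counts are hypothetical placeholders; rows are inserted out of order on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TX (SummaryLevel TEXT, AreaName TEXT, C001 INTEGER)")
conn.executemany(
    "INSERT INTO TX VALUES (?, ?, ?)",
    [  # placeholder rows, deliberately unsorted
        ("050", "Travis County, Texas", 300),
        ("040", "Texas", 900),
        ("050", "Harris County, Texas", 500),
        ("050", "Bexar County, Texas", 400),
    ],
)

# County filter (summary level 050) plus ORDER BY to sort A-Z by AreaName.
names = [row[0] for row in conn.execute(
    "SELECT AreaName FROM TX WHERE SummaryLevel = '050' ORDER BY AreaName"
)]
print(names)
# ['Bexar County, Texas', 'Harris County, Texas', 'Travis County, Texas']
```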
Select Using Geography — Area Name
You can also use regular expression string matching to find data by AreaName. This query selects all line items with “Houston” in the AreaName from the TX data file. It runs from the ACS2015_5_E_ForeignBirth dataset.
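A sketch of regex matching on AreaName. SQLite parses the REGEXP operator but ships without an implementation, so the Python below registers one with sqlite3's create_function; data.world's own regex syntax may differ. The TX table and its rows are hypothetical placeholders.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite rewrites "X REGEXP Y" as a call to regexp(Y, X), so the function
# receives the pattern first and the column value second.
conn.create_function(
    "regexp", 2,
    lambda pattern, value: value is not None and re.search(pattern, value) is not None,
)
conn.execute("CREATE TABLE TX (AreaName TEXT, C001 INTEGER)")
conn.executemany(
    "INSERT INTO TX VALUES (?, ?)",
    [  # placeholder rows
        ("Houston city, Texas", 1),
        ("Houston County, Texas", 2),
        ("Austin city, Texas", 3),
    ],
)

matches = [row[0] for row in conn.execute(
    "SELECT AreaName FROM TX WHERE AreaName REGEXP 'Houston' ORDER BY AreaName"
)]
print(matches)  # ['Houston County, Texas', 'Houston city, Texas']
```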
Multi-table Select Using Geography — Area Name
data.world’s SQL implementation also allows multi-table selects. This query selects all line items with “Riverside” in the AreaName from the TX, AL, GA, CA, and CT data files. It runs from the ACS2015_5_E_ForeignBirth dataset.
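A sketch of a multi-table select written as a UNION ALL over per-state tables, one common way to stack rows from tables that share the same columns; whether the original query used UNION ALL is an assumption. SQLite stands in for data.world, only three of the five state tables are modeled, and all names and values are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One toy table per state file, all with the same hypothetical columns.
toy_rows = {
    "TX": [("Riverside city, Texas", 1), ("Austin city, Texas", 2)],
    "CA": [("Riverside city, California", 3), ("Riverside County, California", 4)],
    "CT": [("Hartford city, Connecticut", 5)],
}
for state, rows in toy_rows.items():
    conn.execute(f"CREATE TABLE {state} (AreaName TEXT, C001 INTEGER)")
    conn.executemany(f"INSERT INTO {state} VALUES (?, ?)", rows)

# UNION ALL stacks the matches from each table; the literal first column
# records which state table each row came from.
query = """
SELECT 'TX' AS source, AreaName, C001 FROM TX WHERE AreaName LIKE '%Riverside%'
UNION ALL
SELECT 'CA', AreaName, C001 FROM CA WHERE AreaName LIKE '%Riverside%'
UNION ALL
SELECT 'CT', AreaName, C001 FROM CT WHERE AreaName LIKE '%Riverside%'
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```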
Federated Query — Local Dataset + 1 ACS Summary File dataset
Federated queries let users join multiple datasets on values the tables have in common. This query selects ZCTA/ZIP Code and Median Age by joining a local dataset of user-provided Travis County ZCTA/ZIP codes to Median Age data in the remote ACS2015_5_E_AgeSex dataset. It runs from the databeats/Join To Census Data — Example dataset.
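To sketch the join offline, the Python below uses SQLite's ATTACH DATABASE to open a second, separate in-memory database as a stand-in for the remote ACS2015_5_E_AgeSex dataset. This mimics only the shape of a federated query, not data.world's mechanism for referencing other datasets. The table names (travis_zips, agesex.zcta), columns, and values are hypothetical placeholders.

```python
import sqlite3

# "Local" database: the user's own list of ZIP codes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE travis_zips (zip TEXT)")
conn.executemany("INSERT INTO travis_zips VALUES (?)", [("78731",), ("78701",)])

# A separate in-memory database plays the remote ACS dataset.
conn.execute("ATTACH DATABASE ':memory:' AS agesex")
conn.execute("CREATE TABLE agesex.zcta (ZCTA TEXT, median_age REAL)")
conn.executemany(
    "INSERT INTO agesex.zcta VALUES (?, ?)",
    [("78731", 40.0), ("78701", 30.0), ("75001", 35.0)],  # placeholder values
)

# Inner join on the shared ZIP/ZCTA string.
rows = conn.execute(
    """
    SELECT l.zip, a.median_age
    FROM travis_zips AS l
    JOIN agesex.zcta AS a ON a.ZCTA = l.zip
    ORDER BY l.zip
    """
).fetchall()
print(rows)  # [('78701', 30.0), ('78731', 40.0)]
```

Because this is an inner join, ZIP codes missing from either side drop out; a LEFT JOIN would keep every local ZIP code and return NULL where no ACS value matches.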
Federated Query — Local Dataset + 2 (or more) ACS Summary File datasets
You can add more ACS columns to a federated query by joining more ACS datasets. This query selects ZCTA/ZIP Code, Median Age, and Number of Males (aged 18–24) With Less Than 9th Grade Education by joining Travis County ZIP Codes to ACS2015_5_E_AgeSex and ACS2015_5_E_Education. It runs from the databeats/Join To Census Data — Example dataset.
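In an offline SQLite sketch, each additional ACS dataset becomes another attached database and another JOIN. The attached names agesex and education stand in for ACS2015_5_E_AgeSex and ACS2015_5_E_Education; the tables, the column males_18_24_lt_9th, and all values are hypothetical placeholders, and data.world's federation syntax will look different.

```python
import sqlite3

# Local list of ZIP codes to enrich.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE travis_zips (zip TEXT)")
conn.executemany("INSERT INTO travis_zips VALUES (?)", [("78731",), ("78701",)])

# Two attached in-memory databases play the two remote ACS datasets.
conn.execute("ATTACH DATABASE ':memory:' AS agesex")
conn.execute("ATTACH DATABASE ':memory:' AS education")
conn.execute("CREATE TABLE agesex.zcta (ZCTA TEXT, median_age REAL)")
conn.execute("CREATE TABLE education.zcta (ZCTA TEXT, males_18_24_lt_9th INTEGER)")
conn.executemany("INSERT INTO agesex.zcta VALUES (?, ?)", [("78731", 40.0), ("78701", 30.0)])
conn.executemany("INSERT INTO education.zcta VALUES (?, ?)", [("78731", 1), ("78701", 2)])

# One JOIN per ACS dataset, all keyed on the same ZIP/ZCTA string.
rows = conn.execute(
    """
    SELECT l.zip, a.median_age, e.males_18_24_lt_9th
    FROM travis_zips AS l
    JOIN agesex.zcta AS a ON a.ZCTA = l.zip
    JOIN education.zcta AS e ON e.ZCTA = l.zip
    ORDER BY l.zip
    """
).fetchall()
print(rows)  # [('78701', 30.0, 2), ('78731', 40.0, 1)]
```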
In total, the U.S. Census Bureau conducts more than 130 programs and surveys per year, so this is just the beginning. We’re always updating at data.world, so check back often.
If you have any trouble using the ACS summary files on data.world, please email us at email@example.com.
If there are any other ACS summary files that you need for your own research and are not currently stored on data.world, please reach out to us at firstname.lastname@example.org.