Localizing data, quantifying stories, and showing your work at The Associated Press (an interview with Troy Thibodeaux)

by | Mar 3, 2017 | Data community

If we’re lucky, data journalism, as a distinct and rare focus within journalism, is not long for this world. This is the effect of democratization.

When technical skills begin to take hold in any industry, a familiar pattern plays out. Because of high barriers to entry (extensive training, costly proprietary software and hardware, woefully complex user interfaces, etc.), technical literacy is initially concentrated among a select few. Data journalism skills are still rare enough that it's common, and helpful, to classify those who possess them as data journalists. But data literacy is spreading quickly within organizations, from beat to beat. J-schools are infusing curricula with R, Python, web scraping, and other low-cost, high-value tools of the trade. And vital, brilliant data journalism is taking place in local newsrooms around the world.

Reporters are interrogating the numbers found in press releases and picking apart the statistical errors in the statements of politicians. Numbers are shared with helpful context so that every reader and viewer can put them in perspective. New data points are compared to historical baselines. It's increasingly common to see articles linking to the datasets behind the reporting.

Data journalism is becoming a standard part of journalism.

Tomorrow at NICAR, we're talking about the joint pilot between The Associated Press and data.world to help newsrooms find local stories within large datasets, and to improve AP's ability to join, analyze, and distribute useful data to news organizations around the world.

I spoke to Troy Thibodeaux, data journalism team editor at The Associated Press, about the need for this collaboration and where it fits in the larger context of data and journalism. If you’re attending NICAR this week, come to our session and let’s discuss how you can get involved in the effort!

[IG] Let's start with how AP has used data historically. How is that evolving?

[TT] Traditionally, data has played a big role at AP primarily through our tabulation of election results. Our role counting the vote is a big part of our historical identity.

[IG] Right, that’s part of the AP identity that I’m familiar with from the outside.

[TT] Over the last decade or so, we’ve really expanded the ways we use data in our journalism.

We’ve used data analysis as a big part of our investigative and enterprise work. Our Washington investigative team and other investigative reporters around AP have worked on a number of high-profile big data projects, and we’ve built up our data journalism team to help us do data-driven work across all beats and regions.

And over the last few years, we've really worked to make the data itself useful content for our members and customers.

[IG] What’s a good example of one of those projects?

[TT] Recently, we began a collaboration with the USA TODAY Network looking at gun violence. The first story from that work was about children injured or killed in accidental shootings.

[IG] I saw that; it was really illuminating.

[TT] That story came from the data analysis. We saw a trend there, including spikes for two different age groups (toddlers and teens), and we decided to dig into the questions it raised.

The power of the piece came from the impact on the people, the families, but it was the data that led us there.

[IG] So the spikes alerted you to a buried story, in a sense. And does it work the other way around, too? You have a story you want to cover and you look for data to add depth to the piece?

[TT] Yes; in fact, the latter is more typical.

[IG] Or to quantify something that is otherwise qualitative?

[TT] That’s the key piece of advice I give to reporters and editors: when you’re working on a story, ask yourself, “how can I quantify this?”

[IG] And has that advice worked its way into your editorial standards and expectations, or is it just guidance for now?

[TT] We’ve had great response from folks across AP. We’ve collaborated on stories ranging from education to politics to health and science — and in almost every case the story has come from a reporter’s question, hunch or tip. It’s amazing to work with these journalists who have deep, deep knowledge of a subject area.

[IG] Totally — subject matter experts are vital to successful data projects, I think, because so much of the outcome depends on asking the right question, or being able to point to curious anomalies, or even sharing what things are not known in a particular domain.

So, where does localization of data play into this?

[TT] So, over the past few years, we've begun providing access to the data behind many of our stories before we publish them. We were looking at these rich data sets and realizing that the national story (or even 50 state stories) was only skimming the surface. We thought that if we could put this data into the hands of our members and customers, they could find stories in the data we'd never see, particularly when they look for the angle most interesting to their audience.

I often say that most of the work in a data project is caught up in that “unglamorous 80%” — finding the data, vetting it, cleaning it, coming to understand its limitations.

We were doing that for all of these stories anyway, so why not let our members benefit from that head start?

[IG] Understanding what angles appeal to a local audience is an interesting type of subject matter expertise, in a way. I like that.

[TT] Definitely — and in fact, sometimes they see the anomaly or error in the data that we miss, precisely because they know the local scene so well.

[IG] It’s the concept of “ground truth,” which has been co-opted as a business buzz-phrase, but IIRC has its genesis in the difference between what can be known from aerial mapping / surveillance, and what can be known only by people on the ground.

Speaking of different perspectives, let’s talk about our collaboration a bit — I’m curious how you would summarize what we’re doing together, the broad strokes?

[TT] I look at the collaboration between AP and data.world as a step in the next evolution of AP’s data strategy. We’ve had amazing feedback and results from our localization efforts, but we haven’t had the right platform for sharing our data. The pilot project with data.world will give us an ecosystem for data collaboration. We want to bring our members into the platform and use its features to share the data more effectively and to create conversations around the data sets.

[IG] Let's run through an example, if you don't mind. Say we have a broadly relevant dataset, like a new year of the American Community Survey. How does AP then make it locally relevant?

[TT] I think there are broadly two types of datasets we use. The ACS would be an example of evergreen data: it can provide material for any number of stories.

[IG] Absolutely, it’s deeply connective and has so many facets.

[TT] Then there are the project-specific datasets: data that we get via FOIA or some other channel that helps us answer a specific question. Often, though, in order to put this story-specific data in context, we need the evergreen data (like ACS).

[IG] There’s a complementary effect.

[TT] There's a lot of power in that kind of mashup. So, if we're looking at healthcare, for example, and a topic like insurer choice, we can look at the demographics of places that have more or fewer insurer options.

It’s also useful for localization in that we can compare areas with similar characteristics. So, if you’re looking at transportation among metro areas, you can compare areas of similar population or with similar economic conditions to see if they use the same modes of transportation or if their approaches differ.
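The mashup and peer comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AP's actual workflow: the area codes, populations, incomes, insurer counts, and the 10% population tolerance are all hypothetical values standing in for a project-specific dataset and ACS-style evergreen data.

```python
# Hypothetical evergreen demographics keyed by area code (stand-in for ACS fields).
demographics = {
    "A": {"population": 120_000, "median_income": 52_000},
    "B": {"population": 115_000, "median_income": 61_000},
    "C": {"population": 900_000, "median_income": 58_000},
    "D": {"population": 125_000, "median_income": 49_000},
}

# Hypothetical project-specific data: number of insurers offering plans per area.
insurer_counts = {"A": 1, "B": 4, "C": 6, "D": 2}


def join_by_area(project, evergreen):
    """Inner-join project data with evergreen data on the shared area code."""
    return {
        area: {"insurers": value, **evergreen[area]}
        for area, value in project.items()
        if area in evergreen
    }


def peers(joined, area, tolerance=0.10):
    """Return other areas whose population is within +/- tolerance of `area`'s."""
    base = joined[area]["population"]
    return sorted(
        other
        for other, rec in joined.items()
        if other != area and abs(rec["population"] - base) <= tolerance * base
    )


joined = join_by_area(insurer_counts, demographics)
for peer in peers(joined, "A"):
    # Compare insurer choice in area A against similarly sized areas.
    print(peer, joined[peer]["insurers"], "vs A:", joined["A"]["insurers"])
```

Here area A, with a single insurer, is set beside B and D, which have similar populations but more choices. A real comparison would likely match on several characteristics (economic conditions as well as population), not population alone.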

[IG] I wanted to ask you about fake news and "alternative facts." What is data's role as a source of truth these days, when we just don't seem to have a shared sense of fact vs. fiction as a society?

[TT] Transparency is important. It’s a standard we hold the government to, and it’s a standard we should hold the press to. The more journalists can show their work, whether it’s a copy of a crucial document or the data underlying an analysis, the more reason their audience has to accept their findings (or take issue with them in an informed way). When we share our data and methodology with our members, those journalists give us close scrutiny, which is good for everyone. And when we can release the data more broadly and invite our readers to check our work, we create a more secure grounding for the relationship with the reader. Sometimes it’s not possible to attain that level of transparency without putting someone at risk, but when it’s possible, we should shoot for it.

[IG] We’re really excited to be a part of that, Troy.

[TT] We’re excited as well! I’m already seeing really vital collaborations and discussions springing up around many of the data sets available on data.world. It seems like a great fit for the kind of conversations we’d like to have with other journalists and readers.

Want to hear how AP helps hundreds of journalists access stories that shape the world? Check out the case study here.


(Editor’s Note: This blog was updated on 9/18/2018 to include newer information about data.world and the Associated Press)