Season Two of Catalog and Cocktails started off in thrilling fashion, with hosts Juan Sequeda and Tim Gasper spending an hour with the first-ever U.S. Chief Data Scientist, DJ Patil.
The episode touched on a variety of topics including the U.S. pandemic response, using data to sow mistrust, and the ethical use of data. The following five questions are excerpted from the podcast. You can check out the entire recording here.
DJ Patil's favorite books found during the pandemic.
Juan Sequeda: Favorite podcast, book or show discovered during the pandemic?
DJ Patil: Well, I've got two here that I think are phenomenal reads and that I strongly recommend. One is “Power to the Public,” by Tara McGuinness and Hana Schank. I think they have the best overview of how data, technology, and design all come together to actually make government function, and what happens when they don't. And then a very specific one is Jer Thorp's book, “Living in Data,” which I can't recommend strongly enough. It really takes you inside how data can be used from a more human perspective.
What did we learn about data during the pandemic?
Juan: So, DJ, honest, no-BS question: did we learn anything about data when the pandemic hit?
DJ: Yeah, I think we've learned something we've known for a long time: we haven't been ready, and we haven't taken it as seriously as we need to. What do I mean specifically by that? Well, we've known since the first SARS outbreak, MERS, and other diseases that we need very strong reporting of data; we need tracking ability to find things, contact tracing, all these things [that] President Obama highlighted.
There was a whole playbook built after Ebola, and Congress hasn't taken it seriously enough to fund it, and the CDC and others haven't been able to implement the right plans. Luckily, our vaccination investments have paid off. There are decades-long investments by the National Institutes of Health that have paid off. But what we've also seen is the incredible underfunding of local governments, with local public health officers not having the tools at their disposal to understand things. Epidemiological modeling is far behind where we are in other types of forecasting; just think of weather forecasting or economic forecasting.
And then we have a really big issue around understanding this information, distrust, and how information propagates. We're not able to reach people fast enough to help them understand public health issues, even something as simple as wearing a mask, the importance of getting vaccinated, good hygiene, or taking the pandemic seriously.
How data is used to tell compelling stories.
Tim Gasper: The point you make at the end there, about data's role in the pandemic and how people are interpreting that information, is especially acute for me. It makes me think about how each side of the conversation is using data, but using it to tell its own story.
DJ: I think we've seen it not just with the pandemic but across so many different things: people being spun falsehoods on many different fronts, and arguments over very basic things like transmissibility. One of the things I think people miss is that there's a lot of gray in this, partly because it's a very fast-evolving situation.
When I first started working on pandemic questions about what to do with stay-at-home orders in California, we really only had data from two cruise ships. Some of the data coming out of Wuhan we weren't sure how much to trust, because we just didn't have a way of verifying it, and there was very, very limited information out of Italy.
We didn't have a good, rigorous table that we could just parse, graph, and understand. Quite the opposite. We were having to make it up as we went along. Would I have loved a phenomenal infrastructure with everything ready to go? I wish. But questions like where people are going, where people are congregating, and how we use data in a responsible way are things that we need to solve on the front end, not during a crisis.
What is a citizen data scientist?
Juan: What's your definition of a citizen scientist?
DJ: Yeah, so I think there are multiple definitions, but the easiest one to take is somebody for whom science isn't their day job. They have other jobs, but they have the skills or the aptitude to massively push the frontiers of science forward.
What's on DJ Patil's data ethics checklist?
Juan: Let's talk about data ethics. What's on the ethical checklist? How do I know that my data product is ethical?
DJ: So I think there's a lot to learn from other fields about what ethics looks like, and from an ethics perspective in data, we're still early in the journey. We're far behind where we need to be. But we have to start implementing things right now, and one concrete thing we can do is adhere to what we call the “5 C’s” as principles: consent, clarity, control, consistency, and consequences. These are things we can implement as our model for doing right with data and for making sure we start to minimize harm.
Note from the author: Using a framework like the “5 C’s” is easier when you make it part of your everyday data governance processes and set up your data catalog to document the “5 C’s” for each data project or asset.
The next one is that we are going to need to start thinking about what it means to build institutional structures into our work and our workflows, to ensure that we don’t make mistakes that cause tremendous harm. Some of those might be an ombudsperson, or something else inside a company you can turn to if you have a question; maybe you need to be able to raise something to an outside organization without fear of repercussion. Maybe it is just that checklist when you’re developing a product. It’s probably going to be some of all of them. One thing I would love to see is a commitment by every university, every MOOC or online training course, anything, to integrate ethics into their curriculum. Many times when you learn to code, at least the way I did, you’re doing database design or something like that, and no one checks whether your structures are open to SQL injection attacks. You get to a company, and the first thing you see is tests against that.
So what is our version of that, to ensure that we’re reducing bias or at least asking some very basic questions? What we’re proposing is, “Hey, at least a checklist forces you to pause.” Here’s a very concrete one: just before you launch a serious analysis or your product, get some pizza and your favorite drinks, sit down for an hour with your team, and ask what could go wrong. Have a field day on all the things that could go wrong. Write down the list and then stack rank it: by risk (low, medium, high) and by impact (low, medium, high). Do that, and now you’ve got your two-by-two.
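The “what could go wrong?” session is a team exercise rather than code. The bookkeeping at the end, where each item is rated on risk and impact and then ranked and bucketed, can be sketched in a few lines of Python. This is a minimal illustration, not a tool from the episode. The `Concern` class, the `stack_rank` and `grid` functions, the 1-to-3 level values, and the risk-times-impact score are all hypothetical choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical ordinal values for both axes: 1 = low, 3 = high.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class Concern:
    """One item from the team's "what could go wrong?" list (hypothetical structure)."""
    description: str
    risk: str    # how likely it is: "low", "medium" or "high"
    impact: str  # how bad it would be: "low", "medium" or "high"


def stack_rank(concerns):
    """Return concerns ordered from highest to lowest risk x impact score.

    Multiplying the two levels is an illustrative assumption, not part of the
    original advice. Python's sort is stable, so ties keep their input order.
    """
    for c in concerns:
        if c.risk not in LEVELS or c.impact not in LEVELS:
            raise ValueError(f"unknown level in {c!r}")
    return sorted(concerns, key=lambda c: LEVELS[c.risk] * LEVELS[c.impact], reverse=True)


def grid(concerns):
    """Bucket concern descriptions into cells keyed by (risk, impact)."""
    cells = defaultdict(list)
    for c in concerns:
        cells[(c.risk, c.impact)].append(c.description)
    return dict(cells)


if __name__ == "__main__":
    # Made-up example items for illustration only.
    brainstorm = [
        Concern("Model underperforms for a minority group", "medium", "high"),
        Concern("Dashboard shows stale numbers", "high", "low"),
        Concern("Personal data re-identified from aggregates", "low", "high"),
    ]
    for c in stack_rank(brainstorm):
        print(f"{c.risk:>6} risk / {c.impact:>6} impact: {c.description}")
    print(grid(brainstorm))
```

Items in the high-risk, high-impact cell are the ones to address before launch. Sorting by a single combined score is only one way to read the grid; a team might instead sort by impact first, so that rare but severe harms are not buried.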
Key takeaways
- Be an ethical citizen scientist by using the “5 C’s”: consent, clarity, consistency, control, and consequences
- Before launching a new data project, ask your team, “what could go wrong?”
- Work with amazing people who “kick your ass, make you happy, and make you better.”
Visit the Catalog and Cocktails page to listen to the full episode with DJ, catch any prior episodes you might have missed, and see upcoming guests and topics.