About this episode

We roll into season two with a straightforward question: “Did we learn anything new about data work and data people during the pandemic?” 

Who better to address that subject than DJ Patil, mathematician, entrepreneur, and the very first U.S. Chief Data Scientist? Join Tim, Juan, and DJ for a wide-ranging conversation on data cultures, architectures, roles, and enduring lessons from the past 16 months of largely remote work.

Special Guests:

DJ Patil

Former U.S. Chief Data Scientist

This episode features
  • Hot new job titles and emerging opportunities
  • How to establish a new data culture built around hybrid work environments
  • Favorite podcast, book, or show discovered during the pandemic

Key takeaways
  • Being ethical in data; use the “5 C’s”
  • Before launching a new data project, ask your team, “what could go wrong?”
  • Work with amazing people who “kick your ass, make you happy, and make you better.”

Episode Transcript

Tim Gasper:
Your honest, no BS, non-salesy conversation about enterprise data management with tasty beverages in hand, it’s Catalog and Cocktails. I am Tim Gasper, a longtime data nerd and product guy at data.world, joined by Juan.

Juan Sequeda:
Hey, Tim. I’m Juan Sequeda. I’m the principal scientist here at data.world, and as always, it’s a pleasure to be back and to take that break in the middle of the week, end of the day, and just enjoy a drink and talk about data, and I am so excited about just our new season, and even more excited because our first guest for this season is the one and only DJ Patil. DJ, how are you doing?

DJ Patil:
I’m well, thanks. Thanks for having me.

Juan Sequeda:
Well, DJ is… I mean, he’s a mathematician, entrepreneur and the US’s first chief data scientist, and he has done so much work throughout the pandemic on data, and I mean, we all talk about data science and that’s all thanks to DJ pushing all this stuff for the last 10 years. DJ, we’re so excited to be here. So hey, let’s kick this off. What are we drinking and what are we toasting here for? DJ, you go first.

DJ Patil:
So sadly, I’m a little boring, but extraordinarily hydrated today. My trusty thermos of some water because I still got a long, long day ahead of me, so there won’t be anything for a little while. I guess what am I toasting to today? It’s something we’re going to talk about: the citizen scientist, the rise of the citizen scientist is really who has done just incredible work here during the pandemic and has just shown the power of data.

Juan Sequeda:
How about you, Tim? What are you drinking today and what are we toasting for?

Tim Gasper:
Well, first of all, very, very excited with this conversation that we have today. DJ, so glad you could be here with us. And for me, today, I am kicking things off with actually a Scotch Manhattan, also known as a Rob Roy, and you can see it here for those that are watching along, and actually the scotch that I’m using is this one here that I picked up a couple of weeks ago. It’s pronounced Bruichladdich, which I did not think that was how it was pronounced until I looked it up on Google. So it always can surprise you if you don’t know those things. Very delicious. It has a smoky note, even though it’s unpeated. And what am I toasting to? I’m going to toast to vaccines because if you don’t have one yet, please go get one. That’s my recommendation.

Juan Sequeda:
Cheers to that. And I’m having… I always like to figure out what’s in my fridge or my bar, and today I made this weirdest thing, which actually is really cool. It is, I’m going to call it, an IKEA Blue Gin and Tonic, and it’s called an IKEA because a while ago, I went to IKEA and you can actually get a blueberry drink concentrate, which I’m like, “What am I going to go do with it?”, and then I had-

Tim Gasper:
Some of that blur blar.

Juan Sequeda:
Exactly, and I have some Dripping Springs Texas gin and I made that and it’s actually really good, and I’m going to go cheer to citizen scientist and to vaccine. So cheers, everybody.

DJ Patil:
Cheers, DJ. You guys expense this stuff?

Tim Gasper:
We should, huh? We should.

Juan Sequeda:
We should. Okay, we’ll take that note. Right. So quick warm up here. So favorite podcast, book or show discovered during the pandemic? Who wants to go first?

DJ Patil:
Well, I guess I got two here that I think are phenomenal reads, or strongly recommend. One is Power to the Public. It’s by Tara McGuinness and Hana Schank, and they I think have the best overview of how data, technology, and design all come together to actually make government function and what happens when it doesn’t. If you liked Michael Lewis’ The Fifth Risk, this is the how-to guide to actually make those things work. And then a very specific one is Jer Thorp’s book, Living in Data, which I can’t recommend strongly enough. It really takes you inside how data can be used for just a more human perspective.

Tim Gasper:
Awesome.

Juan Sequeda:
Well, I got those now on my list. How about you, Tim?

Tim Gasper:
I haven’t read that one, DJ. I’ll have to pick that one up, both of them. So I’m actually going to do a doubleheader as well. I recommend this book here. It’s got an improper word on it, but as an exclamation mark. If you’re a history buff, very interesting. S-H-I-T Went Down by James Fell, and then my second thing I want to mention is actually, if any of you haven’t checked it out yet, Bo Burnham’s Netflix comedy, special air quotes there, because it’s not really a comedy I would say, Inside. It’s funny, it’s musical and it’s full of all the modern day crises and darknesses, and it’s a weird ironic masterpiece of the times.

Juan Sequeda:
Wow. Well, I guess you’re all doing two, I have to do two. Well, one is for a book. I’m actually flip around, I’m going to show off here, I finally finished my book. Literally just sent it now. No more revisions. So the book I discovered was my own book that we finally got it down, so Designing and Building Enterprise Knowledge Graphs. So that’s coming out any minute now. And on the show side, I actually discovered, I’ve been binge watching The Bob Newhart Show. It’s just great to see old comedy. All right, so enough kind of the chitchat here. Let’s get this party started. So DJ, honest no BS question: Did we learn anything when the pandemic hit with respect to data, anything new about data work and data people during the pandemic? What have we learned?

DJ Patil:
Yeah, I think we’ve learned something that we’ve known for a long time, which is that we haven’t been ready and we haven’t taken it as seriously as we need to. And what do I mean specifically by that? Well, we’ve known from the first SARS outbreak, MERS and other diseases, we’ve known that we need to have very, very strong reporting of data, we need to have tracking ability to find things, contact tracing, all these things President Obama highlighted. There was a whole playbook that was built after Ebola, and Congress hasn’t taken it seriously with funding it, CDC, others haven’t been able to implement the right plans. Luckily, our vaccination investments have paid off. There’s a decade of investments by the National Institutes of Health. That’s paid off.

DJ Patil:
But what we’ve also seen is the incredible underfunding of local governments, local public health officers not having the tools at their disposal to understand things. Epidemiological modeling is so far behind relative to where we are in other types of forecasting, or just think weather forecasting or other types of economic forecasting. We’re just not there in terms of the caliber of those things. And then what we’ve seen is we have a real big issue here on understanding disinformation, distrust, and how information is propagated, and we’re not able to get to people fast enough either to understand public health issues such as something very simple about wearing a mask, or the importance of getting vaccinated, or even good hygiene, and taking the pandemic seriously.

Tim Gasper:
Especially that point that you make at the end there about data and its role in the pandemic, and how people are interpreting that information is especially acute for me, and it makes me think about how each sort of side of the conversation is obviously using data, but using it to tell their own story. And how much of that has been a big headline of sort of the pandemic and really put that in focus?

DJ Patil:
I think we’ve seen it not just with the pandemic, but the combination of so many different things is additionally that… The election and people’s just inability to or the people’s ability to be spun falsehoods on so many different fronts, and just arguing very basic things about transmissibility or other type of aspects. And one of the things I think people miss in these things is that there’s a lot of gray in this, partly because it’s a very fast evolving situation. When I first started working on the pandemic issues, and we were particularly in the questions about what do we do with regards to stay at home orders in California, other things, we really only had data off two cruise ships, some of the data coming out of Wuhan, which we weren’t sure how much to trust, we just didn’t have a way of verifying it, and very, very limited information out of Italy.

DJ Patil:
So what did we do? We were up at all sorts of hours talking to our friends in those environments when things were really going badly in New York, just talking to people in the ER all the time. We didn’t have a good rigorous table that we could just parse and run a graph and create an understanding of. Quite the opposite. We were having to make it up as we went along. Would I have loved there to be a phenomenal infrastructure and everything ready to go? I wish, but some of these things even there about where are people going, where are people congregating, how do we use data in a responsible way? These are things that we need to solve on the front end, not during a crisis.

Juan Sequeda:
So you started early on cheering for the citizen scientist here. Well, let’s start with the definition of a citizen scientist because I’m seeing this word thrown out a lot. What does it actually mean, or what’s your definition for citizen scientist?

DJ Patil:
Yeah, so I think there’s multiple definitions, but I think the easiest one that we can take is somebody for whom doing science isn’t their day job. They have other jobs, but they have the skills or the aptitude that can massively push the frontiers of science forward. So what does that look like? Well, you have groups like Covid Act Now that start to get established, you have the COVID Tracking Project. We have a phenomenal group of people that get together and start saying, “Hey, how can we get an accurate assessment of what’s going on?” You have people like myself, who jump as volunteers into state and local governments. You have the U.S. Digital Response, that is an organization of all volunteers, 4000, 5000 volunteers that all show up just to help build infrastructure to support issues around COVID. You have these groups that just do remarkable things, and we’ve seen continued other examples where people are looking through genetic snippets of code samples of COVID to try to understand what’s going on.

DJ Patil:
Who are these people? Well, maybe some of them have a background in maths, some of them have a background in statistics, may have taken online courses. Who knows? Maybe some bunch of them have PhDs. I mean, myself, during my doctorate work, I spent a lot of time on understanding epidemiology. When I was first in public service, I was out in Central Asia trying to look at bio weapons and prevention of the use of bio weapons and proliferation of bio weapons. So I’d already been exposed, in the sense, to a lot of these ideas, and so jumping in to take our skills that we have to help against this problem was natural, and there’s so many people out there that that’s the case.

DJ Patil:
And by the way, this is a pattern that we’ve seen repeating increasingly over time. We saw this during the hurricanes in Haiti, we’ve seen it during earthquakes, and we almost have this new form of first responder, which is a digital first responder. They can take satellite imagery and start understanding where things are washed out, where things change, and they can deploy it. And just as a concrete example, think about a hurricane, a devastating hurricane like Katrina, and I was in public service when that happened, and trying to understand very basic things: where are people. Think about what we could do right now with the technology we have. We could fly drones around and look for people on houses, we can have high-res thermal imaging off of those drones, we can figure out how do you route people to different places or where you’re going to have the Zodiacs kind of zip between houses.

DJ Patil:
There’s no difference in the scheduling algorithm that UPS and the mail use versus that. It’s a kind of a traveling salesman route. These packages that come together radically shift the way we actually approach a disaster. We’ve seen the next incarnation of that now, which is people are using data very fast, very efficiently to tell public health officials and other politicals how to think about and get their head around such a black swan event that we’re seeing right now, that’s COVID.

Juan Sequeda:
So you said, I think, the honest no BS definition for citizen scientist. This isn’t their day job, but they have the skills needed to push the data analysis forward. So I think that’s a critical thing is that they have the skills. Now, one of the… Kind of going to a previous topic is one of the concerns I personally have and I think people around will say is, “Well, people are going off and doing analysis and putting charts out and they’re tweeting things out and they get a bunch of likes and stuff like that, but that’s not their day job, but do they actually have the skills to go do that?” How do we balance this of getting the skills and being a… I don’t know, what do we call it? A good citizen data scientist?

DJ Patil:
How do you get a high quality analysis, right? That’s a very fair thing, and one of the things that we’ve learned over time, even for people who we leverage their skill sets during a disaster like an earthquake, or hurricane or tornado, some people want to just deploy code and then you kind of… Who’s got code review and all these other pieces? And so the same way, you need this to be fundamentally a team sport, and analysis by a single person isn’t going to get the job done, and you need this to be a collaborative team effort. When I’m doing something personally, I want three to four people looking over my shoulder to make sure I’m not making a mistake, and so this is that classic adage of the more eyes on a problem, the more shallow the bugs are. It’s the same kind of thing here, and so this is why it’s so important to fast publish, fast produce. We saw different forums, whether it was Discord, Slack channels, you name it, WhatsApp groups of people coming together from around the world to compare results.

DJ Patil:
Something as simple as what is the R nought? The R effective. The number of people that might be impacted if one person in a room is infectious. Debates around that number are like, “Hey, is it one? Is it 1.5? Is it 0.7? Is it 3?” And those debates help make everyone smarter, and when we can do that, that’s why it’s so critical. And this is honestly one of the reasons why products that you all create at data.world are so important is because one of the things that I find very often is when you’re working in one of these very intense situations, everyone just says, “Hey, I got some data,” and then they try to put it someplace. And then it’s a great way for where data goes to die. Nobody has it. So literally one of the first things that I put into these situations is a data catalog, the data dictionary, because otherwise, you’ve got a lot of great stuff, but no one can follow after you, and so many people are just rebuilding and rebuilding, it prevents the collaborative nature of things to take place.

Tim Gasper:
Yeah. How do you expose and document things and collaborate around things in a more open way, so that way you can work together and you can extend upon the work that’s been done, right? Whether that’s in the public realm, or in the private or enterprise realm, right?

DJ Patil:
Yeah, people mistake this. During a disaster, no one can stay on a disaster for such an extended period of time, especially if it’s not your day job. I mean, the people who do this, they burn out. It’s really, really hard work. It’s just like frontline staff who are in the hospitals and putting it in day in, day out. They need a break too. They need emotional breaks. You need to sleep. There’s some very basic things you got to… Self-care that you have to take care of. And one of the things that we see with these teams is the need to rotate, and you can’t carry on good citizen science or science in general if you don’t have a good rotational model, and that means putting in a little bit of infrastructure, which we all know how to do in code, but we somehow neglect when we’re doing it with analysis.

Juan Sequeda:
It’s the analogy like if we have first responders, whatever, you can’t have a doctor working 24 hours all the time, right? They have to go switch around, so we should do the same with the scientists.

DJ Patil:
Let me make it very concrete, super concrete. So let’s suppose we’re going to go try to figure out how many COVID cases or infections or hospitalizations there are per area, and we’re not getting good answers. So we’re going to phone call, we’re going to call, and now we’ve done it for one county, we figured it out, and now we got to do it for another county. At some point, we got to get more people in there to distribute this thing because otherwise, we can’t get that data collected. Maybe we need a way of calling twice to fact check, to give ourselves a checksum kind of thing.

DJ Patil:
All those kinds of things being needed means that your job, fundamentally, in all these kinds of situations is to fire yourself out of a job. Give it to others to be able to carry the water, give it back to the pros for whom this is their job and who are so underwater, let’s give them something that they can use that’s super easy, and that’s what you saw with the COVID Tracking Project and others where it’s like everyone’s suddenly using their graphs, because you know why? They’re really fricking good. I can name dozens of projects that have been out there that happened during COVID, and many are still taking place in very, very clever novel ways to fundamentally help people, especially internationally.

Tim Gasper:
I really like the way that you’re kind of setting this whole thing up because it connects to what you said also about things like peer review and getting other people to kind of look at the work that you’re doing and create checks on each other, and obviously using systems, and the scientific process and things like that. And if this is going to be a team sport, then we can’t become too infatuated with sort of just the… And this is going to almost sound mean to you, because maybe you are actually maybe one of the epitomes of this, don’t become too infatuated with sort of the data celebrity or something like that, and actually trust the process and the systems and sort of the balancing of each other out.

DJ Patil:
In fact, I think the best functional teams is it’s an everybody effort, everybody’s all in. I think a bunch of the talk that’s happened during COVID and the work that we’ve done in California, I got to tell you, there are amazing people that are there and I get far too much of the credit. I can just go down the list, Kit Rodolfa did so much of epidemiological modeling, along with Justin [Lasseter 00:20:06]’s team, Josh Wills, Sam Shah. The names go on. And then there’s the people who this is their day job, Susan Finelli, Marco over at California Department of Public Health, Mark [Galley 00:20:23], Mike Wilkening, Charity Dean. You can go on and on and on. And Amy Tong, who’s the CIO for California, Yolanda, who’s in charge of government operation. I can just go on and on and on. None of this would have worked without all of them. Every single one of them is critical, and those ones who are actual official public servants, they’re the ones who own this.

DJ Patil:
And by the way, they’re not hanging on their own. More people have been coming to them. Rick Klau, who was formerly at Google, is now the chief technology innovation officer for the State of California, and he’s in there making sure that the whole idea of these vaccine certifications works. And so this is a massive community, and that’s not to say how many people… Like Jason Vargo, or all these other people who fundamentally are also in academia, who have just shown up to do things. That’s the beauty of this. This is why this works. The reason the data science community is powerful is because of this quality. It is a quality that people are willing to use their skills and align to actually make something interesting happen, and we can show that interesting very quickly by either a graph, an analysis, a factoid, or building a product that somebody is going to use and interact with to help make smarter decisions faster.

Tim Gasper:
So this is very exciting this rise of the citizen data scientist and all the collaboration that’s happening in the community that’s made a lot of these insights possible around the pandemic. And then obviously as we mentioned here, ethics is important and how do we make sure that we’re using these skills for good. To open up that sort of topic a little bit with you, DJ, where is the world going with respect to ethics when it comes to data? Obviously, AI has become a very hot topic recently. Where do you see that going and how that ties a little bit to the pandemic as well?

DJ Patil:
Yeah. Well, I think there’s a few things that are going on. One, the world is catching up and then realizing that you fundamentally have to make sure the technology works for you, not against you, and there are far too many people where technology is not working for them. In fact, it’s quite hurting them. It’s amazing during the pandemic how many people just said, “Hey, just put these exposure notification apps on people’s phones, and it’ll take care of everything,” but people hadn’t thought through the different aspects of well, does everybody have a phone that can actually work with these notification protocols? Where does the data go? What happens when somebody lives in communal housing or high-density family environments when one person is infected? Do we have a pathway for them out? And what happens in 3D when you’re in an apartment complex, and you’ve got notifications going that way?

DJ Patil:
A lot of these things are untested, and oftentimes we want to look for the silver bullet with technology, and the more important thing which I tell people is you got to go live it, you got to live in people’s shoes to understand what’s going to happen. If you’re thinking about migrant workers in California, who in many ways are essential workers for ensuring our supply chain of food inside the United States, what does it mean for them to have access to this technology? Does it work for them? As we think about the more easily talked about subjects, where we think about autonomous vehicles, like cars, or we’re seeing what happened with the 737 Max, all these types of things, we’re realizing, “Wait a second. Who’s got the algorithm? What’s the control?”

DJ Patil:
What I think was essential is a few things, and Hilary Mason, Mike Loukides and I wrote a free book that’s available for anybody who wants it, you can download it on Amazon, you can get it from GitHub. We wrote it specifically as practitioners, what do we think about? We think about it around a few principles that we have found to implement over time. What does it look like to have consent, clarity, control, consequences when something goes wrong? Fundamentally, why don’t we have a checklist when we release a data product like, “Who owns this algorithm? Who’s responsible for making sure it works?” We do checklists all the time, pilots do it, surgeons do it. Atul Gawande wrote a really famous book about it. We don’t do it as data scientists. What would that look like? What would be on that checklist? What if we had a whistleblower program inside companies?

DJ Patil:
What would it look like if we had a way where every person asked their employer when they’re interviewing, “How do they handle ethical dilemmas on data?” What if the flip happened? When we talk about cultural fit, what if we asked an interview question about ethics? What would this look like if every educational program had a core component integrated into it about ethics? We’d be so much better. The training that we have to have. And if you’re a data scientist out there and you haven’t been trained in ethics, you need to get trained because the consequences of what you do are so material to society, you are going to impact not only yourself, but your friends, your family and others, and you need to take that responsibility not just seriously, but it’s because you’re being trusted with the data, which is suddenly there is a version of the golden rule, which is please treat this data like it was your own or your own family’s data, and treat it with that level of respect.

Juan Sequeda:
Oh, there’s so many golden nuggets right now in everything you were saying. Let’s continue on the ethics one, on the checklist. I love this whole idea. Pilots have checklists. I put my car into service, and there’s a checklist of things we’re going to go through, right? We use checklists for everything. Where’s the checklist for data? I mean, I think that’s a-

Tim Gasper:
Even in the software world, right? Before we push code to production, it goes through the checklist.

Juan Sequeda:
Exactly. So that’s always a question about how do we treat data as a product, we have a checklist. And I want to go into the list of the things about the tech on the checklist, but let’s talk about the ethics. What’s on the ethical checklist? How do I know that my data product is ethical… I don’t know if that’s even a question I should be asking. How do I know that, and what’s the minimal things that need to be on that checklist? This is an important takeaway people will have.

DJ Patil:
So I think there’s a lot to learn from other fields about what does ethics look like, and I think from an ethics perspective and data, we’re still early on the journey. We’re far behind where we need to be. And if you kind of think about where many of these processes in biomedical research have come from, they all stem from the Nuremberg Code, which came from the Nuremberg Trials, which is really, why did it happen? Because we saw the atrocities by the Nazis and what do safeguards look like? And so a lot of those things got put into place, and then what is consent, all these things. What does ours need to look like for data? Well, there’s parallel examples. There’s a lot of people who talk about fair, accountable, responsible algorithms, all these different things.

DJ Patil:
I don’t want to take away anything from that, but we have to start implementing things right now, and so some of the concrete things that we can do is adhere to what we call the five Cs as principles for that: consent, clarity, control, consistency, consequences, these things that we can implement as sort of our model for what’s doing right with the data and how do we ensure that we start to minimize harm? How did we come up with them? Effectively, we did an exercise of applying the Nuremberg Code to data and saying, “Hey, what would that look like?”

DJ Patil:
The next one is we are going to need to start thinking about what does it mean to put institutional structures into our work and our workflows to ensure that we don’t make mistakes that cause tremendous harm? And some of those are those things of what would maybe an ombudsperson, or something else inside a company if you have a question or something, maybe you have to raise something to an outside organization without fear of repercussion. Maybe it is just that checklist when you’re developing a product. It’s probably going to be somewhat all of them. One of the ones I think I would love to see is a commitment by every university, every MOOC or online training course, anything, to have ethics integrated as part of their curriculum. It’s amazing. When you code, many times when you learn to code, at least the way I did is you’re doing database design or anything like that, no one checks if your structures are open to a SQL injection attack. You get to a company, the first thing they’re doing is there are tests against that.
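For readers who have not seen the kind of test DJ is alluding to, here is a minimal sketch in Python using the standard-library sqlite3 module. The table, the function names, and the malicious input are all hypothetical, chosen only to illustrate the idea; this is not taken from any company's actual test suite.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Hypothetical vulnerable lookup: the input is pasted straight into the SQL text.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Same lookup with a bound parameter, so the input is treated as data, not SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


# Illustrative in-memory database with two made-up users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# A classic injection string: it closes the quote and adds an always-true condition.
malicious = "nobody' OR '1'='1"

unsafe_rows = find_user_unsafe(conn, malicious)  # leaks every row
safe_rows = find_user_safe(conn, malicious)      # matches nothing
```

A test that feeds input like this to every query path and checks that nothing leaks is one plausible form of the check described above.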

DJ Patil:
So what is our version to ensure that we’re reducing bias or asking some very basic questions? What we’re proposing here is saying, “Hey, at least with a checklist, it forces you to pause.” Here’s a very concrete one: So just before you launch a serious analysis or you launch your product, go get some pizza and your favorite set of drinks, and sit down for an hour with your team and ask what could go wrong? Have a field day. Have a field day on all the things that could go wrong. Just write down the list and then stack rank them. Stack rank them by risk, low, medium, high. Impact, low, medium, high. Do that, and now you got your two by two.
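The exercise DJ describes can be captured in a few lines of Python. This is only a sketch under stated assumptions: the example concerns are invented, and multiplying the risk level by the impact level to get a single ranking score is our own simplification, not something specified in the conversation.

```python
from dataclasses import dataclass

# Assumed numeric weights for the low/medium/high labels.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass
class Concern:
    description: str
    risk: str    # how likely it is to go wrong: "low", "medium", or "high"
    impact: str  # how bad it is if it does: "low", "medium", or "high"

    @property
    def score(self):
        # Assumed scoring rule: risk level times impact level.
        return LEVELS[self.risk] * LEVELS[self.impact]


def stack_rank(concerns):
    # Highest score first; sorted() is stable, so ties keep brainstorm order.
    return sorted(concerns, key=lambda c: c.score, reverse=True)


# Hypothetical output of a one-hour "what could go wrong?" session.
brainstorm = [
    Concern("Typo in a chart label", "low", "low"),
    Concern("App fails for people without a compatible phone", "high", "high"),
    Concern("Weekly refresh serves stale data", "high", "low"),
    Concern("Dashboard is misread as showing causation", "medium", "high"),
]

ranked = stack_rank(brainstorm)
for concern in ranked:
    print(f"{concern.score}: {concern.description} "
          f"(risk={concern.risk}, impact={concern.impact})")
```

The items at the top of the list are the ones worth resolving before launch; the rest can be tracked and revisited.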

Juan Sequeda:
All right. We got a concrete takeaway. So the five Cs: consent, clarity, consistency, control and consequences, and I love this example, just you’re done. You think you’re done, you’re celebrating. Go talk about what can go wrong here, and then I think in addition to that is to whom? You got to start thinking about the personas, you have to get yourself in the shoes of the other people, and even people you weren’t even thinking about who could come up to that. So it’s like brainstorming not just what could go wrong, it’s who are the personas, what could go wrong with them? I think this is super critical.

Tim Gasper:
Yeah, not just thinking internally, right? Because I think so often as data people or as software people, we think like, “Oh, what could be wrong with my code, or the way I built it, or the way I architected it?” But actually thinking about impacts and interpretation, and the politics and the interpersonal relationships, and so on and so forth.

DJ Patil:
Can I give you a concrete example from my life?

Tim Gasper:
Sure. Absolutely.

DJ Patil:
So I remember when we were running the Precision Medicine Initiative, this is the idea to create tailored genomic treatments available for everybody, and one of the very clear instructions that President Obama gave us was, “You need to make sure you have real people at the table who are going to be impacted by these treatments.” And very early on, we went back to the president and said, “Okay, here’s who we’ve been talking to. Here are the conversations that we had with real people.” He said, “Well, if it’s represented by this group, this group.” He said, “That’s not real people. I expected real people to be at the table. So go back and do better.” Yes, Mr. President. So Francis Collins, who’s head of the National Institutes of Health and helped decode the human genome, and I had a meeting with a bunch of community members in Pittsburgh after one of our events, and we got everyone together, and we’re in this meeting and we’re going around, and everyone’s telling us their thoughts, different for Latino community, black community, all the different religious groups, everybody.

DJ Patil:
There’s a woman just sitting in the back just kind of listening with their arms crossed, and finally, we kind of go around and I said, “Ma’am, we haven’t heard from you,” and she’s like, “Do you really want to hear from me?” I said, “Ma’am, the president’s made it very clear. We’re here to listen.” And so she just goes off and is like, “Have you thought about this? This? This? This? This issue? What about for this community? What about this community?” And she just lit into us. And I remember coming out of the meeting and going, “Wow, we just got hammered there.” And I remember thinking like, “Can we hire her?” She just gave us a recipe. She knows this, we don’t. We don’t understand this. We don’t understand the perspective. We need her on the team to make us better so we’re not going to make any mistakes, and we’re going to do right by everyone’s data. [crosstalk 00:33:43] We need to find those…

Juan Sequeda:
How do we teach this? So let’s talk on the curriculums. This is something I’m very passionate about myself, personally, because we’ve been having these data science curriculums for the last 10 years, and what do we learn in data science? You can get a degree, masters in data science, and in 10 months, in a year, and you’ve learned a bunch of big data stuff and machine learning and statistics and now you’re a data scientist, and we’ve missed so much stuff. So what are the new things we need to go add concretely? What’s the message that we want? I’m going to send this snippet to every single professor I know in computer science. What is it?

DJ Patil:
Back of our book, we actually have a bunch of case studies that were produced by Princeton and Ed Felten’s team who actually have real ethical example case studies just like you kind of would a Harvard Business Review kind of thing, and we let people kind of go engage in them, and we say, “Discuss this with your team. Go do that.” That’s one part. The other part I think what we have to do is as we start thinking about this curriculum, we have to ask ourselves, “How does every question have some ethical aspect?” Just like we need every question to have some security aspect because think about what’s going on in security these days with holes being blown open in code, and all these other issues. We’re not where we need to be thinking about that equally as much as the development process. We need to be thinking about how this can be abused, what does this look like? And those things are going to be critical.

DJ Patil:
There’s example after example I can give in other areas where we’ve seen flaws, and then what happens is we get together and we learn from it. The aviation industry is the top at this, but there are also panels after a hospital has an issue, to learn and improve off that. That’s why the checklist gets smarter. But we’ve seen other examples where this has failed catastrophically also. One of them is in Vicodin and other things where somebody didn’t think that, “Hey. Well, what happens if the tablet is crushable? Could then somebody use it in a different form?” And so they had to innovate a different pill design to prevent that from happening. This is why design is so critical alongside data, and oftentimes, we’re not thinking about the overall design aspects of these problems, and we need to bring them in, the designers, into this process to make those better decisions.

Juan Sequeda:
Oh, wow. All right. I want to go transition to another topic, which is what’s next on the data jobs? We have data science that’s been around for a decade, we’ve now realized we need to kind of upgrade that with the ethics. But what’s next? What are the next jobs that don’t exist today? I always think about this. What was going on in your environment in 2007/8, when you were at LinkedIn, to come up with data science? We’re now in 2021. What’s going on now that in five years it’s like, “That’s the next sexiest job.” I know I’m kind of asking you to predict the future, but you’re kind of making it here at the same time.

DJ Patil:
Well, I think one of the things that’s most exciting is the amount of jobs that are going to come up that haven’t been invented yet. The fact that there’s so many different types of data roles that are beneficial, whether it’s data steward, data engineer, data visualization architect, there’s all sorts of things. I think there’s a couple things that we’ve seen, and just over this time, as new industries come up, new functions come up, important roles, people have a label for what they do and how they can be successful in accomplishing their mission. One of those is the creation of why there is a White House Chief Data Scientist, and the importance of that role. And now there are chief data officers and chief data scientists as part of the law and every part of the federal government. Same way we’ve had people who are now social media managers. That wasn’t a thing before social media. Web design wasn’t a thing 20 years ago, 30 years ago. Now it’s a thing. Now then you’ve got these specializations, like mobile web design, or you got virtual reality.

DJ Patil:
These components come together, and then what we’re going to see is further specialization of data science, you’re going to be more bio. Just like there’s biophysics, biochemistry, there’s other genres in there. There’s one thing I think that’s going to be important for us to take away is that we have titles and labels that both help people do their job, but minimize putting people in a box because all too often, if you have a name or a label and it’s in a box, you don’t get to have context to do your job. You don’t get to have context how to do stuff. And that’s the fundamental reason why data scientist has taken off because no one knows what the hell it fully really means, and so because of that, everyone’s like, “Oh, you’re smart. Great! You can be here. Let’s give you context. Let’s have you solve our problem,” and then you solve the problem because you have context. It’s a self-fulfilling prophecy.

Tim Gasper:
In that sense, the vague title can sometimes actually be to its own benefit.

DJ Patil:
That’s right. I mean, the last thing I think when we were coming up with the titles to figure this out like, “What are we going to call ourselves?” Last thing we were trying to do is name something. Literally the reason we came up with it was Jeff and I were trying to get HR off our backs from… This is Jeff Hammerbacher. He was at Facebook and I was at LinkedIn trying to get HR off our backs because we had too many different job titles. The beauty of it, I think, is what Monica Rogati really figured out. She said, “Well, we’re LinkedIn. Let’s just go post all the jobs with all the different titles and let’s see who applies, and then we’ll just data science our way into this.”

Juan Sequeda:
I am curious, what was the title before? What was the title originally?

DJ Patil:
Like what was on the list? Research scientist, analyst, data analyst. Pete Skomoroch had this really cool idea about data artists because we create with a palette of data, and I was like… That was like [crosstalk 00:40:09]. Maybe a lot of people will apply to that. So we had all these different things, but the one that stood out overall is data scientist, and trust me, when we first saw this, I think the results kind of coming back in were like, “Isn’t that redundant?” And we’re like, “Well, that’s what people were responding to,” and so we were just naming something that was already there that was in people’s desire, and so we were just giving a name for it and a role, and the reason I think it became popular is not only because of the ambiguity, but because people then suddenly saw, “Wow, you can actually build something with data.”

DJ Patil:
When I first moved out to Silicon Valley, I had this idea that you could do something with data. People were like, “Yeah, probably. Not really that interested.” And I was pretty well connected out here because I grew up here. Sergey Brin’s dad was on my thesis committee, Sergey’s mom was on research grants with me. I knew a lot of these researchers from my academic time and government work, but people couldn’t grok it until we actually started to build things with data in these companies, and people said, “Oh, that’s the superpower that you’re talking about. I get it.”

Juan Sequeda:
I do have to acknowledge that one of the things that worries me, it does literally worry me, is that whole title. If we get into titles and the labels as a scientist, and as a scientist too, I’m like, “Are you people really doing science? Are we actually following the scientific method or not? Are we just going to… I learned a bunch of techniques and a bunch of APIs and I’m going to go do this stuff, and now I’m doing science.” And no, you’re not. You’ve learned a lot of things and you’re applying things, but I mean, you have to be that very curious… I mean, you’re doing science, you’re taking something that’s unknown and making it known. So that’s something I’ve always been kind of… I have a pause on if you’re a data scientist, and even the whole citizen data scientist is… Ideally, I get it where it comes from an ideal point of view, it kind of scares me that it’s opening up some sort of Pandora’s box, everybody starts using this label and you’re like, “No.” I mean, the word scientist actually has a very fixed meaning.

Tim Gasper:
If everyone is a scientist, does it water down the meaning of it?

DJ Patil:
Yeah. Well, I think we’ve seen this time and time again, which is there’s lots of fields where this happened, whether it was architect or surgeon, or even doctor during Sherlock Holmes’s time period. Doctor was used as a term because people were afraid. They needed to co-opt it from religious groups and the doctor of philosophy because they were like, “Well, we kill a lot of people and that’s not good. So let’s try to co-opt that term.” I think it’s a fair question of what does rigor look like, what does credentialing look like, what are those different aspects? And I think that’s going to take place over time.

DJ Patil:
The more important thing, I think, which we’re going to see is what does it mean to become a scientist? It’s through training. It’s through apprenticeship. The same is true with code. The same is true with analysis. You get good because you’ve worked on a team with good people and you’ve added value, and you train up, and that is fundamentally part of the scientific method equation that we don’t really talk about is that people get good at science because they have a pathway in to those organizations and they apprentice. What’s our model for apprentices? And for far too long on this we’ve isolated people out and not let all types of people in. We need to change that.

Juan Sequeda:
Yeah, my PhD advisor told me early on, “Remember, science is a social process. It’s not like somebody magically came up with something and we’re going to go believe them, right? No. If I come up with something, I need to go convince my peers,” and that’s why we have a peer review process, and they disagree, and I have to go back, and we have to go back and forth until suddenly everybody around me are like, “Oh. Wow. We are now in agreement, and before we did not know anything about it.” Guess what? That’s science. We took something that was unknown and made it known, and this is just training. And again, it’s being around the people who are pushing you to be better and I think that’s why it’s quality, quality, quality, and it’s a social process. You cannot be alone.

DJ Patil:
Think about code review. How many of us had our ass handed to us in code review? And you’re like, “Oh. That’s what good code looks like. Wow. The elegance of it.” But that’s also in open-source projects why we have a committer model. We have the infrastructure in there. We have a number of fundamental problems here in data that we have not figured out how to address. We know cleaning is really hard. We know collaboration is really hard. The data dictionary and documentation is a challenge, and what we need to figure out is how do we start to move these pieces of infrastructure in place so that we can build on them. And I think one of the flaws we try to do is we try to directly replicate what we see in code and that just isn’t quite amenable for data. We honestly have to invent this.

Juan Sequeda:
So kind of to start transitioning here a little bit to the end, I want to kind of make… We’ve had all these discussions about the citizen data scientist, the pandemic, and everything. Let’s connect this a little bit kind of to the enterprise, takeaways, and a question I have is, is the enterprise in this constant pandemic disaster state, or how… Because you can always think like, “Well, I need to go…” There’s always something that’s a problem, right? So are we always living in a pandemic, in a disaster state? And everything we’ve talked about, can we actually transition and apply it within enterprises? What are your thoughts?

DJ Patil:
Yeah, so there’s a few things I think. Inevitably in an enterprise, one, as data scientists generically speaking, we’re very poor at storytelling or advocating for what we need and what we can do, and our value proposition. It’s taken so long for, I think, data scientists to really be effective in the front office. I think one of the other things that we’ve seen that’s there is, and it’s oftentimes when I’m teaching data scientists, it’s like, “What’s the three things that we do when we show data? What do I want you to take away, what action do I want you to take, and how do I want you to feel,” and as scientists, a lot of times we remove that, how do we want you to feel, but in the enterprise, we’re trying to create action, we’re trying to actually move the needle and get people aligned or other types of things in place.

DJ Patil:
The other one that’s in there is it’s a version of the rent’s too damn high, which is, I talk to executives and I say, “Let me guess, your data scientists start their day… You’re very frustrated with why don’t you see more productivity to them.” And I said, “Let me guess how the day starts… It starts with, ‘Hey, there’s something wrong with the dashboard,’ and then that’s like a five-alarm fire. They’re running around all the way till 1:00 PM, and they found that somebody suddenly accidentally changed some code and they’re like, ‘Oh, sorry. You want me to change it back? Cool, I can do that.’ And then now, they’re supposed to start their day at 1:00 PM after a fire drill where their adrenaline’s been running around and everyone’s going crazy. And now they’re supposed to be productive? They’re just trying to buy time to the next fire drill, which is in 10 hours.”

DJ Patil:
So you see these cycles, and we haven’t figured out how to articulate what are the different functional pieces without getting caught into the whole ROI conversation of, “Oh, that’s going to be a team of 50 people, or that’s a team… We don’t have those people. We can’t afford that.” And it’s like either you’re doing data or you’re not, and that dichotomy is what the good organizations are jumping over, and they go, “Oh, to play ball here or to appropriately win at this, we have to be all in at making this work.”

Juan Sequeda:
So the data scientists end up being what you called before, the digital first responders.

DJ Patil:
Yeah, they’re firefighters inside of the organization and they’re exhausted.

Tim Gasper:
Yeah, and in software organizations, if I even think of data.world, we have people who are sort of infrastructural and more ops focused people that are helping to be those first responders in many cases, and they bring in obviously the experts as they need, but there’s an interesting question here, if somebody is always in a first responder mode, are they really then able to invest the time and attention that they need to on the longer running projects? You don’t want them to have zero of that, because obviously that’s real world expertise and experience, at the same time, if it’s 50, 60, 70% of what they’re doing, it’s a very different mindset.

DJ Patil:
They oftentimes have 15 different masters, and all of them are pissed, and yet none of them are willing to actually make the investments as required to do this. I mean, this is fundamentally why the role of the chief data scientist was created is somebody in that role to shepherd the White House and provide advice to the President and the presidency to figure out what and how to invest, and where are you going to make sure that data is utilized correctly, how are you going to ensure that you’re minimizing harm, and what’s the investments fundamentally needed to return on the continued economic value that we’re creating and the data is being created from. We need to keep investing on that.

Juan Sequeda:
Yeah, that’s why I think we need to start treating data as a product and organizing by domains, otherwise there’s like, “Who owns this? Where’s the checklist?”, and so forth.

Tim Gasper:
Yeah. It’s a really good thing that the CDO has become such an important role now because now it’s leveled up to that strategic level.

Juan Sequeda:
So this has been a phenomenal conversation. Want to kind of start heading towards our takeaways, but first, we got a quick lightning round. We got four questions for you, DJ. So yes or no answers. All right, let’s do this.

DJ Patil:
Let’s do it.

Juan Sequeda:
All right. So has the pandemic made the lack of data literacy even more painful?

DJ Patil:
It’s made it more complex. It’s made it more complex. In some places, it’s amazing how effective somebody is willing to take data, and other places where people just cannot digest it.

Tim Gasper:
Interesting. And I’m actually not supposed to say the word interesting, but I said it anyway. I say it too much. Will citizen data scientists outnumber professional data scientists?

DJ Patil:
I’m going to go with yes, if we do this properly, because that means that we have brought the full force of all data scientists to the world’s problems. It’s like the old days of SETI, where you took all the spare cycles, and you’re like, “Hey, let’s do something interesting,” or protein folding problems. What would that look like if we had every data scientist spending a little bit of their cycles to find a cure for cancer? What would that look like? That’d be pretty damn awesome.

Juan Sequeda:
All right. Will data and AI ethics become a requirement of data-related curriculums?

DJ Patil:
Sure as hell hope so. That’s a yes.

Juan Sequeda:
You hope so.

Tim Gasper:
Good answer. I like that.

DJ Patil:
If we don’t, it will be regulated massively and it will be implemented through other mechanisms.

Tim Gasper:
Okay, final lightning round question here: If you had a chance to be the US Chief Data Scientist again, would you do it?

DJ Patil:
I think it’s time for somebody else to do it, honestly, very generally. These roles are done best when we bring new people in. I’ve had an amazing opportunity to serve. I’m grateful for it, and I hope we’ll have many, many more chief data scientists. I know that the team has been thinking, the Biden team’s been working very hard on it and I’m very excited about there being many, many more chief data scientists. I’ll be very happy when I’m the 100th chief data scientist. There’ll be the next 100. That’ll be awesome.

Tim Gasper:
A legacy of chief data scientists of the US.

DJ Patil:
That’s right.

Juan Sequeda:
All right. So we got our final segment here, our TTT, Tim Takes It Away with Takeaways. Go first, Tim.

Tim Gasper:
All right. I mean, so many nuggets here. I’ll just hit a couple. I love our conversation about ethics. I love that you brought up your five Cs: consent, clarity, consistency, control and consequences. I think that’s a really interesting framework to think about things, and obviously, it relates to this question that you asked, which is when you launch a new analysis or data product, you should ask what could go wrong and think about those things based on those five Cs, and then stack rank it. What’s the highest risk, what’s the highest impact? And I love the comment that you made about you get good because you work with and learn from good people. The value of apprenticeship, the value of the social learning process, and the fact that really, this is a team sport. So love that. Juan, what about you? What were your takeaways?

Juan Sequeda:
Well, first of all, the whole definition of a citizen data scientist. This isn’t your day job, but you have the skills needed to push data analysis forward, and that’s a key one. We always talk about data as a team sport and it’s a collaborative team effort. We’ve talked a lot about that, but what I really liked about what you said is, “Let’s go rotate people. You need other people in there.” I think that’s important right there, and always remember that science is a social process. If it’s a team sport, science is a social process. The checklists. I mean, this is something I think… One takeaway, please have checklists. We have checklists for so many things. We should have a checklist for our data. Let’s go start defining what that checklist is, and ethics need to be part of that.

Juan Sequeda:
For the curriculums, what are the kind of low-hanging fruit for data curriculums? Let’s go study case studies. Let’s go read them, and they’re out there. I think that’s a phenomenal start, an easy way to go do that. The same way we question things about security, we should always question things about ethics. The next jobs, very specifically, specializations in data, so there’s going to be a bio-data scientist and so forth. And there’s no need to kind of box yourself, right? Having that vague title can be very beneficial. And then something you said very quickly, which I liked was what’s hard with data? Cleaning, collaboration and documentation. So how about that for paying attention to everything, all the amazing nuggets you had today?

DJ Patil:
Pretty good. Don’t I get a takeaway? I feel like I should get a takeaway here.

Juan Sequeda:
Yes, you do.

DJ Patil:
All right. My takeaway is one of the most important ones that I have, and one I hope you all who are listening or watching consider: we oftentimes think of where we can apply our skills. In so many things, you guys, the general talent, have the ability to do so many remarkable things with data. Just go down the street and see who needs help, food pantries, homeless shelters, could be working on COVID, could be working on something else. You have an amazing set of skills to do so many things and I just hope you’ll think more broadly about the way you can use your skills and data to have an impact on the world.

Juan Sequeda:
So with that, I want to throw it back to you with some advice. So two questions, what’s your advice? Very broad, open question about anything, about data, about life, whatever. And second, who should we invite next?

DJ Patil:
Yeah. Well, the first is: I think the thing I would recommend is, the advice I have is just work with amazing people, and if you’re not working with people who are amazing, move on, honestly. Move on. And what do I mean by amazing? People who are going to kick your ass every day, people who are going to laugh with you, make you cry, make you so much better. And in a year, you’re just like, “How could I have lived without these people?” Find those people. It’ll be hard. It’ll be intense, but find those people. They will make you so much better. And then who next to have? I’m going to go with the authors of my book. So I guess I’m just going to go with my track record of breaking rules and saying more things here. I’m going to recommend Tara and Hana, and Jer because I think they’re phenomenal people working on [inaudible 00:57:07], and they don’t get enough credit.

Juan Sequeda:
Awesome. DJ, this was a pleasure. Thank you so much. We learned so much from you and always it’s a pleasure to just listen to you, and I feel very privileged we had this time to go chat today.

Tim Gasper:
Agreed. Thank you so much, DJ.

DJ Patil:
Thanks for having me.

Juan Sequeda:
All right. Cheers, everybody.

Tim Gasper:
Cheers.
