We all strive to be data-driven. And yet we all instinctively know that we’re not very good at it. In fact, if you believe any number of recent surveys, it seems we may actually be getting worse at data. One reason for this is a misalignment on what success looks like. How do you define and measure it? What is the actual value of your data?
Tim and Juan were joined by Lars Albertsson, founder of Scling, on an episode of Catalog & Cocktails to talk about data productivity and balancing a technology vs product-driven approach to data. Below are a few questions excerpted and lightly edited from the podcast.
Juan Sequeda: Honest, no-BS, why aren't we able to extract value out of our data?
Well, I think we are easily distracted by technology, and the technology very rarely matters. In some cases it does, but in most cases, it doesn't. So there's a huge gap in the capability to extract value from data, and if you look at the data-leading companies of the world, where I've had the fortune to work at a couple of them, they are essentially decades ahead in getting value out of the data.
I worked for Google, I worked for Spotify and that's sort of where I saw how much value you can get. And then I've spent a number of years, various considerations helping non-leaders get value from their data. And it's never about the technology, that's never blocking them, right? It's always the ways of working, the ways you organize, the collaboration patterns, the rituals. Many of the companies have tons of rituals that they cling to and they can get rid of.
Tim Gasper: Is industrialization of the data process a good thing, is that where we want to go?
So the equivalent of, "I'm hungry right now. So I need to flip a burger on the stove," is, "I need this data now to make a decision, so I'll pull up my spreadsheet." That's sort of the most primitive tool that we have nowadays. Whereas the industrialized version is: people in the organization tend to need this kind of data on a regular basis, so we will glue together an A/B testing framework from existing components so that, without doing the data warehouse query or the spreadsheet query each time, we have the right decision, or the information for the decision, presented in front of us as soon as we have thrown the A/B test out to users.
So, I would say the analogy is between being hungry, wanting the data now, and working on the process to improve it, not for me right now but for the next person, and the next person, and the next person who wants the data. So it's to some degree automation, but it's also automation beyond just rescheduling the same query. It's automation where you continue to iterate and improve on the process. So whenever something goes wrong, you add a bit more process to make sure that your data quality is measured or whatever. And here we come into sort of the DataOps practices, which is essentially the equivalent of lean but in a data factory setting.
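One way to picture "adding a bit more process" is a pipeline step that measures data quality on every run instead of relying on someone to eyeball the output. The sketch below is illustrative only: the function names, the `user_id` field, and the 5% threshold are assumptions made for the example, not details from the conversation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityReport:
    total_rows: int
    missing_rows: int

    @property
    def missing_rate(self) -> float:
        # Treat an empty batch as fully missing so it never passes silently.
        return self.missing_rows / self.total_rows if self.total_rows else 1.0


def measure_quality(rows, required_field):
    """Count rows where the required field is absent or None."""
    missing = sum(1 for row in rows if row.get(required_field) is None)
    return QualityReport(total_rows=len(rows), missing_rows=missing)


def run_step(rows, required_field="user_id", max_missing_rate=0.05):
    """Run one pipeline step and fail loudly if quality is below the bar."""
    report = measure_quality(rows, required_field)
    if report.missing_rate > max_missing_rate:
        raise ValueError(
            f"{report.missing_rows}/{report.total_rows} rows lack {required_field!r}"
        )
    # Hypothetical transformation: keep only the complete rows.
    cleaned = [row for row in rows if row.get(required_field) is not None]
    return cleaned, report
```

Because the check runs on every execution, the next person who needs the data benefits from it without having to know that something once went wrong.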
So we're talking a little bit about the process and the evolution and the maturity. Let's talk a little bit more about the hamburgers themselves, the content that's being cooked here. People say things like, "Hey, we want to be more data-driven." But as you just mentioned, a lot of times we're trying to do these use cases, we're trying to drive value with the data in more specific ways, but we say things like be more data-driven. What does that really mean?
What does it mean to be data-driven and what are we actually trying to cook here that is valuable?
Well, data is used in three major ways to sort of enhance your business.
One is being data-informed, which is, you manage to get the data that you need for your human decisions, right? Business insights, product insights, and so forth, so that you make better decisions at a high or low level in the company.
And the second one is sort of data-fed products, where the data is part of the product that you provide. And this can be top lists, if you're a media company, or it can be reports assembled and sent off to partners because you have signed a contract that you are supposed to provide analytics to your partners, and so forth. Here the logic is straightforward, but data is part of the outcome.
And then you have machine learning where your logic is not complete, but you need data to sort of refine the logic because you assess that it will be a better result than if we humans create all of the logic ourselves.
Tim: What trends are you seeing or recommendations do you have around what's making these data teams be more mature in creating value?
So principles like immutability, democratization, and homogeneous environments are some fundamental factors of success. The late adopters of this sort of big data technology were never forced into these successful patterns of working, and all these successful patterns of working can be summarized as a data factory, which is sort of the foundation of industrial data processing. So I think that's why we saw so many failed big data projects in the 2015 era, right? All of these companies adopted the technology, they tried to push the technology into their old ways of working, and then you just had the worst of both worlds, whether your vendors helped with implementing transactions and SQL support or not. The real value lies not in the technology or the new shiny things, but in the ways that you work, and the way that you enable sharing of data throughout the organization and low-friction innovation on top of your data.
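To make the immutability principle a little more concrete, here is a minimal sketch of a dataset that is written once per date and never overwritten, so rerunning a downstream job on the same partition sees the same input. The directory layout (`<dataset>/date=<date>/`), the JSON Lines format, and the names `publish_partition`, `read_partition`, and `demo` are all assumptions made for illustration; the conversation does not describe a specific implementation.

```python
import json
import tempfile
from pathlib import Path


def publish_partition(root, dataset, date, records):
    """Write one dated partition exactly once; refuse to overwrite it."""
    target = Path(root) / dataset / f"date={date}"
    if target.exists():
        # Assumed convention: published data is never edited in place.
        raise FileExistsError(f"{target} is already published")
    target.mkdir(parents=True)
    out = target / "part-0.jsonl"
    with out.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, sort_keys=True) + "\n")
    return out


def read_partition(root, dataset, date):
    """Read back every record from a published partition."""
    path = Path(root) / dataset / f"date={date}" / "part-0.jsonl"
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle]


def demo():
    """Publish once, attempt an overwrite, and show the data is unchanged."""
    with tempfile.TemporaryDirectory() as root:
        publish_partition(root, "plays", "2024-01-01", [{"track": "a", "plays": 3}])
        try:
            publish_partition(root, "plays", "2024-01-01", [{"track": "a", "plays": 99}])
            rejected = False
        except FileExistsError:
            rejected = True
        return read_partition(root, "plays", "2024-01-01"), rejected
```

In this sketch the second publish attempt is rejected, so anyone in the organization who reads the partition gets the originally published records.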
Juan: Should data value be defined by data leadership?
No. I have an excellent example here. Back in 2013, when I joined Spotify, one of the first things we did was to make an effort to democratize data. And we set out the goal to democratize it for any team with a developer essentially. And that was a transformation to what is today known as DataOps, but the word didn't exist at that time. And we managed to push down the friction of creating new pipelines so that a beginner could do it in less than a day.
And that brought down the friction significantly and the number of jobs just skyrocketed afterwards. 18 months later, a team of engineers took a hack week and then another week or two, and they built “Discover Weekly,” which is now one of the most popular features of Spotify, arguably the most successful machine learning feature ever built in Europe. And I became really proud when I heard their presentation, because they said that we could do this not because the company had decided at the board level or at management level that yes, we should make an effort to spend half a year and 20 engineers, but because the company had enabled bottom-up innovation, enabled us with all of the data and the ways to build pipelines and to serve playlists and so forth. And Daniel Ek, the CEO, said, "I didn't see the beauty of it, if it was up to me, I would've killed the project, but Spotify doesn't work that way. So I just didn't give them any more resources, but then they launched anyway and in a year they had 40 million active users for the product." So he was clearly wrong in his definition of data value, and if it was up to the leadership, we never would've seen that product.
- Empowered innovation is key to data value
- Democratization happened by accident with a very flat structure
- Immutability ended up being an important best practice to ensure repeatability
Visit Catalog & Cocktails to listen to the full episode with Lars. And check out other episodes you might have missed.