They’re not what you might think...
When Catalog and Cocktails hosts Tim Gasper and Juan Sequeda invited Ergest Xheblati, Lead Data Architect at EverQuote, to be their guest on the penultimate episode of the podcast’s third season, the conversation took an unexpected turn.
The show began with a discussion of how data engineers and architects often face similar frustrations when building data models. Specifically, Ergest called out that data modeling consumes a significant amount of a data engineer’s time and attention, and the sheer volume of data models created by data-focused organizations eats up valuable space in a data warehouse.
In the ‘modern data stack,’ Ergest observed, the phrase “data model” really means an ad hoc, one-off reporting dataset built to serve a single, very specific purpose. Generally, every time an engineer needs to answer a business question, they build a new model from scratch. Build enough of these single-use data models, and soon your warehouse is crowded with duplicate logic and ‘swamped’ with countless tables, making it impossible for data pros and business users to find what they need. So, they build a new model… and the cycle repeats itself ad nauseam.
The result: a loop of creating custom models on top of custom models, leading to terrible data product performance, wasted cloud compute resources, longer model run times, and massive inefficiency.
“Your data warehouse is like a beautiful garden,” said Ergest. “But all these ad hoc models are like people planting all over the place, and eventually you get weeds everywhere. Then, when anyone new comes in, they can’t find what they’re looking for. This leads to more planting because you can’t find what you need, and the garden gets even messier.”
Ergest’s solution? Instead of these one-off data models, built on the assumption that every specific data set needs a new model, engineers should focus on modeling the business in its entirety. The future of data modeling is a data model that encompasses the full business: attributes, entities, relationships, and what they mean to the business.
“It’s Not Really about the Data”
The C&C crew also spoke about how data engineers should begin building their models, and the most important skills an engineer needs to be truly good at their job.
It wasn’t coding. It wasn’t data warehousing. It wasn’t data analytics.
The two most important skills Juan, Tim, and Ergest identified were empathy and curiosity.
“I get a kick out of helping business stakeholders improve their productivity,” said Ergest. “And how do you know what’s important to business users? Empathy. You put yourself in their shoes.”
“It’s not really about data; data work is based on technical skill sets that you learn. They’re a technical necessity, yes. But at the end of the day, the tools and techniques can all change. SQL has been around for 50 years, but tomorrow it could be superseded by something new and better.”
To empathize with business stakeholders, data engineers need to sit down with them and ask questions. You need to learn how the business works, to become business literate. Ask the unit VP, “What does your workflow look like?” Ask the data analysts, “What questions are you hoping to answer? What use case are you trying to solve for?”
To build the best data models, to answer your stakeholders’ questions, you need to talk to the people in your business. You need to be curious, to ask questions, to learn what they care about and why. The most important skills for data engineers are interpersonal. When you’ve mastered those skills, then — and only then — can you truly add value.
Businesses, agreed the crew, are built by people, run by people, and founded to help people. And that’s why the best data models are built by engineers who can work successfully with data. And who can empathize with, ask questions of, and work with people.