Sure, it’s fun to chat with data executives and thought leaders, but this season, we are focusing on the people who roll up their sleeves and get the data work done. And there’s no better place for a podcast like ours to start than with a data practitioner at Drizly.
Juan Sequeda and Tim Gasper were joined on Catalog & Cocktails by special guest Emily Hawkins, analytics and data platform lead at Drizly, who shared how the data team and the data stack were established and how the e-commerce platform is evolving following the company’s acquisition by Uber. Below are a few questions excerpted and lightly edited from the show.
Juan Sequeda: Honest, no BS, what the heck is analytics engineering?
Emily Hawkins:
That's a great question. It's a lot of things. I started as a BI analyst and morphed into this analytics engineer role over the past two-and-a-half years at Drizly. What made it different for me was really following software engineering standards: learning Git, learning the command line, learning data modeling, learning dbt. All of those things came together, and I think that's what makes this role so interesting, powerful, and empowering for the people who are in it and moving into it from all different backgrounds. It's great because you don't need a specific background to become an analytics engineer. Anyone who wants to do it can do it.
Tim Gasper: How was that transformation into this role for you? Would you call yourself an analytics engineer today?
Emily:
I started at Drizly a little over two-and-a-half years ago, so in 2019. Previously, I had always kind of been in data and analytics in some capacity, really since my first internship in college, but it was more on the business analyst side, and then I started at Drizly as a BI analyst. I was the second BI analyst on the team. We were small, so we had to do a lot, and really, we knew that we wanted to fix how analytics worked.
We weren't happy with our current tools, and our director came in and really helped us carve out the time, the roadmap, and the buy-in from the rest of the company to spend that time actually improving our stack: bringing on dbt, bringing on Snowflake, bringing on Fivetran, bringing on Looker, all of these tools you hear so much about. We were learning them as we were implementing them, and figuring out, okay, what's the best way to use these tools together? What are the best practices we should be following? Of course, we learned a lot from the dbt community and all of the great content they put out. But yeah, just doing it, that's really how I got to that point, I guess.
I think for the first maybe year-and-a-half at Drizly, I was more in that analytics engineering role because I was working directly with business stakeholders, modeling the data, and getting it into our BI tools, but now I focus more on our data platform as a whole. I still do some work with data directly, but not as my full-time role now. So, I just manage our data platform, our data ecosystem, and work with our BI analysts and our data scientists on how to best work with our tools and how to make their experience with our stack better.
Juan: Since moving into this role, how is your modern data stack evolving?
Emily:
So, what we started with was dbt. Our first step was to move all of this bespoke SQL logic in Domo into dbt, so it's version controlled. We know where data is moving. We know how it's connected. So, we did that. We got Fivetran so we could take all of those connections from Domo into Fivetran, get Salesforce in there, Zendesk, MySQL, all of the things that we use at Drizly. We use a lot of those third-party apps, so we have a lot of connectors that we need data from. Of course, Snowflake. I think it was about a year ago, maybe, that we started using Dagster for data orchestration, and I think it was also about a year ago, maybe a little more, that we started transitioning all of our data visualization into Looker.
Juan and Tim: That truly correlates with what everybody has been talking about. We're calling this the “modern data stack core”... So, how has that evolution been going over time and are there tools/technologies you’re particularly excited about?
Emily:
Yeah. It's honestly been not too long since I started at Drizly, so probably within my first six months at Drizly we started this process, and I don't think we're ever going to be done, done. There are always new things to do and check out and improve upon. But yeah, I guess I'd say two years, two-and-a-half years. Our most recent addition to our stack was Census for reverse ETL. I mean, that's been around a year now maybe?
There's no timeline in my head, but we started using that to move data into Salesforce, specifically for our sales ops teams, and I think it's improved things for them because they don't have to go into Looker to find some information about a retailer, for example, that they're trying to onboard to Drizly. We can input that data in Salesforce for them from a dbt model, and they can have that right there in the tool that they're working in. So, I think it's helped them a lot just having all of the data and information they need in one place, in the place that they're doing their job.
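Editor's note: the reverse ETL step Emily describes can be pictured as comparing the rows of a modeled table with what the CRM already holds, then writing only the differences. The sketch below is a generic illustration in Python, not how Drizly's tooling or any specific vendor works. The function `plan_sync`, the `retailer_id` key, and the `orders_last_30d` field are all hypothetical names chosen for the example.

```python
def plan_sync(model_rows, crm_records, key="retailer_id"):
    """Return (records to create, field updates) needed to match the model."""
    existing = {rec[key]: rec for rec in crm_records}
    to_create, to_update = [], []
    for row in model_rows:
        current = existing.get(row[key])
        if current is None:
            # The CRM has never seen this retailer: create the whole record.
            to_create.append(row)
            continue
        # Keep only the fields whose values differ from the CRM copy.
        changed = {f: v for f, v in row.items() if current.get(f) != v}
        if changed:
            to_update.append({key: row[key], **changed})
    return to_create, to_update


# Hypothetical output of a warehouse model, one row per retailer.
model_rows = [
    {"retailer_id": 1, "orders_last_30d": 120},
    {"retailer_id": 2, "orders_last_30d": 8},
]
# Hypothetical current state of the CRM.
crm_records = [{"retailer_id": 1, "orders_last_30d": 95}]

to_create, to_update = plan_sync(model_rows, crm_records)
print(to_create)
print(to_update)
```

Sending only the differences keeps the sync small, which matters when the destination is a rate-limited API rather than a database.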
The other newest area is our streaming stack, where we're also using dbt, with Materialize as our database. So, that's also in its infancy, but we're really excited about the possibilities with those two tools working together.
We have it currently connected to Kafka, so we have data coming through Kafka topics. We have a very limited use case right now, but for a couple of types of events that we care about in our event stream, we're able to hook those up in Materialize so that it's constantly bringing in new data as it comes into the Kafka topic. The dbt models that we set up are materialized views, so they're just constantly updating when new data comes in. So, you don't have to be running dbt every five seconds. You just run it once, and it's constantly checking for that new data that's coming in.
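Editor's note: the difference between a view that keeps itself up to date and a model that is rebuilt on a schedule can be shown with a toy example. The Python sketch below is only an illustration of the idea of incremental maintenance. It does not use Materialize, Kafka, or dbt, and the class name, function name, and event types are hypothetical.

```python
from collections import defaultdict


class IncrementalCountView:
    """Toy stand-in for a continuously maintained view: events per type."""

    def __init__(self):
        self.counts = defaultdict(int)

    def apply(self, event):
        # Update only the affected row instead of recomputing the whole view.
        self.counts[event["event_type"]] += 1

    def snapshot(self):
        return dict(self.counts)


def batch_count(events):
    """What a scheduled batch run would do: rescan every event each time."""
    counts = defaultdict(int)
    for event in events:
        counts[event["event_type"]] += 1
    return dict(counts)


# Hypothetical events standing in for messages arriving on a topic.
stream = [
    {"event_type": "order_placed"},
    {"event_type": "order_delivered"},
    {"event_type": "order_placed"},
]

view = IncrementalCountView()
seen = []
for event in stream:
    seen.append(event)
    view.apply(event)
    # The incrementally maintained result always matches a full recompute.
    assert view.snapshot() == batch_count(seen)

print(view.snapshot())
```

The definition is set up once, and each new event costs a small update rather than a full rerun, which is the property Emily points to when she says there is no need to run dbt every few seconds.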
I think we're about 20-something people now on the analytics team, which includes BI and data science. We're adding so many new people, it's hard to keep up. For reference, Drizly overall is around 300-something, I think. Of course, that doesn't include Uber. Within BI, we each focus on a business vertical, so there's product BI, ops, and marketing. I guess marketing's a special case because they have their own set of analysts, data engineers, and data scientists, because it's such a specialized field. There's a similar setup on the data science side: we have a production data science team with a machine learning engineer, and we have data scientists who focus on marketing, on ops, and on different parts of the business. And then my role is data platform. So, as I was saying earlier, just how it all fits together, how we can all work together on the same stack.
Juan Sequeda: For folks who are just starting out on this self-service journey, if you go back and say, "I wish I knew this. I wouldn't have done this," and so forth, what are those things that come to mind?
Emily:
Yeah. For me personally, I wish I had spent more time learning about data modeling. That's something we've definitely had to backtrack on a little bit with our dbt project, just because of the way our data was initially set up in Domo. It was just very wide tables, and we've had to really backtrack on that and make the conscious decision to move into that more dimensional modeling paradigm. It's actually been great. I'm really glad we made that decision, but it would've been nice if we had started out that way.
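Editor's note: to make the contrast between very wide tables and dimensional modeling concrete, here is a minimal Python sketch that splits wide rows into one dimension table and one fact table. The data, column names, and the function `to_star_schema` are hypothetical and are not taken from Drizly's dbt project. In practice this kind of transformation would usually be written as SQL models.

```python
# Hypothetical wide rows: every order repeats the retailer's attributes.
wide_orders = [
    {"order_id": 1, "amount": 30.0, "retailer_name": "Shop A", "retailer_city": "Boston"},
    {"order_id": 2, "amount": 45.5, "retailer_name": "Shop B", "retailer_city": "Denver"},
    {"order_id": 3, "amount": 12.0, "retailer_name": "Shop A", "retailer_city": "Boston"},
]


def to_star_schema(rows):
    """Split wide rows into a retailer dimension and an order fact table."""
    retailer_keys = {}  # (name, city) -> surrogate key
    dim_retailer = []
    fact_orders = []
    for row in rows:
        natural_key = (row["retailer_name"], row["retailer_city"])
        if natural_key not in retailer_keys:
            # First time this retailer appears: assign the next surrogate key.
            retailer_keys[natural_key] = len(retailer_keys) + 1
            dim_retailer.append({
                "retailer_key": retailer_keys[natural_key],
                "retailer_name": row["retailer_name"],
                "retailer_city": row["retailer_city"],
            })
        # The fact row keeps the measure and a pointer to the dimension.
        fact_orders.append({
            "order_id": row["order_id"],
            "retailer_key": retailer_keys[natural_key],
            "amount": row["amount"],
        })
    return dim_retailer, fact_orders


dim_retailer, fact_orders = to_star_schema(wide_orders)
print(dim_retailer)
print(fact_orders)
```

Each retailer attribute now lives in exactly one row of the dimension, so a correction is made once instead of on every order that mentions that retailer.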
Key takeaways
- Analytics engineering: data work that follows software engineering standards (Git, data modeling, peer review)
- dbt community is cool!
- Tip: spend time on data modeling.
Visit Catalog & Cocktails to listen to the full episode with Emily. And check out other episodes you might have missed.