2022 is here!
If the last two years are any indicator, maybe we’ll end up calling this the “decade of data,” with next-gen data observability, data catalog, and data integration platforms, cloud data warehouses, and more making big news and bringing in big funding.
We’ve seen the rise of dbt and the analytics engineer, a focus on data fluency and DataOps, and AI finally trending from the fantastical to the practical. Where the late ’10s focused a lot on modern business intelligence, data science, and a continued shift to the cloud, the ’20s have been about managing and transforming the underlying data.
So what does 2022 hold in store? On our web show and podcast, Catalog & Cocktails, we’ve been talking to leaders in the data and analytics space. Between what they’ve shared and our own thoughts, here are the biggest trends:
1. Data, meet metadata
As the cloud, SaaS, and the modern data stack expand their dominance, the data landscape is getting more and more complex. There was a time when you could analyze your stored procedures and your Informatica or Microsoft ETL configurations to get a lot of visibility. But now, with Python scripts, Dagster pipelines, dbt, microservices, streaming, data lakes, data lakehouses... visibility has become scattered and difficult to piece together.
Metadata platforms, led by catalogs and observability solutions, will continue to grow in popularity. Leaders will be marked by speed of implementation, ease of use, adoption by a broad set of data personas, intelligence, openness, and interoperability.
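To make "piecing visibility together" a bit more concrete, here is a minimal sketch of stitching lineage from several tools into one graph and asking what sits downstream of an asset. The tool names, asset names, and the functions `merge_lineage` and `downstream_of` are all hypothetical, made up for illustration. Real metadata platforms collect this through connectors and do far more.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges (upstream, downstream), as each tool might report them.
EDGES_BY_TOOL = {
    "python_script": [("crm_export.csv", "raw.customers")],
    "dbt": [("raw.customers", "analytics.dim_customers")],
    "bi_tool": [("analytics.dim_customers", "Customer Churn dashboard")],
}

def merge_lineage(edges_by_tool):
    """Combine per-tool edges into one graph: upstream asset -> set of downstream assets."""
    graph = defaultdict(set)
    for edges in edges_by_tool.values():
        for upstream, downstream in edges:
            graph[upstream].add(downstream)
    return graph

def downstream_of(graph, asset):
    """Breadth-first walk that collects everything reachable downstream of an asset."""
    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for child in graph.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

graph = merge_lineage(EDGES_BY_TOOL)
impacted = downstream_of(graph, "crm_export.csv")
```

No single tool in this toy example knows that the CSV export feeds the dashboard. Only the merged graph does, which is the kind of cross-tool visibility this trend is about.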
2. Knowledge graph gets hyped
Machine learning has become synonymous with AI, powering intelligent use cases ranging from image recognition, better classification, and recommendation engines to automated finance, smart cities, cars, homes, and more. But what machine learning has lacked is nuance, context, and explainability.
Why does Facebook have such a powerful news feed? Why do Apple and Amazon have such impressive voice recognition? Why does Google nail search results, and Netflix nail show recommendations? Because under the hood is a knowledge graph, which weaves data, context, semantics, and relationships together into a queryable, analyzable web.
In the future, AI will have common sense and be built on the back of knowledge graphs. And virtually every company and service will incorporate one knowledge graph or many. But before we get too far ahead of ourselves, there will be a lot of hype. Which companies are actually built on a knowledge graph and able to tap into their context? Far fewer than will claim to be.
For data management, some machine learning features are impactful, but many simply skim the surface and are cosmetic in nature. A real knowledge graph means you'll be able to operationalize your data and analytics assets with meaningful automation.
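For readers who haven’t worked with one, here is a minimal sketch of the idea behind a knowledge graph: facts stored as subject-predicate-object triples, queried by following relationships. The asset names, predicates, and helper functions are hypothetical and exist only for this illustration. Production knowledge graphs typically rely on a graph database and a query language rather than hand-rolled Python.

```python
from collections import defaultdict

# Hypothetical facts about data assets, stored as (subject, predicate, object) triples.
TRIPLES = [
    ("orders_table", "is_a", "Table"),
    ("revenue_dashboard", "is_a", "Dashboard"),
    ("revenue_dashboard", "reads_from", "orders_table"),
    ("orders_table", "owned_by", "sales_engineering"),
    ("orders_table", "contains", "customer_email"),
    ("customer_email", "classified_as", "PII"),
]

def build_index(triples):
    """Index objects by (subject, predicate) so relationships are quick to follow."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].add(obj)
    return index

def dashboards_exposing(index, triples, classification):
    """Find dashboards that read from a table containing a column with the given classification."""
    results = set()
    for subject, predicate, obj in triples:
        if predicate != "reads_from":
            continue
        if "Dashboard" not in index[(subject, "is_a")]:
            continue
        # Follow table -> column -> classification.
        for column in index[(obj, "contains")]:
            if classification in index[(column, "classified_as")]:
                results.add(subject)
    return sorted(results)

index = build_index(TRIPLES)
pii_dashboards = dashboards_exposing(index, TRIPLES, "PII")
```

The question "which dashboards touch PII?" is answered by walking relationships, not by scanning the data itself. That context is what makes automation on top of a graph meaningful rather than cosmetic.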
3. Fewer data marts, more data apps
dbt has taken the world by storm, empowering analysts and data engineers alike to leverage versionability, testability, reusability, reproducibility, and a declarative approach to data transformation. And dbt has plans to cover more of the data translation layer with its recent announcement of the metrics layer at dbt Coalesce. (For more on the metrics layer, follow Benn Stancil’s fantastic Substack.) If you’ve heard of the idea of headless BI, this will scratch that itch.
Whether you use dbt or a headless BI platform, this trend leads to downstream data applications that can be rapidly deployed and are much more nimble, lightweight, and actionable. That means less business logic embedded in data marts, cubes, or directly in the BI or analytics tool, and more time focused on using data to make better decisions.
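As a rough illustration of keeping business logic out of individual dashboards, the sketch below defines a metric once in code and compiles it to SQL for whichever downstream app asks. This is not dbt’s metrics syntax or any vendor’s API. The `Metric` class, `compile_metric` function, and table and column names are assumptions invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    """A hypothetical, declarative metric definition shared by all consumers."""
    name: str
    table: str
    expression: str  # SQL aggregate expression, e.g. "SUM(amount)"

def compile_metric(metric, group_by=()):
    """Render a metric into a SQL query, optionally grouped by dimensions."""
    dims = list(group_by)
    select_cols = dims + [f"{metric.expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(select_cols)} FROM {metric.table}"
    if dims:
        sql += f" GROUP BY {', '.join(dims)}"
    return sql

# Defined once; every data app reuses the same logic.
REVENUE = Metric(name="revenue", table="orders", expression="SUM(amount)")

total_sql = compile_metric(REVENUE)
by_region_sql = compile_metric(REVENUE, group_by=["region"])
```

Because the definition of "revenue" lives in one place, a change to it flows to every consumer, and the definition can be versioned and tested like any other code.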
See Sisu for an example of a company driving towards smart insights. See ThoughtSpot for a compelling take on fewer dashboards and more embedded answers. See Eppo for data experimentation. And see Lightdash for an example of simpler, dbt-friendly, code-defined dashboards.
4. Mo’ code AND no code
We mentioned that dbt has demonstrated the value of an as-code approach to data transformation. Tools are diverging in two directions: more code AND no code. It’s not a battle over which will win, but a question of which fits your use case and your technical level.
No-code tools provide a low barrier to entry and fast time to value. As-code tools provide versioning, testability, a declarative approach, and more. And the best no-code tools will sit on top of as-code frameworks to get the best of both worlds.
For more on the role code plays and the right data user experience, check out:
- Cindi Howson (ThoughtSpot) on the importance of low-code/no-code for analytics
- Tejas Manohar (Hightouch) on low-code/no-code UX for reverse ETL integration
- Erik Bernhardsson on data tools: the good, the bad, and the ugly
- Nick Schrock (Elementl / Dagster) on modern data stack and orchestration
5. Data mesh rolls up its sleeves
A top trends list wouldn’t be complete without mention of data mesh. It was quite possibly the hottest topic of 2021 in data management, and perhaps also the most polarizing.
- Data as a product? YES!
- Self-service infrastructure? HECK YES!
- Domain-driven ownership? YES I THINK? LET ME LOOK THAT UP.
- Data product architecture quantum? HUH?
While there will continue to be detractors of data mesh with fair reasons to be pessimistic, with scrutiny comes pragmatism. We predict data mesh is going to quickly pass through the hype phase with a practical approach in hand. In 2022, we’ll see case studies on how companies are incrementally implementing the parts of data mesh that are fit for purpose, figuring out what a good data product really looks like, and codifying federated computational governance in an agile, bottom-up way.
What do you think?
Do you agree that the above will be among the data management trends for 2022? Join our honest, no-BS conversations about enterprise data management with data leaders and practitioners by subscribing to our podcast, Catalog & Cocktails.