One of the very first conversations I had with my co-founders – long before the advent of ChatGPT or even data catalogs for that matter – was about data democratization. Our company’s mission has always been to build the most meaningful, abundant, and collaborative data resource in the world. Doing that means two things:
Providing an easy-to-navigate portal where anyone can access the world’s open data
Offering tools for data users to explore and collaborate
I look back on those discussions from nearly a decade ago and am amazed at where we are today. We are the most-used data catalog in the world with more than 2.1 million users – and growing. Our patented knowledge graph architecture allows us to scale far beyond anyone else in our space. Knowledge graphs are the most powerful data technology ever developed and are in use at companies like Google, Facebook, and Amazon, but they are used far too little by the rest of the corporate, civic, and non-profit world.
We provide access to hundreds of thousands of open data sets, and enterprises around the world rely on data.world as a unified resource for managing their substantial volumes of metadata. We are now seen as mission-critical software for governing the modern data stack and helping businesses make smarter decisions based on data.
But we all know data.world could be so much more. That’s why today, I’m very excited to share two pieces of news:
We announced the evolution of our data catalog, data governance, and DataOps applications into the Data Catalog Platform, which includes a framework for embeddable AI bots that dramatically increase automation.
Our new “Archie Bots” remove the SQL barrier and make it easier for all users – not just data experts – to enrich, discover, and better understand their data.
I wrote about generative AI just last month, right before attending the TED conference, and couldn’t be more excited to introduce these powerful capabilities to our customers via the Data Catalog Platform. Archie Bots integrate the power and flexibility of our knowledge graph architecture with LLMs, including, but not limited to, OpenAI’s GPT. What’s really cool is that these capabilities were developed through data.world’s AI Lab – under the leadership of principal scientist Juan Sequeda – and in partnership with customer design partners, like WPP, who tested early integrations. Our product and engineering teams realized, as I did, that we were in the midst of the most important shift in technology in decades, and we raced ahead so that everyone could benefit.
Thinking back to those early data.world conversations, this is a huge unlock in data exploration for everyone.
People still struggle to use data
Business leaders widely recognize that data is the key to driving business decisions. I’ve written about this quite a bit. But the truth is that most business users struggle to get value from data. That’s because the vast majority lack the skills and confidence to get the answer they’re looking for from a technical set or collection of data.
Only 11% say they are confident in their data skills. And when the other 89% are faced with an overwhelming amount of data or hard-to-understand technical language, they often choose to ignore the data altogether and resort to gut decision-making.
On a fundamental level, many people simply aren’t yet comfortable working with data, and with SQL in particular. They don’t know where to start, what questions to ask, how to answer those questions, or how to determine whether they are using the right data in the first place. Ultimately, this discomfort with data leads to low adoption and data abandonment.
Archie Bots fundamentally change how we interact with data
The name “Archie” was inspired by Archimedes, the noted mathematician and engineer, and by Merlin’s pet owl in the animated film The Sword in the Stone. We believe Archie Bots will change the way people interact with data.world. Here’s just a brief list of some of the things our customers can do with Archie:
Discover data using AI-assisted search: Archie Bots enable users to quickly find data and refine their searches with a chat-like experience. Data consumers spend less time combing through search results for something useful and more time understanding answers.
Auto-enrich data assets for greater productivity and understanding: Writing definitions and descriptions for data and metadata assets takes painstaking manual effort and slows down data catalog implementations. Archie Bots automatically generate natural language descriptions for tables, columns, and glossary entries, as well as definitions for metadata resources like views, SQL queries, dbt models, and Snowflake access policies. This reduces the manual effort required to enrich data assets, greatly improving productivity and understanding.
Guided ideation for deeper exploration: Often, data consumers are unsure of where to start. What can I do with this data? How can this data help me move the business forward? Archie Bots suggest research questions and analytic hypotheses, helping data experts ideate faster and non-data experts generate business value with data.
Generate SQL with natural language to tap into deeper knowledge: Archie Bots make it possible to navigate to deeper organizational knowledge that would otherwise be accessible only to data experts who can code in SQL. They automatically convert a natural language question into a well-structured SQL query, along with a plain English description to aid human understanding.
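To make that last capability a little more concrete, here is a small illustrative sketch in Python. It does not show how Archie Bots work internally, and nothing in it is generated by them: the question, the SQL, the description, the `orders` table, and its data are all hypothetical and hand-written. It only shows the kind of pairing described above, a plain language question alongside a SQL query and a plain English explanation, run against a tiny in-memory SQLite table.

```python
import sqlite3

# Hypothetical, hand-written example of the three pieces involved:
# a natural language question, a SQL query that answers it, and a
# plain English description. Not produced by Archie Bots.
EXAMPLE = {
    "question": "Which region had the highest total sales?",
    "sql": (
        "SELECT region, SUM(amount) AS total_sales "
        "FROM orders "
        "GROUP BY region "
        "ORDER BY total_sales DESC "
        "LIMIT 1"
    ),
    "description": (
        "Adds up the order amounts for each region, sorts the regions "
        "from highest to lowest total, and returns the top one."
    ),
}


def run_example(example):
    """Run the example query against a small in-memory table of made-up data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (region, amount) VALUES (?, ?)",
        [("East", 120.0), ("West", 200.0), ("East", 50.0), ("West", 30.0)],
    )
    row = conn.execute(example["sql"]).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(EXAMPLE["question"])
    print(EXAMPLE["description"])
    print(run_example(EXAMPLE))
```

The point of keeping the description next to the query is that someone who cannot read SQL can still check whether the query answers the question they actually asked.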
See the new Data Catalog Platform and Archie Bots in action
Register now for our live digital event on May 23.
Also, just yesterday, we spoke in this webinar about our new BB Bots, made possible by our acquisition of Mighty Canary’s technology. BB Bots and Hoots will help data engineers and DataOps overall by increasing the level of trust throughout the modern data stack, and they are just one of many automations you’ll see from us. Automation is a huge focus for us this year – it’s all about giving our customers and partners the leverage of AI and bots to modernize a space that sorely needs it, one with far too many tools and far too many disconnected data silos.