“It is only by combining and comparing the various parts of the whole with one another, and noting the resemblances and their differences, that we shall arrive at a comprehensive view…”
– Polybius, The Histories, circa 160 BCE
data.world’s massive mission
Former Google CEO Eric Schmidt said in 2010 that “all the data created in human history is now generated in two days.” And we’ve come a long way since 2010; we now produce that volume in just a few hours. The cloud – where most of that data is stored – is a $730 billion industry employing almost four million people.
How do we organize it all? More than 90% of executives aspire to lead “data-driven” companies, yet fewer than half say they are succeeding, according to the Harvard Business Review. Surveillance, security, privacy, and even identity are the focus of deepening anxiety and harsh debate over data. As we know in this year of elections at home and around the world, “A lie can travel halfway around the world before the truth has a chance to get its boots on.”
Enter data.world, formally founded and launched on July 11, 2016 (7-Eleven’s Slurpee Day!) to provide that “comprehensive view” of all data. All too often, we examine and understand data in an isolated, siloed, and limited fashion, making the most important resource of the 21st century a form of dead capital. Having your data assets scattered, disorganized, and incompatible isn’t even like having your money hidden under a mattress. It’s more like cash left in a basement, where the elements and the fauna nibble it away.
data.world’s origins
Let’s roll back the clock eight years, a blink of an eye in historical time but an eternity in technology. When we unveiled our plans to tame the reactive, chaotic, and fragmented (but exponentially growing) realm of data, the challenges were fresher, this Cambrian Explosion of data was less mature, and data governance was less evolved.
I was more than two years into semi-retirement after taking my previous company, Bazaarvoice, to an IPO and unicorn exit in 2013. (Today, Bazaarvoice is much bigger and under the very able leadership of my good friend Keith Nealon.) The creation of Bazaarvoice followed that of Coremetrics, my first large company and a pioneer in the SaaS business and technology model. Founded in 1999 as I was wrapping up my MBA at the Wharton School, Coremetrics was an analytics window into the interacting elements of early ecommerce companies, including Walmart, The Home Depot, Expedia, and hundreds of others. It’s now called IBM Digital Analytics, for the iconic company that ultimately acquired it.
Pondering a return to the arena of technology entrepreneurship, I reached out to two former colleagues. One was Matt Laessig, the “team builder” (and American Ninja Warrior), who was also my classmate at the Wharton School. The other was Jon Loyens, the “architect philosopher,” who brought deep insight into the nature of knowledge. Soon we were joined by Bryon Jacob, the “technology savant.” The three were then working for HomeAway.com (now Vrbo). I had not worked with Bryon before, but we knew one another from Austin’s startup incubator, the Capital Factory, where we were both mentors, and Bryon had crossed paths with many of my past co-workers who had worked with him at Trilogy, including Jon. Jon and Matt were essential leaders at Bazaarvoice, so they were completely proven in my mind. Together, we formed the original executive team of data.world, and all four of us are still here today.
Integrating data from disparate sources was top of mind, but our ambitions were larger. Then as now, we oriented around Tim Berners-Lee’s vision of the internet as a universe of linked datasets beyond merely a massive collection of linked documents. We have, in fact, just this year brought that vision closer with our latest product, our AI Context Engine, to make structured data accessible to Large Language Models like OpenAI’s GPTs, Meta’s Llama, and others for the first time.
The genesis of our name, data.world
In 2016, we were out to found not just a company, but a philosophy, an architecture, and an ethos of responsible commerce. The name "data.world" was born from this vision, a nod to early internet protocols like TELNET, Gopher, FTP, and others, but with a crucial difference: we aimed to establish a new standard for open data collaboration. Why was data so siloed in a networked world, preventing us from working together both within and across companies? Our mission was ambitious: to create the digital equivalent of a "GitHub for data", a global platform where the world could come together to work on data collectively.
This name encapsulated our three-pronged strategy: build a platform for worldwide use, bring that technology to enterprises, and create a marketplace for data. By choosing "data.world", we were declaring our intent to break down data silos, democratize access to information, and foster a community where data professionals and enthusiasts could collaborate freely. Our goal was, and remains, to build the most meaningful, collaborative, and abundant data resource in the world – a true global hub where data becomes a shared asset powering innovation, insights, and decision-making across the globe.
Right now we are most focused on doing this for enterprises, as our website screams out. Enterprises really need our help, especially in this age of AI. The most shocking statistic of 2024 came from Gartner’s Data & Analytics Summit: they reported that only 4% of data leaders are ready for AI from a data perspective. 96% are not! Gartner also reported that knowledge graphs would be the key solution for AI in the enterprise; our knowledge graph is the foundation of our suite of products, patents, and everything we do. It’s a key reason why we win in our industry and unlock so much value for our customers.
We stayed in stealth mode for more than a year, and we knew we had a lot of functionality to build. In fact, data.world was the most complex technical build of our careers. We wanted to make sure that when we launched, there would be enough functionality for the very broad, global community we were trying to attract.
Some of us brought a deep belief in the lessons of the open source software movement, which argues that users should have control over the software they use and the ability to modify it to suit their needs. That movement gave us the free software of Linux, as well as a culture of collaboration. GitHub, the home of so many open source projects, became an inspiration for open, free, and collaborative datasets. Our open data community now has 2.8 million members, and the projects we’ve helped with include COVID tracking with Johns Hopkins University in 2020 and data-driven initiatives tackling challenges such as climate change, smart policing, and open government.
Others brought the original kernel of technical ideas to the brainstorm: the prototype versions of our foundational technologies, the data catalog and the knowledge graph. These have become, in their much evolved states, the nervous systems and brains of the enterprises we serve today.
Still others brought a formidable set of soft and hard skills to help us build the team that would make this all happen. I brought my experience from earlier companies and my drive as an entrepreneur to be the choreographer that every CEO must be.
Above all, we all brought the belief and commitment to organize ourselves as a proud and public benefit corporation – a B Corp in today’s parlance (and we are proudly Certified too) – whereby we codified in our bylaws our responsibility to employees, community, and the environment along with the equally important obligations to returns on shareholder investment.
After one of our first media interviews in 2016, shortly after our launch, technology journalist John Battelle, the founding editor of the Industry Standard and co-founding editor of Wired, framed it this way:
“... data.world sets out to solve a huge problem — one most of us haven’t considered very deeply. The world is awash in data, but nearly all of it is confined by policy, storage constraints, or lack of discoverability,” Battelle wrote of us in our infancy. “In short, data.world makes data discoverable, interoperable, and social. And that could mean an explosion of data-driven insights is at hand.”
Today’s data.world
Indeed, the mission we set out for ourselves endures today: “To build the world’s most abundant, collaborative, and meaningful data asset.”
With more than 2.8 million members of our community, thousands of users on our cloud-native SaaS platform, 82 patents and counting, and a host of new tools like the AI Context Engine, we are well on our way.
As technology evolves, we evolve in tandem to create coherence out of chaos, because the age of data and AI is only beginning. We help enterprises deal with complex, high-dimensional data, finding patterns, insights, and solutions to make better decisions amid that vast and unpredictable explosion of data and the complexity of artificial intelligence and machine learning. We’ve also been named one of Austin’s best places to work eight years in a row.
At the top of this article, I referenced the work of Polybius, a Greek historian who lived more than two millennia ago. That may seem an odd way to explain the founding of our company, but the wisdom of Polybius captures something essential about data.world.
Polybius was talking about a very different kind of complexity 2,100 years ago. But his insights on viewing and adapting to larger patterns in interconnected events are relevant today.
Today, we are contemporary and cutting edge, but our wisdom and our search for truth amid torrents of information are as deep as those of the ancients.