“Why I Chose to Join data.world” — Dean Allemang, Principal Solutions Architect

by Dean Allemang | Jan 4, 2022

For the past decade or so, I’ve been fortunate enough to make a successful career helping large organizations manage their data at scale.

During that time, I’ve come to believe that data plays a role beyond good enterprise management; indeed, it holds the key to the survival of the human race.

I believe that good data management on a global scale — e.g., via the FAIR Data Principles — is a prerequisite for unlocking that potential. And while there are a lot of data platforms out there, none of them has the potential to support a vision of a global FAIR data movement as well as data.world.

That’s why I chose to join the company. My personal goal is to make data.world the best FAIR platform in the world, and to make our company the leader in the transformation to a global web of FAIR data. 

But how did I get here? And what exactly persuaded me that data.world is best positioned to lead the charge to the FAIR data promised land?

 

How Do You Go from a Degree in Pure Mathematics to Being a Consultant in the Semantic Web?

I had the good fortune to spend a week in November 2021 visiting one of my alma maters, the University of Cambridge. I made contact with Mark Perkins, a fellow Cambridge alum, who asked me a simple question: “How did you go from doing a degree in Pure Mathematics to being a consultant in the Semantic Web?”

This is an interesting question, and it touches on a lot of key points in my career. After leaving Cambridge in 1984, and realizing that I was not the sort of world-class mathematician who would solve a 400-year-old problem (like some of my colleagues), I moved on to Artificial Intelligence (AI). It is important to understand that in those days the prevailing goal of AI research was to create so-called “Expert Systems.”

The basic idea — in many labs, and certainly in the one I ended up working in — was that there was such a thing called “knowledge,” and that certain people had it, making them experts in their field. If we could just formulate that knowledge in such a way that a computer could process it, then a computer could perform at expert levels. Then we could replace experts with computers.

Not surprisingly, to modern minds anyway, even the occasional system that achieved technical success in these matters failed to catch the interest of the general public. It seemed that nobody really wanted to pay a visit to a computer program when they could instead see a human physician.  

So we retreated to expert decision support systems; could we represent knowledge in such a way that a trained physician could have access to far more information, at their fingertips, than they could possibly know or keep current with? Such a system could improve the quality of human performance in important areas. Imagine if a physician could have all medical knowledge at their fingertips, even during a consultation! What a powerful idea this could be.

Around 1995, a few things happened that left me disillusioned with AI. We had entered what came to be known as the “AI Winter,” when interest dried up. Funding and new research became scant, while leaders in the field published embarrassing articles. But most importantly, a new way of looking at information became popular: the World Wide Web.

I don’t have to describe the impact the Web has had on society, science, politics, and indeed every aspect of our lives. But for this story, the important thing to notice is that this new technology achieved just about every goal that we had set out for expert decision support systems; it became commonplace for physicians to come into an examination room with a computer tablet, and to look up symptoms, signs, conditions, diseases, and treatments.

Just as we had dreamed, our physician had access to all of medical knowledge at their fingertips. 

Even such august institutions as the Mayo Clinic jumped on the bandwagon, producing websites to share the hard-won learnings of generations of Mayo practitioners with the web at large. And with this, something else happened, something that was beyond our wildest dreams when we were building expert systems; anyone with access to a computer could gain expertise in any area — just look up the information, and go to town. Not only had we improved the behavior of expert practitioners, but we had also made expertise available to the public.

Sharing Knowledge via the World Wide Web

It seemed to me that AI deserved its winter; so many of the things we wanted to achieve simply fell out of the transformation to the post-Web world. If what you really cared about was improving how people performed by providing them with knowledge, the Web was the way to go, not AI.

I wondered if there was anything to salvage from my years of studying AI and Knowledge Representation that would have any relevance in the new, post-Web world. At first, it seemed that the answer was “no.” If you had something to sell, you could basically render your spreadsheet as web pages, sell online, and make a fortune. The expanded markets reachable through the web exploded into a cottage industry of miniature online storefronts. There was no need for knowledge or expert behavior or anything; if you built it, they would come… and buy.

But soon it became apparent that in order to buy something, you needed to find it, and the data that described products was important. That data was distributed — to bring it together, you needed to know what the data meant. And so was born the “Semantic Web” — the web that describes meaning. 

I discovered to my delight that a lot of the methods for describing knowledge that we had developed back in “old AI” were relevant here. Many of these ideas have turned out to be very successful: graph representations of data (we used to call them “Semantic Nets”); descriptions of relationships between kinds of things (“Concept hierarchies”); and, most of all, the idea (pioneered in the ’50s by LISP) that pointing to your data is more useful than copying it. I decided that I would study this new Semantic Web stuff, with an eye toward becoming an expert.

While the dreams we had back in the Expert Systems days had largely been fulfilled by the document web, I — along with Sir Tim Berners-Lee and others — felt that in the democratization of data, we could see humankind make advances beyond what we could imagine. I’m not talking about the data that would help someone sell better than just a spreadsheet on the web — though indeed, that drove a lot of advances; I mean data that would help the human condition. Data about molecular biology that help scientists discover cures for difficult diseases. Epidemiological data that help us track the effectiveness of treatments for a pandemic. Data about housing, crimes, demographics, and other social issues that would inform public policy. Data about the production, distribution, and consumption of food that allow us to address hunger on a worldwide scale.

I believe, quite sincerely, that data holds the key to the survival of the human race. And I believe that managing data on a global scale is key to making that data usable.

While saving the world is a laudable goal, it doesn’t always pay the bills. The key to developing the technology to work at a global scale is to get it to work reliably on a smaller scale; this applies to any technology, not just data sharing technology. So I and several other consultants and companies realized that the same data-sharing dynamic is at work inside large enterprises. 

FAIR Data – Findable, Accessible, Interoperable, and Reusable

This awareness has caught on in enterprise data management circles, under names like “Data Fabric” and “Data Mesh.” And this has allowed me to make a steady career for the past decade or so working with large organizations (usually banks), helping them to manage their data on a very large scale. I also worked with the EDM Council, a consortium devoted to improving data literacy and practice at an industrial scale, starting with the banks.

But I still have my eye on changing the world: building a network of data that will inform science, medicine, public policy, finance, and sustainability, and help end world hunger. One thing I learned from doing data management in very large enterprises is that a lot of the issues around data sharing don’t have to do with the data itself; they have to do with the services around the data, information about versioning, consistency, delivery, and assurance around the data.

These ideas about data delivery have been summed up very nicely in the FAIR Data Principles, which describe how to make data Findable, Accessible, Interoperable, and Reusable (hence the acronym FAIR).

The FAIR principles have taken the things we learned about effective data management in the enterprise and proposed applying them to data sharing in the world at large.

The FAIR data principles don’t prescribe any particular technology, just principles for how to treat your data. When I read them, I felt as though someone had taken all the experience I gained making data re-usable in the enterprise and written it out in a few terse lines. Bravo!

But I don’t just want to make a practice of FAIR-ifying data, though there are more and more companies that are doing just that (and good on them!). I want to make a world where the benefits of FAIR data are so apparent, and the ability to FAIR-ify data becomes so widespread, that it will just be expected that all data will be FAIR.

This is why I joined data.world. Quite simply, while there are a lot of data platforms out there, some based on Semantic Web standards and some not, none of them has the potential to support a vision of a global FAIR data movement as well as data.world. Why do I say this? Here are some reasons:

  • Anyone can host their data on data.world for free. Yes, for free. You have some interesting data, and you want to share it with the world? Then you should know about data.world. Your account is free, and if you already have an account with Google, Facebook, Twitter, or GitHub, you don’t even need to create a new login. You have a spreadsheet, or a CSV, or some other common data format? Just drop it in. It really is that easy. Go ahead, try it. Click the data.world link, and upload a spreadsheet.
  • The data on data.world is active right away; that is, you can write and run queries against your data immediately. So that spreadsheet you uploaded just now? Start querying it. It’s great that you can download free editions of a lot of databases, install them in your own infrastructure, load data into them and then start querying. With data.world, you’re already querying your data, and you don’t have to install anything. 
  • You can share your data and your insights. Those queries you wrote? Those are first-class entities; you can share them with your friends, and even post them as widgets on your website, as I have done with all the exercises in my book.
  • data.world lets you crowdsource information about your dataset. When you bring in a spreadsheet, you can add descriptions of the tables and columns in it. But not just you; your friends and collaborators can do the same. Spreadsheets are notorious for having cryptic column names; once you figure out what one means, you’ll want to share it.
  • All the data on data.world is linked. Yes, you can write a query that federates over any datasets in the entire platform. So if you find that someone else has a dataset relevant to your research, you can just link it in.

All of these features and more go a long way to making your data Findable, Accessible, Interoperable, and Reusable. And you don’t have to pay a penny to get started.

But wait a minute — if anyone can put their data on data.world, and share it with the world, and we can all do this for free, forever, why did I need to join data.world? I could be a champion for FAIR data on my own in the world, which is exactly what I have been doing for the past few years, in my consulting and writing practices. 

My agenda is to help make data.world the best FAIR platform around, and for the platform to lead the transformation to a global web of FAIR data.
