Dawn in the Anthropocene — The Evolution of Data 

We are currently witnessing the earliest days of humanity’s next chapter.

Our society is evolving into a data-driven civilization, its contours just now emerging. 

Understanding the significance of this point in history is illuminating, and it’s been on my mind quite a bit as we head into 2022.

What follows is a four-part exploration into the evolution of data, which — much like human evolution — begins with a “Cambrian explosion” and continues through the Anthropocene, the age of human dominance.  

We are now embarking on the “last mile” of data’s evolution. Our new digital ecosystem is fast evolving in complexity and diversity, but it remains primitive. It is characterized by hoarding, isolation, and fragmentation.

But not for long. Understanding — true “cognition” — is just around the corner.


The Nervous Systems and Brains of Data Are Here

The first known commercial data, records of animal sales and grain harvests, were inscribed in cuneiform on clay tablets by the Sumerians in ancient Mesopotamia around 3000 BCE. Data became a true utility in the late 15th century with the invention in Florence of double-entry bookkeeping, which provided clarity that allowed commerce to flourish and ushered in the Renaissance.

The Battle of Waterloo in 1815 revealed an early example of what we today call “prescriptive data.” Word of Napoleon’s defeat was rushed back to London by secret carrier pigeon, allowing the House of Rothschild to use the exclusive, bird-borne data to make a killing on British government bonds.

Just 86 years ago, after the Social Security Act became law, Franklin D. Roosevelt’s administration created America’s first major data project to track the contributions of nearly three million employers and 26 million Americans. The massive bookkeeping project to track and store the data with punch-card machines was awarded to a young company called the International Business Machines Corporation, or “IBM.”

Fast-forward through some more familiar milestones: the founding of ARPA, the agency that went on to build ARPANET, in 1958; the COBOL programming language two years later; the launch of Structured Query Language, or SQL, in 1979; the World Wide Web a decade later; the emergence of cloud computing in 2006…

And now, according to author Kevin Kelly, founding editor of Wired, data is the cellular structure of an emerging “planetary superorganism.”


Data’s Cambrian Explosion

This blog post will be just one of seven million posted today. Also today, Google will log 5.6 billion searches. By the end of this year, more than 46 billion devices will be connected to the internet of things, and we’re only in the early days of the soon-to-be-ubiquitous 5G cellular networks that will boost data rates by as much as 50 times. Meanwhile, more than 50 percent of corporate data is now stored in the cloud, or really an “inter-cloud,” with a capacity of 470 exabytes linking the server farms of Amazon, Microsoft, and others.

How much is an exabyte? It is a billion gigabytes, a sum that defies easy comprehension. There were five exabytes of information created from the dawn of civilization through 2003, noted former Google CEO Eric Schmidt, “…but that much information is now created every two days, and the pace is increasing.”

Another way to think about this is that we’re at the data version of a Cambrian explosion, that extended moment roughly 540 million years ago when complex, multicellular organisms first began to appear. 

Life itself is actually much older, by at least three billion years. But before this rapid period of evolution, most organisms were relatively simple, either single cells or tiny clusters of cells. And all of them lacked a nervous system.

My point here is that if we can date the origin of data to the Sumerians more than 5,000 years ago, we can equate the past two decades of our data evolution to the Cambrian explosion. And the more data we produce and capture, the more data we create in the form of metadata, or data about data. 


The Future of Data Interconnectivity — Collaborative Cognition

Our unicellular, protozoan databases have evolved into a kind of multicellular, metazoan means of data storage. But this ecosystem, though fast evolving in complexity and diversity, is still primitive and disconnected. It is characterized by hoarding, isolation, and fragmentation.

Our much-vaunted realm of data, the “metaverse” as it were, still lacks a nervous system. And equally important, it lacks a brain.

But at data.world, we’re working on that. The nervous system and brain of data do in fact exist today. We call the nervous system the “data catalog” and the brain the “knowledge graph.” So I’ll twist that famous aphorism about the future from one of my favorite science fiction authors, William Gibson:

The nervous systems and brains of data are here. They are just not evenly distributed.

I’ll explain just how we solve that problem in my next blog post, on corporate and collaborative cognition.