On July 11, 2019, we celebrated our third anniversary of data.world’s launch.  We gathered at the Alamo Drafthouse, followed by bowling and axe-throwing (yes, you read that right—it’s a thing now) at High 5.  It was tremendous fun and I truly feel so proud to be a part of our amazing team.

In last year’s anniversary letter, I referenced the release of our first Public Benefit Corporation Report as a Certified B Corporation®.  At that time we decided to make our Report public to all, instead of just our shareholders, and it highlighted how we’ve been fulfilling our public mission and proudly benefiting the world as a result.  It has been a very busy past year, and so I collaborated with our team to highlight all of the amazing things we’ve been up to since then to continue to fulfill our public mission, which is: 

…to (a) strive to build the most meaningful, collaborative and abundant data resource in the world in order to maximize data’s societal problem-solving utility, (b) advocate publicly for improving the adoption, usability, and proliferation of open data and linked data, and (c) serve as an accessible historical repository of the world’s data.

Data Literacy

While “data” seems to be a hot button for just about everyone these days, there is still a tremendous gap in the knowledge level of many.  This includes not just practitioners, but many who have much they could add to the data discussion in an organization’s quest to become more efficient and effective.  We’ve been working hard to help lead the data.world community, as well as provide commercial services, to educate the market and raise the level of discourse around data.  From the moment we helped co-author the Manifesto for Data Practices (check out the origin story here), we realized how much the industry needed this assist to really evolve.  Together with the help of the data.world community, representing some of the world’s best data scientists and analysts, we have delivered this content in three main categories:

1. Self-Serve Content

Between datapractices.org, videos, and tutorials, there is a host of freely available content and courseware that people can consume at their own pace.  Anyone can also integrate much of this work into their own training materials, even as part of a commercial offering.  We believe that better content that tells a consistent story will only help the industry and create more opportunity for everyone.

2. Guided Training

For those who would like a more guided experience, data.world is partnering with other commercial companies and the community to deliver this content in a customized, instructor-led format.  This is great for teams and corporate clients who need a more bespoke approach to education and data literacy.

3. Peer Discourse

It’s lonely at the top!  One of the most troubling findings in research on the data ecosystem is that 50-60% of data leaders are expected to fail in the next three to five years.  Many times it’s because there aren’t any resources for those roles, and there is no one with whom they can discuss the pressing issues of their position, especially at a strategic level.  data.world is partnering with Board.org to help found “The Data Board,” a vendor-free, confidential discussion forum for people leading data efforts at big brands.

Data Journalism

The problem of trust in journalism is one that continues to challenge our friends in the media industry.  One of the best defenses in this area is providing data alongside news stories. To that end we are working hard to support journalists with a public platform to host and share their data, as well as facilitate discussions if they so choose.  Whether this is in public, or for a private audience, our hope is that data can help improve the quality and accuracy of our media. Whether this is on a national/global scale like the Associated Press, or the efforts that we support locally for an outlet like the Texas Tribune, we’re proud to provide services to any data journalist who needs them.  I encourage you to read the case study on our work with the Associated Press if you want more details and inspiration here.

Data Distribution

One of the key tenets of our Public Benefit Charter is around the proliferation of open data.  In support of this we’re constantly working with individuals and groups to help their data reach as wide an audience as possible.  In the past year, the number of datasets on data.world has more than tripled and the size of our global community of users accessing them has more than doubled. Whether this is scientific data, FOIA request data, public sector data, or data that is being released as corporate philanthropy, data.world is happy to help our community share data for greater impact.  Some recent examples of this include: 

IRS 990 data

The Nonprofit Open Data Collective received a grant to digitize and distribute information from the 990 form (financial information about nonprofit organizations).  This was a massive undertaking that often required working from scanned images of physical documents. The results of this labor are now available on data.world in machine-readable format.

DOE Simulation Data

We’re working with a group from the Department of Energy to help set up a pipeline for their interactive simulation data from their supercomputing environment. This will stream data to the platform and expand the footprint of those who can build upon that research.

SXSW data

For SXSW 2019, we were given special permission to scrape and publish the schedule data.  Ready-made templates using parameterized queries allowed community members to query by any keyword, venue, genre, or topic across over 4,000 speaker sessions and 1,700 music sessions!  It is the only known public dataset of SXSW sessions and we took care to make it easy to use with no SQL or analytics experience required.

Linked Data

data.world has long held the belief that just as the World Wide Web connects documents, which contain information rendered in human-readable natural languages, the web of the future will connect data.  In support of this, data.world continues to strengthen and iterate on the graph model view of the world.  We do this through the underlying technology of the data.world platform, the acquisition and support of projects like Capsenta (yes, we made our first acquisition in the past year and ZDNet has a great write-up here) and Gra.fo, and participation in industry events and conferences like the International Semantic Web Conference (ISWC).  I recently wrote this article in CIO on why Knowledge Graphs can lead to so much data-driven culture change.  The future of data is bright, and by linking data we can help it shine even brighter.

Case Studies

In “Building a data-driven culture for a sustainable tomorrow,” I interviewed Brett Jenks, the CEO of Rare.  Rare is a global conservation NGO on the forefront of using technology and data to protect the most vulnerable ecosystems around the world.  Here’s an excerpt, and I’m quoting Brett Jenks below: 

“Our goal in countries like the Philippines, Indonesia, Honduras, Brazil, and Mozambique is to help coastal communities restore their fish populations, boost their incomes, protect the coral reefs and mangroves on which their economies depend. These are the outcome data and we are literally measuring fish populations underwater, changes in household economics, and the total area under new protection as a result of our work. But to manage the growth of our impact, we need to collect output data as well. So we are tracking funding raised, the volume of fish sold, the number of fishermen registered to fish legally, the number of coastal mayors who have signed on to adopt the Fish Forever model.

What’s really cool is that we can see in real-time how these millions of dollars are moving through remote coastal fishing communities, which were never before considered in national economic models. We can track the signing by local mayors of a global commitment to restore coastal fisheries as each signs up in countries around the globe. (…)

I loved the day our fisheries team rolled out a new dashboard to the Executive Team. They said, ‘Okay, click here and you can start tracking progress across all our major measures in real-time.’ That was something I had dreamed about a decade ago. The desire was there, but we were missing a few key tech-savvy scientists and a platform that makes data sharing so compelling.

Today, we have a system established that not only allows us to house our often-disparate datasets under one roof, but also helps us discuss and share program insights, and, crucially, it provides simple, powerful ways of linking and summarizing datasets. The ability to quickly make sense of data from a wide range of sources and forgo the time and effort of managing numerous file formats is critical for accurate interpretation and efficient data-driven decision-making. The ease of integrating data with a number of other platforms, whilst maintaining a live connection to data.world, provides us with a powerful method of dissemination. Data is available in real-time in a comprehensible and accessible format to those who need it, when they need it.”

Social Responsibility

One of the very first communities active on data.world was Data for Democracy.  While individual members continue to have a strong presence on our platform, we have worked hard to foster other similar organizations in the Data for Good or Data for Social Responsibility efforts.  We feel that data is a great equalizer, whether you are a giant corporation or a single user working to make the world a better place, and we want to ensure that our tools are available to all.  We continue to work closely with Data Ethics efforts and to serve as a prominent sponsor and data repository for events like Code for America’s Civic Camp.

One of our newest communities is DXC’s Open Badge Academy.  DXC is using data.world as the backbone of one of their AI certification badges.  DXC exists as an organization on data.world and the majority of the students operate under free public licenses.

In addition to the support we’re providing to coastal fisheries, we’re also working with Rare in another way through the US-based branch of Rare called Make It Personal.  We’re creating an app that calculates an individual’s carbon offset based on credit card statements and provides a simplified, no-brainer option to purchase carbon offsets.  The team includes some amazing organizations and brands, including Rare, data.world, Visa, Yale, and Aceable. This kicked off at SXSW, where I was proud to secure a speaking slot for Brett Jenks at Capital Factory and a meeting with our Mayor, Steve Adler.

We love to watch people change the world through data, and want to make sure those who are performing the change have as much support as possible.

Education and Research

In order to fulfill our Public Benefit Corporation mission to “Advocate publicly for improving the adoption, usability, and proliferation of open data and linked data,” we are committed to participating in educational outreach and fostering research.  Juan Sequeda, data.world’s Principal Scientist, who wears both a scientific and business hat, is uniquely positioned to lead the charge towards this vision. Juan joined us through the acquisition of Capsenta.  Capsenta’s technologies, Ultrawrap and Gra.fo, were a key reason our companies came together.  Juan and his team were attracted to our mission, and we were attracted to them because they had already been living key aspects of it on their own.

From an educational standpoint, Juan participated as an invited speaker at the 1st Summer School on Knowledge Graphs and Semantic Web in Cuba, lecturing on building and designing Knowledge Graphs from databases.  Additionally, at the Linked Data Benchmark Council Technical User Community meeting, co-located with SIGMOD, Juan presented the progress of the Property Graph Schema Working Group.  Juan is the chair of this working group, which will provide recommendations to the upcoming Graph Query Language (GQL) standard.

There are several upcoming events, including Graphorum and ISWC, where my co-founder and our CTO, Bryon Jacob, and Juan will be presenting on developing a hybrid data cloud.  At the International Semantic Web Conference (ISWC), which is the premier conference on Semantic Web and Knowledge Graphs, Juan will deliver the keynote at the Ontology Matching Workshop and present a tutorial about the history of Knowledge Graphs.  Juan has also been invited to several other industry events to educate about Knowledge Graphs.

From a research perspective, we have started partnering with some of the world’s top academic research groups investigating Semantic Web and Knowledge Graphs.  Our goal is to share real-world problems where engineering may not be sufficient and research may be required. You can never have enough smart people in the room!

We are committed to disseminating our research results.  At ISWC, we will be presenting a paper that describes a methodology to design and build Knowledge Graphs that was developed and tested over the past few years at Capsenta.  This methodology is being successfully applied with our customers.

Finally, Juan will be representing data.world at events with renowned scientists and thought-leaders such as the STI2 Semantic Summit, the Emerging Challenges in Data Management and AI Research Seminar, and the Big Graph Processing Systems Dagstuhl Seminar.  The goal of these events is to discuss open challenges and provide a roadmap that can guide the future development of graph databases and Knowledge Graphs.

Commercialization of data.world

Over the past year, we’ve made huge strides commercializing and differentiating our data catalog.  Our newest product overview is the best reflection of that journey.  It is pretty staggering to think that employees of 327 of the Fortune 500 are in the data.world community, using us as a public data catalog.  We’ve called ourselves the world’s largest collaborative data community, and that also makes us the world’s largest collaborative data catalog.  This has helped us commercialize much faster.  For example, when we test new tools and features, we reach statistical significance quickly.  We continue to add new case studies, including this recent one with Aceable, and guidance the market is hungry for, such as How to Plan and Launch Your Modern Data Catalog.  We are proud to be the cloud-native data catalog and virtualization platform powered by a Knowledge Graph.  We know this is a game-changer for the industry because our customers see the results. 

There is no doubt that the next year will be a very exciting one for us, and if you’ve made it this far into my annual letter please allow me to sincerely thank you for all of your support.  There is no data.world community and catalog without you, and your feedback is always very welcome. And there is no data.world without our amazing team, truly the best startup team I’ve been a part of.  Check out our culture and values; we just won our 4th annual Best Places to Work Award!

I’ll end with a photo of our team from our anniversary celebration at High 5!

Sincerely,

Brett

The data.world team celebrates after All Hands