SPARC’s OpenCon 2017: The Four Pain Points I Found for Making Research Data Open

by | Feb 5, 2018 | Data community

What are the barriers preventing you from making data open in your institution? At the 2017 OpenCon Conference I met an energized and inclusive community from all over the world overcoming a variety of barriers, while pushing their respective open access missions. I learned first-hand about a few of these challenges while leading an unconference on open data licensing. A diverse group of folks attended, including grad students, post-docs, university librarians, policy advocates, a researcher, a data scientist and analyst.

I opened the unconference, titled Lack of Open Data Licensing in Open Data Efforts, with the question: “What does open data mean?” The simple question led us down a rabbit hole of discussions around the various data issues in academia. Though the perspectives were diverse, there were a few shared challenges: how to encourage institutions to move from ostensible to earnest open data enforcement, how to make funders or universities default to an open data policy, and how to integrate better open data policies into grant applications.

So what’s preventing the application and enforcement of a clear open data policy? When I reviewed the notes from the unconference, four main pain points stood out:

1. Cut Out Vague Language

What does “sharing data” mean? What does “open data” mean? Though I was in a room with a dozen smart people from similar fields, the terms “sharing” and “open” were more subjective than I realized. Therefore, the practice of attaching such nebulous terms results in very little guidance for the end user and can invalidate the intentions of the data provider. This leads me to my next point.

2. Set Clear Instructions

How is data used or not used? Using an established license is the most straightforward way to provide guidance on the freedoms or limitations one has when using the data. One thing that came up a few times in the conversation was the now prevalent requirement by many funders that an open data policy be attached to a grant proposal. (Check out the American Heart Association’s to see an example.) Yet the policies are often vague and do not require applicants to provide formal licensing or terms in their open data policies. This leads me to my next point!

3. Enforce Rules and/or Incentivize Compliance

Why should researchers and PIs provide an open data license? Often researchers are spread very thin as it is, and do not have the resources or training to know how to create a sustainable open data policy or how to share their data properly. In addition, the incentives for going the extra mile to publish data or research licensing terms are not necessarily there. For example, a lot of researchers still don’t know that statistics show open data results in more citations. Generally speaking, the requirements for open data policies for researchers applying for grants are relatively lenient. If grant providers were more stringent about their open data policy terms, adoption would inevitably improve.

4. Provide infrastructure for sharing, especially for sensitive data
(aka Opening data is scary… support and hand-holding appreciated)

Why would I put my sensitive data in jeopardy? The term “open data” sounds like a wonderful idea and for most data it is, but there is the issue of competitive information and personally identifiable information (PII). Just because data is sensitive does not mean it is not “shareable,” for example with individuals who have signed an agreement or gone through Institutional Review Board (IRB) training (a requirement for a lot of researchers). It’s easier to set limits with a proper license and infrastructure. At the moment, there are several academic institutions working on how to set up a proper repository to accommodate data of varying sizes and different access levels. Yet while the institutions are in limbo or completely lacking such an infrastructure, their researchers are left to publish their data (often only metadata) in disparate places. The data is in danger of being lost or forgotten.

As data.world continues to share more data, we are finding a growing number of examples of amazing data that can’t be shared because of licensing. Or the data is available for download and seems to have an open license, but data users must agree to terms that make sharing or using the data beyond the original repository difficult. Of course, there is also the case of orphaned data that no longer has a custodian to assign a license and is left unusable. Because academia is so transient, the requirements for how data can be used are sometimes lost.


The shift towards sustainable open data policies is slow but evident. We look forward to seeing more institutions improving the mechanisms to make their data open and also look forward to helping with our own platform and tools. Please visit the Licensing topic in data.world’s Forum where you can discuss your personal experiences and challenges with making your data open.

I’d like to thank all the participants who attended the session. Anonymized notes from the unconference are here for anyone to peruse: Absence of Open Data Licensing in Open Data Efforts. I’d also like to thank the folks at SPARC who ran one of the most organized and inspiring conferences I’ve ever attended.

If you’d like to check out licenses we love, see this blog on licensing we recommend for data.


Licensing for open data is challenging enough, so how do companies scale data distribution to their partners for business use? Learn how they use data.world to get data and insights where they need to be here.