We are excited to announce a research partnership with University of Edinburgh Professors Leonid Libkin and Paolo Guagliardo from the School of Informatics. The goal of this research is to develop well founded approaches to deal with the users intended semantics for null values in a database. Ultimately we want to generate approaches and practical tools that can increase the information one can extract from a database. And we need your help to do it.
The handling of null values in relational database systems is often criticized. When it comes to handling nulls, SQL’s behavior has been described as “fundamentally at odds with the way the world behaves” (Date, Darwen, ‘A Guide to the SQL Standard’) or capable of “ruining everything” (Celko, ‘SQL for Smarties’), and recommendations to avoid nulls altogether are not uncommon (The Third Manifesto, Hugh Darwen and C.J. Date, March 1995). Yet nulls are a fact of life and need to be dealt with.
In his third rule, Edgar Codd stated that null values should represent both “missing” and “inapplicable” information.
Systematic treatment of null values
Rule 3: Null values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.
How does a system know whether a null value is missing or inapplicable? Consider the following examples.
Examples of NULL problems
A person whose date of birth is NULL: given our understanding of what a birth date is, we should interpret that NULL as a placeholder for something that exists but we do not know its value. On the other hand, things are different in the case of a person’s date of death being NULL, as that may signal that we don’t know on which date that the person died, ... or that the person is still alive!
As another example, suppose the CEO’s salary is NULL. This could be due to the fact that the CEO doesn’t take a regular salary (NULL should mean 0), or perhaps the CEO salary is not disclosed. Asking for the average salary in the company would return the same result regardless of the interpretation, those the actual average in those two cases could be very far apart.
How do we know what is the meaning behind a specific null value? It cannot be done systematically and it yields too many quid pro quos. The system may consider a null as inapplicable while it was representing some missing information and vice-versa.
Understand data consumers
When systems impose a meaning to nulls, they can diminish the value of the dataset. If the meaning does not match the users expectations, the information extracted becomes suspicious: it can be incomplete or inaccurate.
We need new approaches to ensure that systems can capture the true meaning of null values appearing in databases. We need to improve DBMS to allow accurate and efficient information extraction upon other semantics of null.
How you can help
In order to develop approaches and tools that manage several meanings of null values, we need to first understand what those meanings are. Therefore we are collaborating with the University of Edinburgh research team on a survey.
We believe that you, our users, have valuable insight to share. We want to understand what you think about SQL handling of null values. Is it what you expect? Is it problematic? We want to understand the different meanings of null values which appear in databases that you work with. We want to better understand your expectations and wishes for query evaluation upon databases with nulls.
Please consider participating in this 15-minute online survey. Your expertise and input will be a valuable contribution to understanding of null values in databases.
If you have any trouble opening the survey, please copy and paste the URL below into a new browser or tab (even if the survey is mobile compatible we recommend to use a computer for a better experience):
https://edinburghinformatics.eu.qualtrics.com/jfe/form/SV_2sGboKoH7vqL6yp
If you would like more information or have ideas to share, we’d love to hear from you. Please contact Professor Leonid Libkin (libkin@ed.ac.uk) and Juan Sequeda (juan@data.world) with any questions or comments.
(Note: As a Public Benefit Corporation, data.world’s mission is to build the most meaningful, collaborative and abundant data resource in the world in order to maximize data’s societal problem-solving utility. On the course of this mission we encounter challenging and exciting engineering problems that our development team tackles every day. We are also faced with problems that do not have a clear and immediate path to a solution: open scientific research problems. We are excited to partner with leading researchers around the world to share problems and develop novel solutions.)