Recommendations for Supporting the Long Tail of Research Data
Summary:
Addressing major societal challenges such as health, climate change, energy, food availability, migration and peace depends on the contributions of a distributed and diverse international network of researchers and subject experts. The aim of open science is to improve the accessibility of research outputs, including articles, data and other research objects, so that researchers, industry and the public can use, build on, and verify these research outputs.
Among research outputs, research data are often the most diverse, as diverse as the international network of experts who perform research. Datasets may be small or large, simple or complex, structured or unstructured. Data may stem from hundreds of different subjects, may be produced by numerous methodologies, and may exist in a plethora of different formats. The diversity of data is also characterized by a variety of data management practices, of varying quality and comprehensiveness. Historically, large structured datasets in well-established disciplines have been more likely to adopt unified and standardized formats that are defined and accepted within the discipline. Similarly, well-established disciplines tend to have common and well-understood workflows, whereas in the long tail of research it is not unusual for researchers to use a variety of tools and to develop ad hoc data workflows. Long tail datasets, on the other hand, which vary radically in source, discipline, size, subject, provenance, funding, format, longevity, location and complexity, are less likely to adhere to common standards. […]
…
The Research Data Alliance (RDA) “Long Tail of Research Data Interest Group” has been assessing the situation of long tail data over the last three years, and urges the broader community to consider the risks and opportunities related to long tail data. This document provides seven recommendations for a variety of stakeholders, including governments, funders, research institutions and researchers, to help improve the current approach to managing long tail data. We call on the community to work together to create the necessary and sufficient conditions to ensure that we are able to properly steward these valuable research outputs for future generations of researchers.