Data quality assurance in research data repositories: a theory-guided exploration and model

Besiki Stvilia (School of Information, Florida State University, Tallahassee, Florida, USA)
Dong Joon Lee (Mays Business School, Texas A&M University, College Station, Texas, USA)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 25 January 2024

Issue publication date: 26 June 2024

Abstract

Purpose

This study addresses the need for a theory-guided, rich, descriptive account of research data repositories' (RDRs) understanding of data quality and the structures of their data quality assurance (DQA) activities. Its findings can help develop operational DQA models and best practice guides and identify opportunities for innovation in the DQA activities.

Design/methodology/approach

The study analyzed 122 data repositories' applications for the Core Trustworthy Data Repositories, interview transcripts of 32 curators and repository managers, and data curation-related webpages of their repository websites. The combined dataset represented 146 unique RDRs. The study was guided by a theoretical framework comprising activity theory and an information quality evaluation framework.

Findings

The study provided a theory-based examination of the DQA practices of RDRs, summarized as a conceptual model. The authors identified three DQA activities (evaluation, intervention and communication) and their structures, including activity motivations, roles played, mediating tools, and rules and standards. When defining data quality, study participants went beyond the traditional definition of data quality and referenced seven facets of ethical and effective information systems in addition to data quality. Furthermore, the participants and RDRs referenced 13 dimensions in their DQA models. The study revealed that DQA activities were prioritized by data value, level of quality, available expertise, cost and funding incentives.

Practical implications

The study's findings can inform the design and construction of digital research data curation infrastructure components on university campuses that aim to provide access not just to big data but to trustworthy data. Communities of practice focused on repositories and archives could consider adding FAIR operationalizations, extensions and metrics focused on data quality. The availability of such metrics and associated measurements can help reusers determine whether they can trust and reuse a particular dataset. The findings of this study can help to develop such data quality assessment metrics and intervention strategies in a sound and systematic way.

Originality/value

To the best of the authors' knowledge, this paper is the first data quality theory-guided examination of DQA practices in RDRs.

Acknowledgements

We would like to express our gratitude to the participants of our study. We extend our appreciation to Leila Gibradze for her invaluable assistance with the analysis of data. This research is supported by a National Leadership Grant from the Institute of Museum and Library Services (IMLS) of the U.S. Government (grant number LG-252346-OLS-22). This article reflects the findings and conclusions of the authors and does not necessarily reflect the views of IMLS.

Citation

Stvilia, B. and Lee, D.J. (2024), "Data quality assurance in research data repositories: a theory-guided exploration and model", Journal of Documentation, Vol. 80 No. 4, pp. 793-812. https://doi.org/10.1108/JD-09-2023-0177

Publisher

Emerald Publishing Limited

Copyright © 2024, Emerald Publishing Limited
