Provenance Core Data Set
A Minimal Information Model for Data Provenance in Biomedical Research
Keywords: data provenance, data lineage, life sciences, Harmonizing RDM, Linking RDM
The exchange, dissemination, and reuse of biological specimens and data have become essential for life sciences research. This requires standards that enable cross-organizational documentation, traceability, and tracking of data and their associated metadata. Data provenance, or the lineage of data, is therefore an important aspect of data management in any information system that integrates data from different sources. It provides crucial information about the origin, transformation, and accountability of data, which is essential for ensuring the trustworthiness, transparency, and quality of healthcare data. For biological material and derived data, a new ISO standard was recently introduced. It specifies a general concept for a provenance information model for biological material and data, together with requirements for the interoperability and serialization of provenance data [3,4]. However, a dedicated standard for health data provenance is still missing. In recent years, the need for a minimal core data set to represent provenance information in health information systems has grown. This paper presents the Provenance Core Data Set (PCDS), a generalized data model that aims to provide a set of attributes for describing data provenance in health information systems and beyond.
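To make the notion of data lineage more concrete, the following is a minimal sketch of how provenance records can be linked and queried for origin and accountability. It does not reproduce the PCDS attributes, which are not listed in this abstract. Instead, it assumes the core concepts of the W3C PROV data model (Entity, Activity, Agent and the used, wasGeneratedBy, and wasAssociatedWith relations). All class names, fields, and the specimen-to-variant-calls workflow are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical, PROV-inspired record types; NOT the actual PCDS attributes.


@dataclass
class Agent:
    """A person, organization, or software responsible for an activity."""
    id: str
    name: str


@dataclass
class Activity:
    """A process that transforms data, e.g. sequencing or variant calling."""
    id: str
    label: str
    used: List[str] = field(default_factory=list)  # input Entity ids (PROV "used")
    associated_with: Optional[str] = None          # Agent id (PROV "wasAssociatedWith")


@dataclass
class Entity:
    """A specimen, data set, or derived data item."""
    id: str
    label: str
    generated_by: Optional[str] = None             # Activity id (PROV "wasGeneratedBy")


@dataclass
class ProvenanceGraph:
    entities: Dict[str, Entity] = field(default_factory=dict)
    activities: Dict[str, Activity] = field(default_factory=dict)
    agents: Dict[str, Agent] = field(default_factory=dict)

    def add(self, *records) -> None:
        # Store each record in the dictionary that matches its type.
        for r in records:
            if isinstance(r, Entity):
                self.entities[r.id] = r
            elif isinstance(r, Activity):
                self.activities[r.id] = r
            elif isinstance(r, Agent):
                self.agents[r.id] = r
            else:
                raise TypeError(f"unsupported record type: {type(r).__name__}")

    def lineage(self, entity_id: str) -> List[str]:
        """Return ids of all upstream entities the given entity was derived from."""
        upstream: List[str] = []
        stack = [entity_id]
        while stack:
            entity = self.entities[stack.pop()]
            if entity.generated_by is None:
                continue  # an origin, e.g. a primary specimen
            for source_id in self.activities[entity.generated_by].used:
                if source_id not in upstream:  # also guards against cycles
                    upstream.append(source_id)
                    stack.append(source_id)
        return upstream

    def responsible_agents(self, entity_id: str) -> List[str]:
        """Return names of agents associated with any activity in the entity's history."""
        names: List[str] = []
        for eid in [entity_id] + self.lineage(entity_id):
            activity_id = self.entities[eid].generated_by
            if activity_id is None:
                continue
            agent_id = self.activities[activity_id].associated_with
            if agent_id is not None and self.agents[agent_id].name not in names:
                names.append(self.agents[agent_id].name)
        return names


# Hypothetical workflow: blood sample -> sequencing -> raw reads -> variant calling.
g = ProvenanceGraph()
g.add(
    Agent("lab", "Biobank lab"),
    Agent("pipeline", "Variant-calling pipeline v1"),
    Entity("specimen", "Blood sample"),
    Entity("reference", "Reference genome"),
    Activity("sequencing", "DNA sequencing", used=["specimen"], associated_with="lab"),
    Entity("reads", "Raw reads", generated_by="sequencing"),
    Activity("calling", "Variant calling", used=["reads", "reference"], associated_with="pipeline"),
    Entity("variants", "Variant calls", generated_by="calling"),
)
print(g.lineage("variants"))             # ['reads', 'reference', 'specimen']
print(g.responsible_agents("variants"))  # ['Variant-calling pipeline v1', 'Biobank lab']
```

Separating entities, activities, and agents in this way keeps origin (which entities a result derives from), transformation (which activities produced it), and accountability (which agents were involved) answerable by simple graph traversal.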
[1] L. Moreau and P. Missier, "PROV-DM: The PROV Data Model," 2013. https://www.w3.org/TR/prov-dm/ (accessed April 24, 2023).
[2] Parciak et al., "Provenance Solutions for Medical Research in Heterogeneous IT-Infrastructure: An Implementation Roadmap," in MEDINFO 2019: Health and Wellbeing e-Networks for All, Studies in Health Technology and Informatics, vol. 264, 2019. doi: 10.3233/SHTI190231
[3] R. Wittner, P. Holub, C. Mascia, F. Frexia, H. Müller, M. Plass, C. Allocca, F. Betsou, T. Burdett, I. Cancio, A. Chapman, M. Chapman, M. Courtot, V. Curcin, J. Eder, M. Elliot, K. Exter, C. Goble, M. Golebiewski, B. Kisler, A. Kremer, S. Leo, S. Lin-Gibson, A. Marsano, M. Mattavelli, J. Moore, H. Nakae, I. Perseil, A. Salman, J. Sluka, S. Soiland-Reyes, C. Strambio-De-Castillia, M. Sussman, J.R. Swedlow, K. Zatloukal, J. Geiger, "Toward a common standard for data and specimen provenance in life sciences," Learn. Health Syst., e10365, April 2023. doi: 10.1002/lrh2.10365
[4] "ISO/TS 23494-1:2023 Biotechnology — Provenance information model for biological material and data — Part 1: Design concepts and general requirements." https://www.iso.org/standard/80715.html (accessed April 25, 2023).
[5] Semler et al., "German Medical Informatics Initiative," Methods Inf. Med., vol. 57, Suppl. 1, pp. e50–e56, July 2018. PMID: 30016818
[6] "The Medical Informatics Initiative Core Data Set." https://www.medizininformatik-initiative.de/en/medical-informatics-initiatives-core-data-set (accessed April 23, 2023).
[7] "FHIR Provenance." https://fhir-ru.github.io/provenance.html (accessed April 23, 2023).
Conference Proceedings Volume
Copyright (c) 2023 Ulrich Sax, Christian Henke, Christian Dräger, Theresa Bender, Alessandra Kuntz, Martin Golebiewski, Hannes Ulrich, Mattias Löbe
This work is licensed under a Creative Commons Attribution 4.0 International License.
Grant numbers: 442326535 (NFDI4health); 451265285 (NFDI4health TF COVID19); 315072261 (NMDR2)