Provenance Core Data Set

A Minimal Information Model for Data Provenance in Biomedical Research




data provenance, data lineage, life sciences, Harmonizing RDM, Linking RDM


The exchange, dissemination, and reuse of biological specimens and data have become essential for life sciences research. This requires standards that enable cross-organizational documentation, traceability, and tracking of data and their corresponding metadata. Data provenance, or the lineage of data, is therefore an important aspect of data management in any information system that integrates data from different sources [1]. It provides crucial information about the origin, transformation, and accountability of data, which is essential for ensuring the trustworthiness, transparency, and quality of healthcare data [2]. For biological material and derived data, a new ISO standard was recently introduced that specifies a general concept for a provenance information model, together with requirements for provenance data interoperability and serialization [3,4]. However, a specific standard for health data provenance is still missing, and in recent years the need for a minimal core data set for representing provenance information in health information systems has grown. This paper presents a Provenance Core Data Set (PCDS), a generalized data model that aims to provide a set of attributes for describing data provenance in health information systems and beyond.




[1] L. Moreau and P. Missier, PROV-DM: The PROV Data Model, (2013). (accessed April 24, 2023).

[2] Parciak et al., “Provenance Solutions for Medical Research in Heterogeneous IT-Infrastructure: An Implementation Roadmap”, Studies in Health Technology and Informatics, Volume 264: MEDINFO 2019: Health and Wellbeing e-Networks for All, 2019, doi: 10.3233/SHTI190231

[3] R. Wittner, P. Holub, C. Mascia, F. Frexia, H. Müller, M. Plass, C. Allocca, F. Betsou, T. Burdett, I. Cancio, A. Chapman, M. Chapman, M. Courtot, V. Curcin, J. Eder, M. Elliot, K. Exter, C. Goble, M. Golebiewski, B. Kisler, A. Kremer, S. Leo, S. Lin-Gibson, A. Marsano, M. Mattavelli, J. Moore, H. Nakae, I. Perseil, A. Salman, J. Sluka, S. Soiland-Reyes, C. Strambio-De-Castillia, M. Sussman, J.R. Swedlow, K. Zatloukal, J. Geiger, “Toward a common standard for data and specimen provenance in life sciences”, Learn Health Sys., e10365, April 2023, doi:

[4] “ISO/TS 23494-1:2023 Biotechnology — Provenance information model for biological material and data — Part 1: Design concepts and general requirements” (accessed April 25, 2023).

[5] Semler et al., “German Medical Informatics Initiative”, Methods Inf Med., 57(Suppl 1): e50–e56, July 2018, PMID: 30016818

[6] The Medical Informatics Initiative Core data set: (accessed April 23, 2023).

[7] FHIR Provenance: (accessed April 23, 2023).




How to Cite

Sax, U., Henke, C., Dräger, C., Bender, T., Kuntz, A., Golebiewski, M., … Löbe, M. (2023). Provenance Core Data Set: A Minimal Information Model for Data Provenance in Biomedical Research. Proceedings of the Conference on Research Data Infrastructure, 1.
Received 2023-04-25
Accepted 2023-06-29
Published 2023-09-07
