Harmonising, Harvesting, and Searching Metadata Across a Repository Federation





Metadata, Structured Markup, JSON-LD, schema.org, Bioschemas, OAI-PMH, Harvesting


Collecting metadata for research data is an important aspect of the FAIR principles. The schema.org and Bioschemas initiatives created a vocabulary for embedding markup for many different types, including BioChemEntity, ChemicalSubstance, Gene, MolecularEntity, Protein, and others relevant to the natural and life sciences, with immediate benefits for the findability of data packages. To bridge the gap between semantic-web-driven JSON-LD metadata on the one hand and the established but separately developed interface services in libraries on the other, we have designed an architecture for harmonising, federating, and harvesting metadata from several resources. Our approach is to serve JSON-LD embedded in an XML container through a central OAI-PMH provider. Several resources in NFDI4Chem provide such domain-specific metadata. The CKAN-based NFDI4Chem search service can harvest this metadata using an OAI-PMH harvester extension that extracts the XML-encapsulated JSON-LD metadata, and it offers search capabilities relevant to the chemistry domain. We invite the community to collaborate and reach a critical mass of providers and consumers in the NFDI.
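The idea of carrying JSON-LD inside an XML container can be illustrated with a minimal Python sketch. It is not the NFDI4Chem implementation: the container namespace, the `jsonld` element name, the function names, and the example record are assumptions made purely for illustration. Only the OAI-PMH namespace and the `record`, `header`, and `metadata` elements come from the OAI-PMH 2.0 specification.

```python
import json
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
# Hypothetical container namespace, assumed here for illustration only.
CONTAINER_NS = "https://example.org/jsonld-container/"


def wrap_jsonld(identifier: str, datestamp: str, jsonld: dict) -> str:
    """Provider side: build an OAI-PMH record whose metadata holds JSON-LD as text."""
    record = ET.Element(f"{{{OAI_NS}}}record")
    header = ET.SubElement(record, f"{{{OAI_NS}}}header")
    ET.SubElement(header, f"{{{OAI_NS}}}identifier").text = identifier
    ET.SubElement(header, f"{{{OAI_NS}}}datestamp").text = datestamp
    metadata = ET.SubElement(record, f"{{{OAI_NS}}}metadata")
    # The JSON-LD document is serialised and stored as element text;
    # ElementTree escapes characters such as "<" and "&" when writing.
    container = ET.SubElement(metadata, f"{{{CONTAINER_NS}}}jsonld")
    container.text = json.dumps(jsonld, ensure_ascii=False)
    return ET.tostring(record, encoding="unicode")


def extract_jsonld(record_xml: str) -> dict:
    """Harvester side: pull the JSON-LD document back out of the XML record."""
    record = ET.fromstring(record_xml)
    container = record.find(f"{{{OAI_NS}}}metadata/{{{CONTAINER_NS}}}jsonld")
    if container is None or not container.text:
        raise ValueError("record carries no JSON-LD payload")
    return json.loads(container.text)


# Illustrative metadata using the Bioschemas MolecularEntity type.
example = {
    "@context": "https://schema.org",
    "@type": "MolecularEntity",
    "name": "caffeine",
    "inChIKey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N",
}

if __name__ == "__main__":
    xml_record = wrap_jsonld("oai:example.org:1", "2023-01-29", example)
    print(xml_record)
    print(extract_jsonld(xml_record))
```

Storing the JSON-LD as escaped element text keeps the XML well-formed whatever the payload contains, so a harvester only needs an XML parser and a JSON parser to recover the original document unchanged.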


“Introducing schema.org: Search engines come together for a richer web,” Official Google Blog. https://googleblog.blogspot.com/2011/06/introducing-schemaorg-search-engines.html (accessed Jan. 29, 2023).

F. Michel and The Bioschemas Community, “Bioschemas & Schema.org: a Lightweight Semantic Layer for Life Sciences Websites,” Biodiversity Information Science and Standards, vol. 2. p. e25836, 2018. doi: 10.3897/biss.2.25836.

C. Lagoze and H. Van de Sompel, “The making of the open archives initiative protocol for metadata harvesting,” Libr. Hi Tech, vol. 21, no. 2, pp. 118–128, Jun. 2003, doi: 10.1108/07378830310479776.

C. Steinbeck et al., “NFDI4Chem - Towards a National Research Data Infrastructure for Chemistry in Germany,” Res. Ideas Outcomes, vol. 6, p. e55852, Jun. 2020, doi: 10.3897/rio.6.e55852.

H. Horai et al., “MassBank: a public repository for sharing mass spectral data for life sciences,” J. Mass Spectrom., vol. 45, no. 7, pp. 703–714, Jul. 2010, doi: 10.1002/jms.1777.

S.-A. Sansone et al., “Toward interoperable bioscience data,” Nat. Genet., vol. 44, no. 2, pp. 121–126, Jan. 2012, doi: 10.1038/ng.1054.




Conference Proceedings Volume


Poster presentations II (Call for Papers)
Received 2023-04-26
Accepted 2023-06-30
Published 2023-09-07
