Determining the Similarity of Research Data by Using an Interoperable Metadata Extraction Method




Research Data Management, Metadata, Linked Open Data, Data Similarity, Metadata Similarity, Linked Data


Determining the similarity of research data is not a simple task, as formats can differ widely depending on the domain. In particular, since many formats are stored as binary files, a raw comparison of the files will not yield good results. This makes it hard to accurately tell how similar pieces of research work are by comparing their data. Extracted interoperable metadata provides a way to describe data that is independent of the data format. Therefore, this work uses such extracted interoperable metadata to create a method for determining the similarity of research data based on their metadata. The produced method utilizes domain knowledge about the extracted metadata and the way they are formulated. A baseline and several further methods are created for comparison. The results show that our method outperforms all other methods considered, especially those that compare the research data itself rather than the metadata. Since the results are promising, we propose further investigations on other datasets and possible use cases.


Heinrichs, B., & Yazdi, M. A. (2023). Determining the Similarity of Research Data by Using an Interoperable Metadata Extraction Method. Proceedings of the Conference on Research Data Infrastructure, 1.
Received 2023-04-25
Accepted 2023-06-29
Published 2023-09-07