LLMs4OL 2024 Datasets: Toward Ontology Learning with Large Language Models

Authors

H. Babaei Giglou, J. D’Souza, S. Sadruddin, S. Auer

DOI:

https://doi.org/10.52825/ocp.v4i.2480

Keywords:

Ontology Learning, Large Language Models, Dataset, LLMs4OL Challenge

Abstract

Ontology learning (OL) from unstructured data has evolved significantly, with recent work integrating large language models (LLMs) to enhance various aspects of the process. This paper introduces the LLMs4OL 2024 datasets, developed to benchmark and advance research in OL using LLMs. The LLMs4OL 2024 dataset, a key component of the LLMs4OL Challenge, targets three primary OL tasks: Term Typing, Taxonomy Discovery, and Non-Taxonomic Relation Extraction. It encompasses seven domains, including lexosemantics and biological functions, offering a comprehensive resource for evaluating LLM-based OL approaches. Each task within the dataset is carefully crafted to facilitate both Few-Shot (FS) and Zero-Shot (ZS) evaluation scenarios, allowing for robust assessment of model performance across different knowledge domains. The dataset thereby addresses a critical gap in the field by offering standardized benchmarks for the fair comparison of LLM applications in OL.

Published

2024-10-02

How to Cite

Babaei Giglou, H., D’Souza, J., Sadruddin, S., & Auer, S. (2024). LLMs4OL 2024 Datasets: Toward Ontology Learning with Large Language Models. Open Conference Proceedings, 4, 17–30. https://doi.org/10.52825/ocp.v4i.2480
