T-GreC at LLMs4OL 2025 Task B: A Report on Term-Typing Task of OBI dataset using LLM with k-Nearest Neighbors

Authors

C. Yimmark, T. Racharak

DOI:

https://doi.org/10.52825/ocp.v6i.2898

Keywords:

Ontology Learning, Term-Typing, Large Language Models, K-Nearest Neighbors

Abstract

This report presents an approach that combines large language model (LLM) embeddings with k-nearest neighbors (k-NN) for the term-typing task on the OBI (Ontology for Biomedical Investigations) dataset. We investigate the effectiveness of transformer models, namely PubMedBERT, BioBERT, DeBERTa-v3, and RoBERTa, paired with k-NN classification over each model's embeddings. Our experimental results demonstrate that fine-tuned LLMs not only can perform term typing on their own but also learn effective embeddings that k-NN can exploit to solve the task: fine-tuned RoBERTa achieves the highest standalone F1 score of 0.827, while k-NN over its embeddings reaches 0.862. The study reveals that embeddings from transformer models, when used as semantic representations for a similarity-based method, improve classification accuracy in this setting.
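The pipeline summarized above (transformer embeddings fed to a k-NN classifier) can be sketched as follows. This is a minimal illustration, not the authors' code: the random clustered vectors and the type labels "assay"/"device" are stand-ins, whereas in the paper the vectors would be embeddings of OBI terms produced by a fine-tuned model such as RoBERTa.

```python
import numpy as np

def knn_predict(train_vecs, train_labels, query_vecs, k=3):
    """Classify each query vector by majority vote among its k nearest
    training vectors under cosine similarity."""
    # L2-normalize so the dot product equals cosine similarity
    tn = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    qn = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    sims = qn @ tn.T  # shape: (n_query, n_train)
    preds = []
    for row in sims:
        top = np.argsort(row)[::-1][:k]           # k most similar training vectors
        votes = [train_labels[i] for i in top]
        preds.append(max(set(votes), key=votes.count))  # majority vote
    return preds

# Toy stand-ins for term embeddings: two well-separated clusters
# representing two hypothetical OBI types.
rng = np.random.default_rng(0)
assay = rng.normal(loc=3.0, size=(10, 8))
device = rng.normal(loc=-3.0, size=(10, 8))
X = np.vstack([assay, device])
y = ["assay"] * 10 + ["device"] * 10

query = rng.normal(loc=3.0, size=(2, 8))  # terms near the "assay" cluster
print(knn_predict(X, y, query, k=5))      # prints ['assay', 'assay']
```

Cosine similarity is a common choice for comparing transformer embeddings; a library implementation such as scikit-learn's `KNeighborsClassifier` would serve the same role in practice.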

References

Y. Gu, R. Tinn, H. Cheng, et al., “Domain-specific language model pretraining for biomedical natural language processing,” ACM Transactions on Computing for Healthcare (HEALTH), vol. 3, no. 1, pp. 1–23, 2021.

J. Lee, W. Yoon, S. Kim, D. Kim, C. H. So, and J. Kang, “BioBERT: A pre-trained biomedical language representation model for biomedical text mining,” Bioinformatics, vol. 36, no. 4, pp. 1234–1240, 2020.

P. Cunningham and S. J. Delany, “K-Nearest Neighbour Classifiers: 2nd Edition (with Python examples),” ACM Computing Surveys, vol. 54, no. 6, pp. 1–25, Jul. 2022, arXiv:2004.04523 [cs], ISSN: 0360-0300, 1557-7341. DOI: 10.1145/3459665. [Online]. Available: http://arxiv.org/abs/2004.04523 (visited on 08/08/2025).

H. Babaei Giglou, J. D’Souza, and S. Auer, “LLMs4OL: Large language models for ontology learning,” in The Semantic Web – ISWC 2023, T. R. Payne, V. Presutti, G. Qi, et al., Eds., Cham: Springer Nature Switzerland, 2023, pp. 408–427, ISBN: 978-3-031-47240-4.

P. He, X. Gao, J. Chen, and J. Gao, “DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing,” arXiv preprint arXiv:2111.09543, 2021.

Y. Liu, M. Ott, N. Goyal, et al., “RoBERTa: A robustly optimized BERT pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.

H. B. Giglou, J. D’Souza, and S. Auer, LLMs4OL 2024 Overview: The 1st Large Language Models for Ontology Learning Challenge, arXiv:2409.10146 [cs], Sep. 2024. DOI: 10.48550/arXiv.2409.10146. [Online]. Available: http://arxiv.org/abs/2409.10146 (visited on 08/06/2025).

P. Kumar Goyal, S. Singh, and U. Shanker Tiwary, “silp_nlp at LLMs4OL 2024 Tasks A, B, and C: Ontology Learning through Prompts with LLMs,” Open Conference Proceedings, vol. 4, pp. 31–38, Oct. 2024, ISSN: 2749-5841. DOI: 10.52825/ocp.v4i.2485. [Online]. Available: https://www.tib-op.org/ojs/index.php/ocp/article/view/2485 (visited on 08/07/2025).

C. Eang and S. Lee, “Improving the Accuracy and Effectiveness of Text Classification Based on the Integration of the BERT Model and a Recurrent Neural Network (RNN BERT Based),” Applied Sciences, vol. 14, no. 18, p. 8388, Sep. 2024, ISSN: 2076-3417. DOI: 10.3390/app14188388. [Online]. Available: https://www.mdpi.com/2076-3417/14/18/8388 (visited on 08/07/2025).

H. Babaei Giglou, J. D’Souza, N. Mihindukulasooriya, and S. Auer, “LLMs4OL 2025 overview: The 2nd large language models for ontology learning challenge,” Open Conference Proceedings, 2025.

E. J. Hu, Y. Shen, P. Wallis, et al., “LoRA: Low-rank adaptation of large language models,” arXiv preprint arXiv:2106.09685, 2022.

S. Okamoto and K. Satoh, “An average-case analysis of k-nearest neighbor classifier,” in Case-Based Reasoning Research and Development, J. G. Carbonell, J. Siekmann, G. Goos, et al., Eds., vol. 1010, Series Title: Lecture Notes in Computer Science, Berlin, Heidelberg: Springer Berlin Heidelberg, 1995, pp. 253–264, ISBN: 978-3-540-60598-0, 978-3-540-48446-2. DOI: 10.1007/3-540-60598-3_23. [Online]. Available: http://link.springer.com/10.1007/3-540-60598-3_23 (visited on 08/08/2025).

Published

2025-10-01

How to Cite

Yimmark, C., & Racharak, T. (2025). T-GreC at LLMs4OL 2025 Task B: A Report on Term-Typing Task of OBI dataset using LLM with k-Nearest Neighbors. Open Conference Proceedings, 6. https://doi.org/10.52825/ocp.v6i.2898

Section

LLMs4OL 2025 Task Participant Short Papers