CUET_Zenith at LLMs4OL 2025 Task C: Hybrid Embedding-LLM Architectures for Taxonomy Discovery

Authors

R. Ilman, M. Rahman, and S. Rahman

DOI:

https://doi.org/10.52825/ocp.v6i.2896

Keywords:

Ontology Learning, Taxonomy Discovery, Large Language Models, Hybrid Architectures, Biomedical Ontologies

Abstract

Taxonomy discovery, the identification of hierarchical relationships within ontological structures, is a foundational challenge in ontology learning. This paper describes our submission to the LLMs4OL 2025 challenge, which employs hybrid architectures to address this task in both a biomedical (Subtask C1: OBI) and a general-purpose (Subtask C5: SchemaOrg) knowledge domain. For C1, we integrate semantic clustering of Sentence-BERT embeddings with few-shot prompting of Qwen-3 (14B), enabling domain-specific hierarchy induction without task-specific fine-tuning. For C5, we introduce a cascaded validation framework that combines deep semantic representations from the sentence transformer all-mpnet-base-v2, ensemble classification via XGBoost, and a hierarchical LLM-based reasoning pipeline built on TinyLlama and GPT-4o. To address the inherent class imbalance, we employ SMOTE-based augmentation and gated inference thresholds. Empirical results show that our hybrid methodology achieves competitive performance, confirming that the judicious integration of classical machine learning with large language models yields efficient and scalable solutions for ontology structure induction. Code implementations are publicly available.
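
To make the C5 cascade concrete, the sketch below shows one plausible realization of the pipeline summarized above: candidate (parent, child) term pairs are embedded with all-mpnet-base-v2, an XGBoost classifier is trained on SMOTE-balanced pair features, and only predictions falling in an ambiguous probability band are escalated to an LLM check. The feature construction, threshold values, hyperparameters, and the llm_validate helper are illustrative assumptions, not the authors' exact implementation.

```python
# Illustrative sketch of the cascaded taxonomy-discovery pipeline for C5
# (assumed details; not the authors' exact code).
import numpy as np
from sentence_transformers import SentenceTransformer
from imblearn.over_sampling import SMOTE
from xgboost import XGBClassifier

encoder = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")


def pair_features(parents, children):
    """Hybrid pair representation: concatenation plus element-wise difference and product."""
    p = encoder.encode(parents, normalize_embeddings=True)
    c = encoder.encode(children, normalize_embeddings=True)
    return np.hstack([p, c, np.abs(p - c), p * c])


def train_classifier(train_pairs, train_labels):
    """Train XGBoost on SMOTE-balanced features; train_labels[i] == 1 means the 'is-a' relation holds."""
    X = pair_features([p for p, _ in train_pairs], [c for _, c in train_pairs])
    X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, np.asarray(train_labels))
    clf = XGBClassifier(n_estimators=300, max_depth=6, eval_metric="logloss")
    clf.fit(X_bal, y_bal)
    return clf


def llm_validate(parent, child):
    """Placeholder for the hierarchical LLM check (e.g., a few-shot yes/no prompt to GPT-4o)."""
    raise NotImplementedError("plug in the LLM validation prompt here")


def predict_with_gate(clf, pairs, low=0.35, high=0.65):
    """Gated inference: confident scores are accepted directly; the ambiguous band defers to the LLM."""
    X = pair_features([p for p, _ in pairs], [c for _, c in pairs])
    probs = clf.predict_proba(X)[:, 1]
    decisions = []
    for (parent, child), prob in zip(pairs, probs):
        if prob >= high:
            decisions.append(True)          # accept the subsumption pair
        elif prob <= low:
            decisions.append(False)         # reject without invoking the LLM
        else:
            decisions.append(llm_validate(parent, child))
    return decisions
```

Keeping the LLM at the end of the cascade, behind a confidence gate, is what makes the hybrid design economical: the classifier resolves the easy pairs cheaply, and the expensive model only sees the borderline cases. The exact gate thresholds and prompt design used in the submission may differ from this sketch.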

Published

2025-10-01

How to Cite

Ilman, R., Rahman, M., & Rahman, S. (2025). CUET_Zenith at LLMs4OL 2025 Task C: Hybrid Embedding-LLM Architectures for Taxonomy Discovery. Open Conference Proceedings, 6. https://doi.org/10.52825/ocp.v6i.2896

Conference Proceedings Volume

Section

LLMs4OL 2025 Task Participant Short Papers