Alexbek at LLMs4OL 2025 Tasks A, B, and C: Heterogeneous LLM Methods for Ontology Learning (Few-Shot Prompting, Ensemble Typing, and Attention-Based Taxonomies)
DOI: https://doi.org/10.52825/ocp.v6i.2899

Keywords: Large Language Models, Ontology Engineering (OE), Ontology Learning, Domain-Specific Knowledge, Retrieval-Augmented Generation, Term Typing, Taxonomy Discovery

Abstract
We present a comprehensive system for addressing Tasks A, B, and C of the LLMs4OL 2025 challenge, which together span the full ontology construction pipeline: term extraction, typing, and taxonomy discovery. Our approach combines retrieval-augmented prompting, zero-shot classification, and attention-based graph modeling, each tailored to the demands of the respective task.
For Task A, we jointly extract domain-specific terms and their ontological types using a retrieval-augmented generation (RAG) pipeline.
The training data is reformulated into document-to-(terms, types) correspondences, and test-time inference augments the prompt with semantically similar training examples. This single-pass method requires no model fine-tuning and improves overall performance through lexical augmentation.
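To make the retrieval step concrete, the following is a minimal sketch of how such a RAG prompt can be assembled. The encoder checkpoint, the toy data, and the prompt template are our illustrative assumptions, not the exact implementation from the repository.

```python
# Minimal sketch of the Task A retrieval-augmented prompting step.
# Encoder choice, toy data, and prompt wording are assumptions.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

# Training data reformulated as document -> (term, type) records.
train_docs = [
    "Aspirin is indicated for mild pain and fever.",
    "The hippocampus supports episodic memory formation.",
]
train_labels = [
    [("aspirin", "drug"), ("fever", "symptom")],
    [("hippocampus", "brain region")],
]
train_emb = encoder.encode(train_docs, convert_to_tensor=True)

def build_prompt(test_doc: str, k: int = 2) -> str:
    """Retrieve the k most similar training documents and format them
    as in-context examples for single-pass term + type extraction."""
    query = encoder.encode(test_doc, convert_to_tensor=True)
    hits = util.semantic_search(query, train_emb, top_k=k)[0]
    shots = "\n\n".join(
        f"Document: {train_docs[h['corpus_id']]}\n"
        f"Terms and types: {train_labels[h['corpus_id']]}"
        for h in hits
    )
    return (
        "Extract domain-specific terms and their ontological types.\n\n"
        f"{shots}\n\nDocument: {test_doc}\nTerms and types:"
    )

print(build_prompt("Ibuprofen reduces inflammation."))
```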
Task B, which involves assigning types to given terms, is handled via a dual strategy. In the few-shot setting (for domains with labeled training data), we reuse the RAG scheme with few-shot prompting. In the zero-shot setting (for previously unseen domains), we use a zero-shot classifier that combines cosine similarity scores from multiple embedding models using confidence-based weighting.
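A minimal sketch of such a confidence-weighted zero-shot ensemble is shown below. The choice of ensemble members and the margin-based confidence heuristic are assumptions for illustration, not the paper's exact weighting scheme.

```python
# Sketch of zero-shot typing: each embedding model scores term-type
# pairs by cosine similarity, and per-model scores are combined with
# confidence-based weights. The margin heuristic is an assumption.
import numpy as np
from sentence_transformers import SentenceTransformer

models = [
    SentenceTransformer("all-MiniLM-L6-v2"),   # assumed ensemble members
    SentenceTransformer("all-mpnet-base-v2"),
]

def zero_shot_type(term: str, candidate_types: list[str]) -> str:
    combined = np.zeros(len(candidate_types))
    for model in models:
        emb = model.encode([term] + candidate_types,
                           normalize_embeddings=True)
        sims = emb[1:] @ emb[0]          # cosine similarity to each type
        top2 = np.sort(sims)[-2:]
        confidence = top2[1] - top2[0]   # best-vs-runner-up margin (assumed)
        combined += confidence * sims
    return candidate_types[int(np.argmax(combined))]

print(zero_shot_type("aspirin", ["drug", "disease", "gene"]))
```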
In Task C, we model taxonomy discovery as graph inference. Using embeddings of type labels, we train a lightweight cross-attention layer to predict is-a relations by approximating a soft adjacency matrix.
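The following PyTorch sketch shows one way a lightweight cross-attention head can approximate a soft adjacency matrix over type-label embeddings. The projection dimensions, sigmoid scoring, and BCE training loop are assumptions, not the authors' exact configuration.

```python
# Sketch of the Task C head: attention scores between projected type
# embeddings serve as a soft adjacency matrix for is-a prediction.
import torch
import torch.nn as nn

class SoftAdjacencyHead(nn.Module):
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.q = nn.Linear(dim, hidden)  # projects candidate children
        self.k = nn.Linear(dim, hidden)  # projects candidate parents
        self.scale = hidden ** -0.5

    def forward(self, type_emb: torch.Tensor) -> torch.Tensor:
        # type_emb: (n_types, dim) label embeddings
        scores = self.q(type_emb) @ self.k(type_emb).T * self.scale
        return torch.sigmoid(scores)     # (n_types, n_types) soft adjacency

# Training sketch: BCE against the 0/1 is-a adjacency from training data.
n_types, dim = 10, 384
emb = torch.randn(n_types, dim)          # stand-in label embeddings
target = torch.zeros(n_types, n_types)
target[1, 0] = 1.0                       # e.g. type 1 is-a type 0
head = SoftAdjacencyHead(dim)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.binary_cross_entropy(head(emb), target)
    loss.backward()
    opt.step()
```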
These modular, task-specific solutions enabled us to achieve top-ranking results on the official leaderboard across all three tasks. Taken together, these strategies showcase the scalability, adaptability, and robustness of LLM-based architectures for ontology learning across heterogeneous domains.
Code is available at: https://github.com/BelyaevaAlex/LLMs4OL-Challenge-Alexbek
License
Copyright (c) 2025 Aleksandra Beliaeva, Temurbek Rahmatullaev

This work is licensed under a Creative Commons Attribution 4.0 International License.
Funding data
Russian Science Foundation
Grant number: 25-71-30008