Echo-LLM Evidence-Checked Hierarchical Ontology
DOI: https://doi.org/10.52825/ocp.v8i.3173
Keywords: Knowledge Graphs, Ontology Induction, Large Language Models, Retrieval Augmented Generation, Natural Language Inference
Abstract
Large language models can draft ontologies, but unverified extraction yields hallucinated triples—producing plausible yet incorrect facts. EchoLLM is a text-only, evidence-grounded pipeline for ontology construction. Candidate triples are first extracted with an instruction-following LLM. A hybrid retriever (BM25 + dense) then gathers sentence-level evidence for each triple. Natural language inference tests whether the evidence entails the triple; only entailed, lexically consistent hypotheses are accepted, and all decisions are logged. Accepted entities are embedded and clustered to induce classes and a lightweight hierarchy, and rdfs:comment annotations are generated from the supporting text. The result is a validated triple set and an initial ontology suitable for bootstrapping domain knowledge graphs. The design favors high precision, requires no domain-specific rules, and surfaces failure modes at each stage (extraction, retrieval, verification). This enables authors and subject-matter experts to build trustworthy knowledge graphs quickly while keeping model and cost choices flexible.
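To make the retrieve-then-verify gate described above concrete, the following is a minimal Python sketch, not the authors' implementation. It fuses BM25 and dense rankings with reciprocal rank fusion, verbalizes a candidate triple as an NLI hypothesis, and accepts the triple only if a retrieved sentence entails it. The encoder and NLI model names (all-MiniLM-L6-v2, facebook/bart-large-mnli), the 0.9 entailment threshold, the toy corpus, and the triple verbalization are illustrative assumptions; the lexical-consistency check, entity clustering, and rdfs:comment generation steps are omitted.

    import torch
    from rank_bm25 import BM25Okapi                              # sparse (BM25) retrieval
    from sentence_transformers import SentenceTransformer, util  # dense retrieval
    from transformers import AutoModelForSequenceClassification, AutoTokenizer  # NLI verifier

    # Sentence-level evidence corpus (placeholder examples).
    sentences = [
        "Quercetin is a flavonoid found in onions and apples.",
        "Flavonoids are a class of polyphenolic plant metabolites.",
    ]

    bm25 = BM25Okapi([s.lower().split() for s in sentences])
    encoder = SentenceTransformer("all-MiniLM-L6-v2")            # illustrative encoder choice
    corpus_emb = encoder.encode(sentences, convert_to_tensor=True)

    nli_name = "facebook/bart-large-mnli"                        # illustrative NLI model choice
    nli_tok = AutoTokenizer.from_pretrained(nli_name)
    nli_model = AutoModelForSequenceClassification.from_pretrained(nli_name)

    def retrieve(query, k=5, c=60):
        """Hybrid retrieval: fuse BM25 and dense rankings via reciprocal rank fusion."""
        sparse = bm25.get_scores(query.lower().split())
        dense = util.cos_sim(encoder.encode(query, convert_to_tensor=True), corpus_emb)[0]
        rank_s = {i: r for r, i in enumerate(sorted(range(len(sentences)), key=lambda j: -sparse[j]))}
        rank_d = {i: r for r, i in enumerate(sorted(range(len(sentences)), key=lambda j: -float(dense[j])))}
        fused = sorted(range(len(sentences)),
                       key=lambda i: -(1.0 / (c + rank_s[i]) + 1.0 / (c + rank_d[i])))
        return fused[:k]

    def entailed(premise, hypothesis, threshold=0.9):
        """Accept only if the NLI model scores 'entailment' above the threshold."""
        inputs = nli_tok(premise, hypothesis, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = nli_model(**inputs).logits.softmax(dim=-1)[0]
        # facebook/bart-large-mnli label order: contradiction, neutral, entailment
        return probs[2].item() > threshold

    def verify_triple(subject, predicate, obj):
        """Return (accepted, evidence, log_entry) for one candidate (s, p, o) triple."""
        hypothesis = f"{subject} {predicate} {obj}."             # naive verbalization of the triple
        for idx in retrieve(hypothesis):
            if entailed(sentences[idx], hypothesis):
                return True, sentences[idx], {"triple": (subject, predicate, obj),
                                              "evidence": idx, "decision": "entailed"}
        return False, None, {"triple": (subject, predicate, obj), "decision": "rejected"}

    print(verify_triple("Quercetin", "is a", "flavonoid"))

In this sketch the entailment threshold is the precision lever: raising it rejects more candidates, which mirrors the abstract's stated preference for high precision, and the returned log entries reflect the claim that every accept/reject decision is recorded.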
License
Copyright (c) 2025 Aryan Singh Dalal, Hande McGinty

This work is licensed under a Creative Commons Attribution 4.0 International License.
Accepted 2025-11-15
Published 2025-12-18