Echo-LLM Evidence-Checked Hierarchical Ontology

Authors

A. Singh Dalal, H. McGinty

DOI:

https://doi.org/10.52825/ocp.v8i.3173

Keywords:

Knowledge Graphs, Ontology Induction, Large Language Models, Retrieval Augmented Generation, Natural Language Inference

Abstract

Large language models can draft ontologies, but unverified extraction yields hallucinated triples: plausible yet incorrect facts. EchoLLM is a text-only, evidence-grounded pipeline for ontology construction. Candidate triples are first extracted with an instruction-following LLM. A hybrid retriever (BM25 + dense) gathers sentence-level evidence for each triple. Natural language inference then tests whether the evidence entails the triple; only entailed, lexically consistent hypotheses are accepted, and all decisions are logged. Accepted entities are embedded and clustered to induce classes and a lightweight hierarchy, and rdfs:comment annotations are generated from the supporting text. The result is a validated triple set and an initial ontology suitable for bootstrapping domain knowledge graphs. The design favors high precision, requires no domain-specific rules, and surfaces failure modes at each stage (extraction, retrieval, verification). This enables authors and subject-matter experts to build trustworthy knowledge graphs quickly while keeping model and cost choices flexible.
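
To make the verification stage concrete, here is a minimal sketch in Python of the two steps described above: hybrid retrieval (BM25 plus dense embeddings, fused with reciprocal rank fusion) followed by the NLI entailment gate. The model names (all-MiniLM-L6-v2, microsoft/deberta-large-mnli), the RRF constant, the acceptance threshold, and the toy evidence corpus are illustrative assumptions rather than the configuration reported in the paper; the lexical-consistency check and decision logging are omitted for brevity.

```python
# Hedged sketch of EchoLLM-style triple verification. Every name below
# (models, RRF constant, threshold, toy corpus) is an illustrative
# assumption, not the paper's reported setup.
import torch
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util
from transformers import AutoModelForSequenceClassification, AutoTokenizer

corpus = [  # sentence-level evidence pool (toy example)
    "Aspirin inhibits the enzyme cyclooxygenase.",
    "Aspirin is widely used to reduce fever.",
    "Paris is the capital of France.",
]

bm25 = BM25Okapi([s.lower().split() for s in corpus])        # sparse index
encoder = SentenceTransformer("all-MiniLM-L6-v2")            # assumed dense encoder
corpus_emb = encoder.encode(corpus, convert_to_tensor=True)

def hybrid_retrieve(query, top_k=2, k_rrf=60):
    """Fuse BM25 and dense rankings with reciprocal rank fusion (RRF)."""
    sparse = bm25.get_scores(query.lower().split())
    dense = util.cos_sim(encoder.encode(query, convert_to_tensor=True), corpus_emb)[0]
    sparse_rank = sorted(range(len(corpus)), key=lambda i: -sparse[i])
    dense_rank = sorted(range(len(corpus)), key=lambda i: -float(dense[i]))
    rrf = {i: 1.0 / (k_rrf + sparse_rank.index(i) + 1)
              + 1.0 / (k_rrf + dense_rank.index(i) + 1)
           for i in range(len(corpus))}
    return [corpus[i] for i in sorted(rrf, key=rrf.get, reverse=True)[:top_k]]

NLI_MODEL = "microsoft/deberta-large-mnli"                   # assumed NLI model
nli_tok = AutoTokenizer.from_pretrained(NLI_MODEL)
nli = AutoModelForSequenceClassification.from_pretrained(NLI_MODEL)

def entailed(premise, hypothesis, threshold=0.9):
    """True if the evidence sentence entails the verbalized triple."""
    inputs = nli_tok(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(nli(**inputs).logits, dim=-1)[0]
    return probs[nli.config.label2id["ENTAILMENT"]].item() >= threshold

def verify_triple(subj, pred, obj):
    """Accept a candidate triple only if some retrieved sentence entails it."""
    hypothesis = f"{subj} {pred} {obj}."
    return any(entailed(ev, hypothesis) for ev in hybrid_retrieve(hypothesis))

print(verify_triple("Aspirin", "inhibits", "cyclooxygenase"))  # expected: True
```

Similarly, a minimal sketch of the class-induction step, assuming accepted entity names are embedded with the same encoder as above and grouped by agglomerative clustering; the paper's actual clustering algorithm and parameters are not specified here.

```python
# Hedged sketch of class induction over accepted entities: embed entity
# names and cluster them; each cluster becomes a candidate class. The
# algorithm and its parameters are assumptions, not the paper's choices.
from sklearn.cluster import AgglomerativeClustering

entities = ["aspirin", "ibuprofen", "cyclooxygenase", "prostaglandin"]
labels = AgglomerativeClustering(n_clusters=2).fit_predict(
    encoder.encode(entities))                    # reuses the encoder above
for entity, label in zip(entities, labels):
    print(f"Class_{label}: {entity}")            # induced class membership
```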

How to Cite

Singh Dalal, A., & McGinty, H. (2025). Echo-LLM Evidence-Checked Hierarchical Ontology. Open Conference Proceedings, 8. https://doi.org/10.52825/ocp.v8i.3173

Conference Proceedings Volume

Open Conference Proceedings, Vol. 8

Section

Contributions to "The Second Bridge on Artificial Intelligence for Scholarly Communication"
Received 2025-11-10
Accepted 2025-11-15
Published 2025-12-18