Evaluating Vector-Based Search and Question Answering Approaches for an Information System

Authors

M. Dembach, S. Decher

DOI:

https://doi.org/10.52825/ocp.v9i.3306

Keywords:

Ontology, RAG, Search, Question Answering, Information System, Explainability

Abstract

This paper presents two experiments conducted within the context of an information system. First, it compares vector-based document retrieval with ontology-based query expansion, using a manually curated categorization scheme that is employed in actual practice rather than constructed specifically for the experiment. Second, it compares the output of a retrieval-augmented generation (RAG) system with texts written by a group of human domain experts. The findings indicate that the vector-based approach is more effective for this use case and that RAG-generated texts may be able to compete stylistically with those of experts, although their content still needs to be checked.

Published

2026-03-23

How to Cite

Dembach, M., & Decher, S. (2026). Evaluating Vector-Based Search and Question Answering Approaches for an Information System. Open Conference Proceedings, 9. https://doi.org/10.52825/ocp.v9i.3306

Section

Proceedings to the 3rd NFDI4Energy Conference - Full Papers
