Unmediated AI-Assisted Scholarly Citations

Authors

Stefan Szeider

DOI:

https://doi.org/10.52825/ocp.v8i.3161

Keywords:

Language Models, Bibliographic Databases, Model Context Protocol, Citation Management, Scholarly Communication

Abstract

Traditional bibliographic databases require users to navigate search forms and manually copy citation data. Language models offer an alternative: a natural-language interface where researchers can write text containing informal citation fragments and have them automatically resolved to proper references. However, language models generate fabricated (hallucinated) citations at substantial rates, making them unreliable for scholarly work. We present an architectural approach that combines the natural-language interface of LLM chatbots with the accuracy of direct database access, implemented through the Model Context Protocol (MCP). Our system enables language models to search bibliographic databases, perform fuzzy matching, and export verified entries, all through conversational interaction. A key architectural principle bypasses the language model during final data export: entries are fetched directly from authoritative sources, with timeout protection, to guarantee accuracy. We demonstrate this approach with MCP-DBLP, a server providing access to the DBLP computer science bibliography. The system transforms form-based bibliographic services into conversational assistants that maintain scholarly integrity, and the architecture is adaptable to other bibliographic databases and scholarly data sources.
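To make the export path concrete, the sketch below shows how a bibliographic MCP server of this kind might be structured. It is a minimal illustration rather than the actual MCP-DBLP implementation: it assumes the FastMCP helper from the official MCP Python SDK together with DBLP's public search endpoint (https://dblp.org/search/publ/api) and per-record BibTeX export (https://dblp.org/rec/<key>.bib), and the tool names, parameters, and file-based export are illustrative choices.

```python
# Minimal sketch only. It assumes the FastMCP helper from the official MCP
# Python SDK and DBLP's public web API; tool names, parameters, and the
# file-based export below are illustrative and do not reproduce MCP-DBLP.
import requests
from mcp.server.fastmcp import FastMCP

DBLP_SEARCH = "https://dblp.org/search/publ/api"  # public DBLP search endpoint
TIMEOUT = 10  # seconds; every network call is bounded by this timeout

mcp = FastMCP("dblp-sketch")


@mcp.tool()
def search_publications(query: str, max_hits: int = 10) -> list[dict]:
    """Search DBLP for publications matching an informal citation fragment."""
    resp = requests.get(
        DBLP_SEARCH,
        params={"q": query, "h": max_hits, "format": "json"},
        timeout=TIMEOUT,
    )
    resp.raise_for_status()
    hits = resp.json()["result"]["hits"].get("hit", [])
    # Return only the fields the model needs to identify the right entry.
    return [
        {
            "key": h["info"].get("key"),
            "title": h["info"].get("title"),
            "venue": h["info"].get("venue"),
            "year": h["info"].get("year"),
        }
        for h in hits
    ]


@mcp.tool()
def export_bibtex(dblp_keys: list[str], outfile: str = "references.bib") -> str:
    """Fetch BibTeX records directly from dblp.org and append them to a file.

    The entries travel from the authoritative source straight to disk, so the
    exported data never passes through the language model's output.
    """
    entries = []
    for key in dblp_keys:
        resp = requests.get(f"https://dblp.org/rec/{key}.bib", timeout=TIMEOUT)
        resp.raise_for_status()
        entries.append(resp.text.strip())
    with open(outfile, "a", encoding="utf-8") as fh:
        fh.write("\n\n".join(entries) + "\n")
    return f"Appended {len(entries)} verified entries to {outfile}"


if __name__ == "__main__":
    mcp.run()
```

In a chat session, the model would call search_publications to resolve an informal fragment to a DBLP key and then pass that key to export_bibtex. Because the export tool writes the fetched records straight from dblp.org to disk and returns only a confirmation string, the exported BibTeX never passes through the model and cannot be hallucinated or silently altered.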

How to Cite

Szeider, S. (2025). Unmediated AI-Assisted Scholarly Citations. Open Conference Proceedings, 8. https://doi.org/10.52825/ocp.v8i.3161

Conference Proceedings Volume

Open Conference Proceedings, Vol. 8

Section

Contributions to "The Second Bridge on Artificial Intelligence for Scholarly Communication"
Received 2025-11-06
Accepted 2025-11-15
Published 2025-12-22
