Kuchibhotla, Surya and Kuchibhotla, Sri (2025) Retrieval augmented generation system for dynamic document tagging and query-driven retrieval. International Journal of Science and Research Archive, 16 (1). pp. 1452-1462. ISSN 2582-8185
Abstract
We propose a novel Retrieval Augmented Generation (RAG) framework for dynamic document tagging and query-driven retrieval. Our system integrates a large language model (LLM) with an explicit document memory to generate semantic tags for each document, and uses these tags to improve retrieval accuracy (Zhou et al., 2024; Li et al., 2024). We formalize a query-tag feedback loop that iteratively refines document annotations based on user queries, and present a modular architecture that separates document preprocessing, tag generation, storage, and retrieval (Sharma et al., 2024). To evaluate such systems, we introduce a synthetic multi-domain benchmark containing documents from scientific (arXiv), governmental, and legal sources, along with ground-truth tags and query pools (Kim et al., 2024; Lin et al., 2024). We also define a new Query-to-Tag Matching Score (Q2T), which measures the semantic alignment between queries and generated tags. Experiments on our benchmark and real-world corpora show that dynamic tagging significantly improves recall and annotation quality over static baselines. We include ablation studies that isolate the effect of each component, and we evaluate across multiple domains (e.g., PDFs, filings, rulings). Finally, we discuss ethical implications such as annotation bias and hallucinations, and outline mitigation strategies (Tokunaga et al., 2024). This work provides a rigorous foundation and evaluation framework for adaptive RAG systems in document understanding.
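The abstract does not state how Q2T or the feedback loop is computed, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a purely lexical stand-in for semantic alignment (bag-of-words cosine similarity) where the paper presumably uses an LLM or embedding model. The names `q2t_score` and `refine_tags` and the `threshold` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def q2t_score(query: str, tags: list[str]) -> float:
    """Hypothetical query-to-tag score: best match between the query and any tag.

    Assumption: lexical cosine similarity stands in for semantic alignment.
    """
    q = Counter(tokenize(query))
    return max((cosine(q, Counter(tokenize(tag))) for tag in tags), default=0.0)


def refine_tags(query: str, tags: list[str], threshold: float = 0.5) -> list[str]:
    """Hypothetical feedback step: if no existing tag matches the query well,
    add the normalized query as a new candidate tag for the document."""
    if q2t_score(query, tags) < threshold:
        return tags + [" ".join(tokenize(query))]
    return tags


if __name__ == "__main__":
    doc_tags = ["contract law", "liability"]
    print(q2t_score("contract law dispute", doc_tags))
    print(refine_tags("quantum optics", doc_tags))
```

In this sketch a low score signals that a document's tags do not cover the query, which is the condition that triggers a tag update. A real system would also need to confirm that the document is actually relevant to the query before adding a tag.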
Item Type: | Article |
---|---|
Official URL: | https://doi.org/10.30574/ijsra.2025.16.1.2156 |
Uncontrolled Keywords: | RAG; Retrieval Augmented Generation; Dynamic document tagging; Query driven retrieval |
Date Deposited: | 01 Sep 2025 13:32 |
URI: | https://eprint.scholarsrepository.com/id/eprint/4645 |