Related papers: Enhancing Financial RAG with Agentic AI and Multi-HyDE: A Novel Approach to Knowledge Retrieval and Hallucination Reduction

Enhancing Financial RAG with Agentic AI and Multi-HyDE: A Novel Approach to Knowledge Retrieval and Hallucination Reduction

URL: http://arxiv.org/abs/2509.16369v1
Date: Fri, 19 Sep 2025 19:24:30 GMT
Title: Enhancing Financial RAG with Agentic AI and Multi-HyDE: A Novel Approach to Knowledge Retrieval and Hallucination Reduction
Authors: Akshay Govind Srinivasan, Ryan Jacob George, Jayden Koshy Joe, Hrushikesh Kant, Harshith M R, Sachin Sundar, Sudharshan Suresh, Rahul Vimalkanth, Vijayavallabh,
Abstract summary: We introduce a framework for financial Retrieval Augmented Generation (RAG)<n>RAG generates multiple, nonequivalent queries to boost the effectiveness and coverage of retrieval from large, structured financial corpora.<n>Our pipeline is optimized for token efficiency and multi-step financial reasoning.
Score: 0.5814806132299305
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Accurate and reliable knowledge retrieval is vital for financial question-answering, where continually updated data sources and complex, high-stakes contexts demand precision. Traditional retrieval systems rely on a single database and retriever, but financial applications require more sophisticated approaches to handle intricate regulatory filings, market analyses, and extensive multi-year reports. We introduce a framework for financial Retrieval Augmented Generation (RAG) that leverages agentic AI and the Multi-HyDE system, an approach that generates multiple, nonequivalent queries to boost the effectiveness and coverage of retrieval from large, structured financial corpora. Our pipeline is optimized for token efficiency and multi-step financial reasoning, and we demonstrate that their combination improves accuracy by 11.2% and reduces hallucinations by 15%. Our method is evaluated on standard financial QA benchmarks, showing that integrating domain-specific retrieval mechanisms such as Multi-HyDE with robust toolsets, including keyword and table-based retrieval, significantly enhances both the accuracy and reliability of answers. This research not only delivers a modular, adaptable retrieval framework for finance but also highlights the importance of structured agent workflows and multi-perspective retrieval for trustworthy deployment of AI in high-stakes financial applications.

Related papers

FinSight: Towards Real-World Financial Deep Research [68.31086471310773]
FinSight is a novel framework for producing high-quality, multimodal financial reports.<n>To ensure professional-grade visualization, we propose an Iterative Vision-Enhanced Mechanism.<n>A two-stage Writing Framework expands concise Chain-of-Analysis segments into coherent, citation-aware, and multimodal reports.
arXiv Detail & Related papers (2025-10-19T14:05:35Z)
Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding [61.36285696607487]
Document understanding is critical for applications from financial analysis to scientific discovery.<n>Current approaches, whether OCR-based pipelines feeding Large Language Models (LLMs) or native Multimodal LLMs (MLLMs) face key limitations.<n>Retrieval-Augmented Generation (RAG) helps ground models in external data, but documents' multimodal nature, combining text, tables, charts, and layout, demands a more advanced paradigm: Multimodal RAG.
arXiv Detail & Related papers (2025-10-17T02:33:16Z)
VeritasFi: An Adaptable, Multi-tiered RAG Framework for Multi-modal Financial Question Answering [31.334698403426245]
Retrieval-Augmented Generation (RAG) is becoming increasingly essential for Question Answering (QA) in the financial sector.<n>We present VeritasFi, an innovative hybrid RAG framework that incorporates a multi-modal preprocessing pipeline.<n>By integrating our proposed designs, VeritasFi presents a groundbreaking framework that greatly enhances the adaptability and robustness of financial RAG systems.
arXiv Detail & Related papers (2025-10-12T22:45:24Z)
FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering [57.18367828883773]
FinAgentBench is the first large-scale benchmark for evaluating retrieval with multi-step reasoning in finance.<n>The benchmark consists of 3,429 expert-annotated examples on S&P-100 listed firms.<n>We evaluate a suite of state-of-the-art models and demonstrate how targeted fine-tuning can significantly improve agentic retrieval performance.
arXiv Detail & Related papers (2025-08-07T22:15:22Z)
Agentar-Fin-R1: Enhancing Financial Intelligence through Domain Expertise, Training Efficiency, and Advanced Reasoning [12.548390779247987]
We introduce the Agentar-Fin-R1 series of financial large language models.<n>Our optimization approach integrates a high-quality, systematic financial task label system.<n>Our models undergo comprehensive evaluation on mainstream financial benchmarks.
arXiv Detail & Related papers (2025-07-22T17:52:16Z)
Structuring the Unstructured: A Multi-Agent System for Extracting and Querying Financial KPIs and Guidance [54.25184684077833]
We propose an efficient and scalable method for extracting quantitative insights from unstructured financial documents.<n>Our proposed system consists of two specialized agents: the emphExtraction Agent and the emphText-to-Agent
arXiv Detail & Related papers (2025-05-25T15:45:46Z)
FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation [65.04104723843264]
We present FinDER, an expert-generated dataset tailored for Retrieval-Augmented Generation (RAG) in finance.<n>FinDER focuses on annotating search-relevant evidence by domain experts, offering 5,703 query-evidence-answer triplets.<n>By challenging models to retrieve relevant information from large corpora, FinDER offers a more realistic benchmark for evaluating RAG systems.
arXiv Detail & Related papers (2025-04-22T11:30:13Z)
FinSage: A Multi-aspect RAG System for Financial Filings Question Answering [10.953520766530005]
FinSage is a multi-modal pre-processing pipeline that unifies diverse data formats and generates metadata summaries.<n>Experiments demonstrate that FinSage achieves an impressive recall of 92.51% on 75 expert-curated questions.<n>FinSage has been successfully deployed as financial question-answering agent in online meetings, where it has already served more than 1,200 people.
arXiv Detail & Related papers (2025-04-20T04:58:14Z)
AI for Climate Finance: Agentic Retrieval and Multi-Step Reasoning for Early Warning System Investments [1.3192560874022086]
This study focuses on a real-world application: tracking EWS investments in the Climate Risk and Early Warning Systems (CREWS) Fund.<n>We analyze 25 MDB project documents and evaluate multiple AI-driven classification methods, including zero-shot and few-shot learning.<n>Our results show that the agent-based RAG approach significantly outperforms other methods, achieving 87% accuracy, 89% precision, and 83% recall.
arXiv Detail & Related papers (2025-04-07T14:11:11Z)
Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework [3.022596401099308]
We show that AI can automate the verification of information between loan applications and bank statements effectively. This research highlights AI's potential to minimize manual errors and streamline due diligence, suggesting a broader application of AI in financial document analysis and risk management.
arXiv Detail & Related papers (2024-05-07T13:09:49Z)
FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts. The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.