A Role-Aware Multi-Agent Framework for Financial Education Question Answering with LLMs
- URL: http://arxiv.org/abs/2509.09727v1
- Date: Wed, 10 Sep 2025 09:40:18 GMT
- Title: A Role-Aware Multi-Agent Framework for Financial Education Question Answering with LLMs
- Authors: Andy Zhu, Yingjun Du
- Abstract summary: We present a multi-agent framework that leverages role-based prompting to enhance performance on domain-specific QA. Our framework comprises a Base Generator, an Evidence Retriever, and an Expert Reviewer agent that work in a single-pass iteration to produce a refined answer.
- Score: 8.842756364986704
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Question answering (QA) plays a central role in financial education, yet existing large language model (LLM) approaches often fail to capture the nuanced and specialized reasoning required for financial problem-solving. The financial domain demands multistep quantitative reasoning, familiarity with domain-specific terminology, and comprehension of real-world scenarios. We present a multi-agent framework that leverages role-based prompting to enhance performance on domain-specific QA. Our framework comprises a Base Generator, an Evidence Retriever, and an Expert Reviewer agent that work in a single-pass iteration to produce a refined answer. We evaluated our framework on a set of 3,532 expert-designed finance education questions from Study.com, an online learning platform. We leverage retrieval-augmented generation (RAG) for contextual evidence from 6 finance textbooks and prompting strategies for a domain-expert reviewer. Our experiments indicate that critique-based refinement improves answer accuracy by 6.6-8.3% over zero-shot Chain-of-Thought baselines, with the highest performance from Gemini-2.0-Flash. Furthermore, our method enables GPT-4o-mini to achieve performance comparable to the finance-tuned FinGPT-mt_Llama3-8B_LoRA. Our results show a cost-effective approach to enhancing financial QA and offer insights for further research in multi-agent financial LLM systems.
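The abstract describes a three-stage, single-pass pipeline: a Base Generator drafts an answer, an Evidence Retriever pulls supporting passages from finance textbooks via RAG, and an Expert Reviewer critiques and refines the draft. A minimal sketch of that flow is below; `call_llm` and `retrieve_evidence` are illustrative stand-ins (the paper's actual prompts, models, and retriever are not reproduced here), with a naive keyword-overlap retriever in place of a real RAG index.

```python
def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder for a chat-completion API call (e.g., GPT-4o-mini or Gemini-2.0-Flash)."""
    return f"[{system_prompt[:24]}...] answer to: {user_prompt[:48]}"

def retrieve_evidence(question: str, corpus: list[str], k: int = 3) -> list[str]:
    """Naive keyword-overlap retrieval standing in for the textbook RAG step."""
    words = question.lower().split()
    scored = sorted(corpus, key=lambda p: -sum(w in p.lower() for w in words))
    return scored[:k]

def answer_financial_question(question: str, textbook_corpus: list[str]) -> str:
    # 1. Base Generator: zero-shot Chain-of-Thought draft answer.
    draft = call_llm("You are a finance tutor. Reason step by step.", question)
    # 2. Evidence Retriever: pull contextual passages from the textbook corpus.
    evidence = retrieve_evidence(question, textbook_corpus)
    # 3. Expert Reviewer: critique and refine the draft in a single pass.
    review_prompt = (
        f"Question: {question}\nDraft: {draft}\n"
        f"Evidence: {' | '.join(evidence)}\nRevise the draft if needed."
    )
    return call_llm("You are a domain-expert financial reviewer.", review_prompt)
```

The single-pass design (generate, retrieve, review once, rather than looping to convergence) is what keeps the approach cost-effective relative to iterative multi-agent debate.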
Related papers
- Integrating Domain Knowledge for Financial QA: A Multi-Retriever RAG Approach with LLMs [13.368251290146794]
We implement a multi-retriever Retrieval-Augmented Generation (RAG) system to retrieve both external domain knowledge and internal question contexts. We find that domain-specific training with the SecBERT encoder significantly contributes to our best neural symbolic model.
arXiv Detail & Related papers (2025-12-29T20:24:15Z) - FinDeepResearch: Evaluating Deep Research Agents in Rigorous Financial Analysis [110.5695516127813]
HisRubric is a novel evaluation framework with a hierarchical analytical structure and a fine-grained grading rubric. FinDeepResearch is a benchmark that comprises 64 listed companies from 8 financial markets across 4 languages. We conduct extensive experiments on FinDeepResearch using 16 representative methods, including 6 DR agents, 5 LLMs equipped with both deep reasoning and search capabilities, and 5 LLMs with deep reasoning capabilities only.
arXiv Detail & Related papers (2025-10-15T17:21:56Z) - FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering [57.43420753842626]
FinLFQA is a benchmark designed to evaluate the ability of Large Language Models to generate long-form answers to complex financial questions. We provide an automatic evaluation framework covering both answer quality and attribution quality.
arXiv Detail & Related papers (2025-10-07T20:06:15Z) - Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models [12.415988471162997]
Fin-PRM is a domain-specialized, trajectory-aware PRM tailored to evaluate intermediate reasoning steps in financial tasks. It integrates step-level and trajectory-level reward supervision, enabling fine-grained evaluation of reasoning traces aligned with financial logic. We show that Fin-PRM consistently outperforms general-purpose PRMs and strong domain baselines in trajectory selection quality.
arXiv Detail & Related papers (2025-08-21T03:31:11Z) - FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering [57.18367828883773]
FinAgentBench is the first large-scale benchmark for evaluating retrieval with multi-step reasoning in finance. The benchmark consists of 3,429 expert-annotated examples on S&P-100 listed firms. We evaluate a suite of state-of-the-art models and demonstrate how targeted fine-tuning can significantly improve agentic retrieval performance.
arXiv Detail & Related papers (2025-08-07T22:15:22Z) - Advanced Financial Reasoning at Scale: A Comprehensive Evaluation of Large Language Models on CFA Level III [0.0]
This paper presents a benchmark evaluating 23 state-of-the-art Large Language Models (LLMs) on the Chartered Financial Analyst (CFA) Level III exam. We assess both multiple-choice questions (MCQs) and essay-style responses using multiple prompting strategies including Chain-of-Thought and Self-Discover. Our evaluation reveals that leading models demonstrate strong capabilities, with composite scores such as 79.1% (o4-mini) and 77.3% (Gemini 2.5 Flash) on CFA Level III.
arXiv Detail & Related papers (2025-06-29T19:54:57Z) - FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation [65.04104723843264]
We present FinDER, an expert-generated dataset tailored for Retrieval-Augmented Generation (RAG) in finance. FinDER focuses on annotating search-relevant evidence by domain experts, offering 5,703 query-evidence-answer triplets. By challenging models to retrieve relevant information from large corpora, FinDER offers a more realistic benchmark for evaluating RAG systems.
arXiv Detail & Related papers (2025-04-22T11:30:13Z) - FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering [18.821122274064116]
We introduce FAMMA, an open-source benchmark for financial multilingual multimodal question answering (QA). Our benchmark aims to evaluate the abilities of large language models (LLMs) in answering complex reasoning questions that require advanced financial knowledge.
arXiv Detail & Related papers (2024-10-06T15:41:26Z) - Financial Knowledge Large Language Model [4.599537455808687]
We introduce IDEA-FinBench, an evaluation benchmark for assessing financial knowledge in large language models (LLMs).
We propose IDEA-FinKER, a framework designed to facilitate the rapid adaptation of general LLMs to the financial domain.
Finally, we present IDEA-FinQA, a financial question-answering system powered by LLMs.
arXiv Detail & Related papers (2024-06-29T08:26:49Z) - FinBen: A Holistic Financial Benchmark for Large Language Models [75.09474986283394]
FinBen is the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks.
FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading.
arXiv Detail & Related papers (2024-02-20T02:16:16Z) - PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLMs) based on fine-tuning LLaMA with instruction data.
We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks.
We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.