Related papers: FinSight: Towards Real-World Financial Deep Research

FinSight: Towards Real-World Financial Deep Research

URL: http://arxiv.org/abs/2510.16844v1
Date: Sun, 19 Oct 2025 14:05:35 GMT
Title: FinSight: Towards Real-World Financial Deep Research
Authors: Jiajie Jin, Yuyao Zhang, Yimeng Xu, Hongjin Qian, Yutao Zhu, Zhicheng Dou,
Abstract summary: FinSight is a novel framework for producing high-quality, multimodal financial reports.<n>To ensure professional-grade visualization, we propose an Iterative Vision-Enhanced Mechanism.<n>A two-stage Writing Framework expands concise Chain-of-Analysis segments into coherent, citation-aware, and multimodal reports.
Score: 68.31086471310773
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Generating professional financial reports is a labor-intensive and intellectually demanding process that current AI systems struggle to fully automate. To address this challenge, we introduce FinSight (Financial InSight), a novel multi agent framework for producing high-quality, multimodal financial reports. The foundation of FinSight is the Code Agent with Variable Memory (CAVM) architecture, which unifies external data, designed tools, and agents into a programmable variable space, enabling flexible data collection, analysis and report generation through executable code. To ensure professional-grade visualization, we propose an Iterative Vision-Enhanced Mechanism that progressively refines raw visual outputs into polished financial charts. Furthermore, a two stage Writing Framework expands concise Chain-of-Analysis segments into coherent, citation-aware, and multimodal reports, ensuring both analytical depth and structural consistency. Experiments on various company and industry-level tasks demonstrate that FinSight significantly outperforms all baselines, including leading deep research systems in terms of factual accuracy, analytical depth, and presentation quality, demonstrating a clear path toward generating reports that approach human-expert quality.

Related papers

UniFinEval: Towards Unified Evaluation of Financial Multimodal Models across Text, Images and Videos [22.530796761115766]
We propose UniFinEval, the first unified multimodal benchmark for high-information-density financial environments.<n>UniFinEval systematically constructs five core financial scenarios grounded in real-world financial systems.<n> Gemini-3-pro-preview achieves the best overall performance, yet still exhibits a substantial gap compared to financial experts.
arXiv Detail & Related papers (2026-01-09T10:15:32Z)
Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding [61.36285696607487]
Document understanding is critical for applications from financial analysis to scientific discovery.<n>Current approaches, whether OCR-based pipelines feeding Large Language Models (LLMs) or native Multimodal LLMs (MLLMs) face key limitations.<n>Retrieval-Augmented Generation (RAG) helps ground models in external data, but documents' multimodal nature, combining text, tables, charts, and layout, demands a more advanced paradigm: Multimodal RAG.
arXiv Detail & Related papers (2025-10-17T02:33:16Z)
FinMR: A Knowledge-Intensive Multimodal Benchmark for Advanced Financial Reasoning [10.985136487771364]
FinMR is a knowledge-intensive multimodal dataset designed to evaluate expert-level financial reasoning capabilities at a professional analyst's standard.<n>It comprises over 3,200 meticulously curated and expertly annotated question-answer pairs across 15 diverse financial topics.<n>FinMR establishes itself as an essential benchmark tool for assessing and advancing multimodal financial reasoning toward professional analyst-level competence.
arXiv Detail & Related papers (2025-10-09T06:49:55Z)
FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering [57.43420753842626]
FinLFQA is a benchmark designed to evaluate the ability of Large Language Models to generate long-form answers to complex financial questions.<n>We provide an automatic evaluation framework covering both answer quality and attribution quality.
arXiv Detail & Related papers (2025-10-07T20:06:15Z)
Enhancing Financial RAG with Agentic AI and Multi-HyDE: A Novel Approach to Knowledge Retrieval and Hallucination Reduction [0.5814806132299305]
We introduce a framework for financial Retrieval Augmented Generation (RAG)<n>RAG generates multiple, nonequivalent queries to boost the effectiveness and coverage of retrieval from large, structured financial corpora.<n>Our pipeline is optimized for token efficiency and multi-step financial reasoning.
arXiv Detail & Related papers (2025-09-19T19:24:30Z)
FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation [65.04104723843264]
We present FinDER, an expert-generated dataset tailored for Retrieval-Augmented Generation (RAG) in finance.<n>FinDER focuses on annotating search-relevant evidence by domain experts, offering 5,703 query-evidence-answer triplets.<n>By challenging models to retrieve relevant information from large corpora, FinDER offers a more realistic benchmark for evaluating RAG systems.
arXiv Detail & Related papers (2025-04-22T11:30:13Z)
RAG-IT: Retrieval-Augmented Instruction Tuning for Automated Financial Analysis [1.2891210250935148]
RAG-IT (Retrieval-Augmented Instruction Tuning) is a novel framework designed to automate the generation of earnings report analyses.<n>Our approach integrates retrieval augmentation with instruction-based fine-tuning to enhance factual accuracy, contextual relevance, and domain adaptability.
arXiv Detail & Related papers (2024-12-11T08:09:42Z)
FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models [26.99936434072108]
textttFinDABench is a benchmark designed to evaluate the financial data analysis capabilities of Large Language Models. textttFinDABench aims to provide a measure for in-depth analysis of LLM abilities.
arXiv Detail & Related papers (2024-01-01T15:26:23Z)
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets [9.714447724811842]
This paper introduces a distinctive approach anchored in the Instruction Tuning paradigm for open-source large language models. We capitalize on the interoperability of open-source models, ensuring a seamless and transparent integration. The paper presents a benchmarking scheme designed for end-to-end training and testing, employing a cost-effective progression.
arXiv Detail & Related papers (2023-10-07T12:52:58Z)
PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLMs) based on fine-tuning LLaMA with instruction data. We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks. We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)
FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts. The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.