Related papers: Understanding Financial Reasoning in AI: A Multimodal Benchmark and Error Learning Approach

Understanding Financial Reasoning in AI: A Multimodal Benchmark and Error Learning Approach

URL: http://arxiv.org/abs/2506.06282v1
Date: Tue, 22 Apr 2025 07:25:03 GMT
Title: Understanding Financial Reasoning in AI: A Multimodal Benchmark and Error Learning Approach
Authors: Shuangyan Deng, Haizhou Peng, Jiachen Xu, Chunhou Liu, Ciprian Doru Giurcuaneanu, Jiamou Liu,
Abstract summary: This paper introduces a new benchmark designed to evaluate how well AI models - especially large language and multimodal models - reason in finance-specific contexts.<n>We propose an error-aware learning framework that leverages historical model mistakes and feedback to guide inference, without requiring fine-tuning.<n>The results highlight persistent challenges in visual understanding and mathematical logic, while also demonstrating the promise of self-reflective reasoning in financial AI systems.
Score: 6.911426601915051
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Effective financial reasoning demands not only textual understanding but also the ability to interpret complex visual data such as charts, tables, and trend graphs. This paper introduces a new benchmark designed to evaluate how well AI models - especially large language and multimodal models - reason in finance-specific contexts. Covering 3,200 expert-level question-answer pairs across 15 core financial topics, the benchmark integrates both textual and visual modalities to reflect authentic analytical challenges in finance. To address limitations in current reasoning approaches, we propose an error-aware learning framework that leverages historical model mistakes and feedback to guide inference, without requiring fine-tuning. Our experiments across state-of-the-art models show that multimodal inputs significantly enhance performance and that incorporating error feedback leads to consistent and measurable improvements. The results highlight persistent challenges in visual understanding and mathematical logic, while also demonstrating the promise of self-reflective reasoning in financial AI systems. Our code and data can be found at https://anonymous/FinMR/CodeData.

Related papers

FinEval-KR: A Financial Domain Evaluation Framework for Large Language Models' Knowledge and Reasoning [18.68776736676411]
FinEval-KR is a novel evaluation framework for quantifying large language models' knowledge and reasoning abilities.<n>Inspired by cognitive science, we propose a cognitive score to analyze capabilities in reasoning tasks across different cognitive levels.<n>Our experimental results reveal that LLM reasoning ability and higher-order cognitive ability are the core factors influencing reasoning accuracy.
arXiv Detail & Related papers (2025-06-18T06:21:50Z)
CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model [21.702901343472558]
Multimodal Large Language Models (MLLMs) have rapidly evolved with the growth of Large Language Models (LLMs)<n>In this paper, we introduce CFBenchmark-MM, a Chinese multimodal financial benchmark with over 9,000 image-question pairs featuring tables, histogram charts, line charts, pie charts, and structural diagrams.<n>We develop a staged evaluation system to assess MLLMs in handling multimodal information by providing different visual content step by step.
arXiv Detail & Related papers (2025-06-16T02:52:44Z)
Reasoning or Overthinking: Evaluating Large Language Models on Financial Sentiment Analysis [1.3812010983144802]
We evaluate how various large language models (LLMs) align with human-labeled sentiment in a financial context.<n>Our findings suggest that reasoning, either through prompting or inherent model design, does not improve performance on this task.<n>Surprisingly, the most accurate and human-aligned combination of model and method was GPT-4o without any Chain-of-Thought (CoT) prompting.
arXiv Detail & Related papers (2025-06-05T02:47:23Z)
FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation [63.55583665003167]
We present FinDER, an expert-generated dataset tailored for Retrieval-Augmented Generation (RAG) in finance.<n>FinDER focuses on annotating search-relevant evidence by domain experts, offering 5,703 query-evidence-answer triplets.<n>By challenging models to retrieve relevant information from large corpora, FinDER offers a more realistic benchmark for evaluating RAG systems.
arXiv Detail & Related papers (2025-04-22T11:30:13Z)
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) [66.51642638034822]
Reasoning is central to human intelligence, enabling structured problem-solving across diverse tasks.<n>Recent advances in large language models (LLMs) have greatly enhanced their reasoning abilities in arithmetic, commonsense, and symbolic domains.<n>This paper offers a concise yet insightful overview of reasoning techniques in both textual and multimodal LLMs.
arXiv Detail & Related papers (2025-04-04T04:04:56Z)
Explaining the Unexplainable: A Systematic Review of Explainable AI in Finance [0.0]
This paper offers a thorough overview of the changing scene of XAI applications in finance.<n>We find topic clusters, significant research, and most often used explainability strategies used in financial industries.
arXiv Detail & Related papers (2025-03-07T22:36:44Z)
Fino1: On the Transferability of Reasoning-Enhanced LLMs and Reinforcement Learning to Finance [35.617409883103335]
FinReason is the first financial reasoning benchmark covering multi-table analysis, long-context reasoning, and equation-based tasks.<n>We introduce FinCoT, the first open high-fidelity CoT corpus for finance, distilled from seven QA datasets.<n>We develop Fin-o1, the first open financial reasoning models trained via supervised fine-tuning and GRPO-based RL.
arXiv Detail & Related papers (2025-02-12T05:13:04Z)
Causality can systematically address the monsters under the bench(marks) [64.36592889550431]
Benchmarks are plagued by various biases, artifacts, or leakage.<n>Models may behave unreliably due to poorly explored failure modes.<n> causality offers an ideal framework to systematically address these challenges.
arXiv Detail & Related papers (2025-02-07T17:01:37Z)
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data. We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
Incorporating Pre-trained Model Prompting in Multimodal Stock Volume Movement Prediction [22.949484374773967]
We propose the Prompt-based MUltimodal Stock volumE prediction model (ProMUSE) to process text and time series modalities. We use pre-trained language models for better comprehension of financial news. We also propose a novel cross-modality contrastive alignment while reserving the unimodal heads beside the fusion head to mitigate this problem.
arXiv Detail & Related papers (2023-09-11T16:47:01Z)
Causal Reasoning Meets Visual Representation Learning: A Prospective Study [117.08431221482638]
Lack of interpretability, robustness, and out-of-distribution generalization are becoming the challenges of the existing visual models. Inspired by the strong inference ability of human-level agents, recent years have witnessed great effort in developing causal reasoning paradigms. This paper aims to provide a comprehensive overview of this emerging field, attract attention, encourage discussions, bring to the forefront the urgency of developing novel causal reasoning methods.
arXiv Detail & Related papers (2022-04-26T02:22:28Z)
FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts. The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.