Related papers: FinMR: A Knowledge-Intensive Multimodal Benchmark for Advanced Financial Reasoning

FinMR: A Knowledge-Intensive Multimodal Benchmark for Advanced Financial Reasoning

URL: http://arxiv.org/abs/2510.07852v1
Date: Thu, 09 Oct 2025 06:49:55 GMT
Title: FinMR: A Knowledge-Intensive Multimodal Benchmark for Advanced Financial Reasoning
Authors: Shuangyan Deng, Haizhou Peng, Jiachen Xu, Rui Mao, Ciprian Doru Giurcăneanu, Jiamou Liu,
Abstract summary: FinMR is a knowledge-intensive multimodal dataset designed to evaluate expert-level financial reasoning capabilities at a professional analyst's standard.<n>It comprises over 3,200 meticulously curated and expertly annotated question-answer pairs across 15 diverse financial topics.<n>FinMR establishes itself as an essential benchmark tool for assessing and advancing multimodal financial reasoning toward professional analyst-level competence.
Score: 10.985136487771364
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Multimodal Large Language Models (MLLMs) have made substantial progress in recent years. However, their rigorous evaluation within specialized domains like finance is hindered by the absence of datasets characterized by professional-level knowledge intensity, detailed annotations, and advanced reasoning complexity. To address this critical gap, we introduce FinMR, a high-quality, knowledge-intensive multimodal dataset explicitly designed to evaluate expert-level financial reasoning capabilities at a professional analyst's standard. FinMR comprises over 3,200 meticulously curated and expertly annotated question-answer pairs across 15 diverse financial topics, ensuring broad domain diversity and integrating sophisticated mathematical reasoning, advanced financial knowledge, and nuanced visual interpretation tasks across multiple image types. Through comprehensive benchmarking with leading closed-source and open-source MLLMs, we highlight significant performance disparities between these models and professional financial analysts, uncovering key areas for model advancement, such as precise image analysis, accurate application of complex financial formulas, and deeper contextual financial understanding. By providing richly varied visual content and thorough explanatory annotations, FinMR establishes itself as an essential benchmark tool for assessing and advancing multimodal financial reasoning toward professional analyst-level competence.

Related papers

The CLEF-2026 FinMMEval Lab: Multilingual and Multimodal Evaluation of Financial AI Systems [54.12165004393043]
FinMMEval 2026 offers three interconnected tasks that span financial understanding, reasoning, and decision-making.<n>The lab aims to promote the development of robust, transparent, and globally inclusive financial AI systems.
arXiv Detail & Related papers (2026-02-11T14:14:06Z)
FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning and Agent Evaluation [15.654001393123403]
FinMTM is a multi-turn multimodal benchmark that expands diversity along both data and task dimensions.<n>On the data side, we curate and annotate 11,133 bilingual (Chinese and English) financial QA pairs grounded in financial visuals.<n>On the task side, FinMTM covers single- and multiple-choice questions, multi-turn open-ended dialogues, and agent-based tasks.
arXiv Detail & Related papers (2026-02-03T05:38:24Z)
FinSight: Towards Real-World Financial Deep Research [68.31086471310773]
FinSight is a novel framework for producing high-quality, multimodal financial reports.<n>To ensure professional-grade visualization, we propose an Iterative Vision-Enhanced Mechanism.<n>A two-stage Writing Framework expands concise Chain-of-Analysis segments into coherent, citation-aware, and multimodal reports.
arXiv Detail & Related papers (2025-10-19T14:05:35Z)
FinLFQA: Evaluating Attributed Text Generation of LLMs in Financial Long-Form Question Answering [57.43420753842626]
FinLFQA is a benchmark designed to evaluate the ability of Large Language Models to generate long-form answers to complex financial questions.<n>We provide an automatic evaluation framework covering both answer quality and attribution quality.
arXiv Detail & Related papers (2025-10-07T20:06:15Z)
Exploring Large Language Models for Financial Applications: Techniques, Performance, and Challenges with FinMA [0.0]
FinMA, a model created within the PIXIU framework, is evaluated for its performance in specialized financial tasks.<n>Findings indicate that FinMA performs well in sentiment analysis and classification, but faces notable challenges in tasks involving numerical reasoning, entity recognition, and summarization.
arXiv Detail & Related papers (2025-10-02T11:19:59Z)
FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging [12.897569424944107]
FinMMR is a novel bilingual benchmark tailored to evaluate the reasoning capabilities of multimodal large language models (MLLMs) in financial numerical reasoning tasks.<n>FinMMR comprises 4.3K questions and 8.7K images spanning 14 categories, including tables, bar charts, and ownership structure charts.
arXiv Detail & Related papers (2025-08-06T16:51:09Z)
CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model [21.702901343472558]
Multimodal Large Language Models (MLLMs) have rapidly evolved with the growth of Large Language Models (LLMs)<n>In this paper, we introduce CFBenchmark-MM, a Chinese multimodal financial benchmark with over 9,000 image-question pairs featuring tables, histogram charts, line charts, pie charts, and structural diagrams.<n>We develop a staged evaluation system to assess MLLMs in handling multimodal information by providing different visual content step by step.
arXiv Detail & Related papers (2025-06-16T02:52:44Z)
FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation [65.04104723843264]
We present FinDER, an expert-generated dataset tailored for Retrieval-Augmented Generation (RAG) in finance.<n>FinDER focuses on annotating search-relevant evidence by domain experts, offering 5,703 query-evidence-answer triplets.<n>By challenging models to retrieve relevant information from large corpora, FinDER offers a more realistic benchmark for evaluating RAG systems.
arXiv Detail & Related papers (2025-04-22T11:30:13Z)
MME-Finance: A Multimodal Finance Benchmark for Expert-level Understanding and Reasoning [42.80085792749683]
We propose MME-Finance, an open-ended and practical usage-oriented Visual Question Answering (VQA) benchmark. The characteristics of our benchmark are finance and expertise, which include constructing charts that reflect the actual usage needs of users. In addition, we propose a Chinese version, which helps compare performance of MLLMs under a Chinese context.
arXiv Detail & Related papers (2024-11-05T18:59:51Z)
FinTeamExperts: Role Specialized MOEs For Financial Analysis [17.145985064776273]
We present the FinTeamExperts, a role-specialized LLM framework structured as a Mixture of Experts (MOEs) for financial analysis. The framework simulates a collaborative team setting by training each model to specialize in distinct roles: Macro Analysts, Micro analysts, and Quantitative Analysts. We achieve this by training three 8-billion parameter models on different corpus, each dedicated to excelling in specific finance-related roles.
arXiv Detail & Related papers (2024-10-28T00:40:55Z)
FinBen: A Holistic Financial Benchmark for Large Language Models [75.09474986283394]
FinBen is the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading.
arXiv Detail & Related papers (2024-02-20T02:16:16Z)
PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLMs) based on fine-tuning LLaMA with instruction data. We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks. We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)
FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts. The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.