Related papers: FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging

URL: http://arxiv.org/abs/2508.04625v1
Date: Wed, 06 Aug 2025 16:51:09 GMT
Title: FinMMR: Make Financial Numerical Reasoning More Multimodal, Comprehensive, and Challenging
Authors: Zichen Tang, Haihong E, Jiacheng Liu, Zhongjun Yang, Rongjin Li, Zihua Rong, Haoyang He, Zhuodi Hao, Xinyang Hu, Kun Ji, Ziyan Ma, Mengyuan Ji, Jun Zhang, Chenghao Ma, Qianhe Zheng, Yang Liu, Yiling Huang, Xinyi Hu, Qing Huang, Zijian Xie, Shiyao Peng,
Abstract summary: FinMMR is a novel bilingual benchmark tailored to evaluate the reasoning capabilities of multimodal large language models (MLLMs) in financial numerical reasoning tasks.<n>FinMMR comprises 4.3K questions and 8.7K images spanning 14 categories, including tables, bar charts, and ownership structure charts.
Score: 12.897569424944107
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present FinMMR, a novel bilingual multimodal benchmark tailored to evaluate the reasoning capabilities of multimodal large language models (MLLMs) in financial numerical reasoning tasks. Compared to existing benchmarks, our work introduces three significant advancements. (1) Multimodality: We meticulously transform existing financial reasoning benchmarks, and construct novel questions from the latest Chinese financial research reports. FinMMR comprises 4.3K questions and 8.7K images spanning 14 categories, including tables, bar charts, and ownership structure charts. (2) Comprehensiveness: FinMMR encompasses 14 financial subdomains, including corporate finance, banking, and industry analysis, significantly exceeding existing benchmarks in financial domain knowledge breadth. (3) Challenge: Models are required to perform multi-step precise numerical reasoning by integrating financial knowledge with the understanding of complex financial images and text. The best-performing MLLM achieves only 53.0% accuracy on Hard problems. We believe that FinMMR will drive advancements in enhancing the reasoning capabilities of MLLMs in real-world scenarios.

Related papers

MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation [89.73542209537148]
MultiFinBen is the first multilingual and multimodal benchmark tailored to the global financial domain.<n>We introduce two novel tasks, including EnglishOCR and SpanishOCR, the first OCR-embedded financial QA tasks.<n>We propose a dynamic, difficulty-aware selection mechanism and curate a compact, balanced benchmark.
arXiv Detail & Related papers (2025-06-16T22:01:49Z)
CFBenchmark-MM: Chinese Financial Assistant Benchmark for Multimodal Large Language Model [21.702901343472558]
Multimodal Large Language Models (MLLMs) have rapidly evolved with the growth of Large Language Models (LLMs)<n>In this paper, we introduce CFBenchmark-MM, a Chinese multimodal financial benchmark with over 9,000 image-question pairs featuring tables, histogram charts, line charts, pie charts, and structural diagrams.<n>We develop a staged evaluation system to assess MLLMs in handling multimodal information by providing different visual content step by step.
arXiv Detail & Related papers (2025-06-16T02:52:44Z)
M$^3$FinMeeting: A Multilingual, Multi-Sector, and Multi-Task Financial Meeting Understanding Evaluation Dataset [18.752133381125564]
$texttM$3$FinMeeting$ is a multilingual, multi-sector, and multi-task dataset designed for financial meeting understanding.<n>First, it supports English, Chinese, and Japanese, enhancing comprehension of financial discussions in diverse linguistic contexts.<n>Second, it encompasses various industry sectors defined by the Global Industry Classification Standard (GICS)<n>Third, it includes three tasks: summarization, question-answer (QA) pair extraction, and question answering, facilitating a more realistic and comprehensive evaluation of understanding.
arXiv Detail & Related papers (2025-06-03T06:41:09Z)
FinMaster: A Holistic Benchmark for Mastering Full-Pipeline Financial Workflows with LLMs [15.230256296815565]
FinMaster is a benchmark designed to assess the capabilities of large language models (LLMs) in financial literacy, accounting, auditing, and consulting.<n>FinMaster comprises three main modules: FinSim, FinSuite, and FinEval.<n>Experiments reveal critical capability gaps in financial reasoning, with accuracy dropping from over 90% on basic tasks to merely 37% on complex scenarios.
arXiv Detail & Related papers (2025-05-18T11:47:55Z)
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications [88.96861155804935]
We introduce textitOpen-FinLLMs, the first open-source multimodal financial LLMs.<n>FinLLaMA is pre-trained on a comprehensive 52-billion-token corpus; FinLLaMA-Instruct, fine-tuned with 573K financial instructions; and FinLLaVA, enhanced with 1.43M multimodal tuning pairs.<n>We evaluate Open-FinLLMs across 14 financial tasks, 30 datasets, and 4 multimodal tasks in zero-shot, few-shot, and supervised fine-tuning settings.
arXiv Detail & Related papers (2024-08-20T16:15:28Z)
CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models [61.324062412648075]
CFinBench is an evaluation benchmark for assessing the financial knowledge of large language models (LLMs) under Chinese context. It comprises 99,100 questions spanning 43 second-level categories with 3 question types: single-choice, multiple-choice and judgment. The results show that GPT4 and some Chinese-oriented models lead the benchmark, with the highest average accuracy being 60.16%.
arXiv Detail & Related papers (2024-07-02T14:34:36Z)
FinBen: A Holistic Financial Benchmark for Large Language Models [75.09474986283394]
FinBen is the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading.
arXiv Detail & Related papers (2024-02-20T02:16:16Z)
PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLMs) based on fine-tuning LLaMA with instruction data. We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks. We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)
FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts. The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.