Related papers: FinTMMBench: Benchmarking Temporal-Aware Multi-Modal RAG in Finance

FinTMMBench: Benchmarking Temporal-Aware Multi-Modal RAG in Finance

URL: http://arxiv.org/abs/2503.05185v1
Date: Fri, 07 Mar 2025 07:13:59 GMT
Title: FinTMMBench: Benchmarking Temporal-Aware Multi-Modal RAG in Finance
Authors: Fengbin Zhu, Junfeng Li, Liangming Pan, Wenjie Wang, Fuli Feng, Chao Wang, Huanbo Luan, Tat-Seng Chua,
Abstract summary: FinTMMBench is the first comprehensive benchmark for evaluating temporal-aware multi-modal Retrieval-Augmented Generation systems in finance.<n>Built from heterologous data of NASDAQ 100 companies, FinTMMBench offers three significant advantages.
Score: 79.78247299859656
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Finance decision-making often relies on in-depth data analysis across various data sources, including financial tables, news articles, stock prices, etc. In this work, we introduce FinTMMBench, the first comprehensive benchmark for evaluating temporal-aware multi-modal Retrieval-Augmented Generation (RAG) systems in finance. Built from heterologous data of NASDAQ 100 companies, FinTMMBench offers three significant advantages. 1) Multi-modal Corpus: It encompasses a hybrid of financial tables, news articles, daily stock prices, and visual technical charts as the corpus. 2) Temporal-aware Questions: Each question requires the retrieval and interpretation of its relevant data over a specific time period, including daily, weekly, monthly, quarterly, and annual periods. 3) Diverse Financial Analysis Tasks: The questions involve 10 different tasks, including information extraction, trend analysis, sentiment analysis and event detection, etc. We further propose a novel TMMHybridRAG method, which first leverages LLMs to convert data from other modalities (e.g., tabular, visual and time-series data) into textual format and then incorporates temporal information in each node when constructing graphs and dense indexes. Its effectiveness has been validated in extensive experiments, but notable gaps remain, highlighting the challenges presented by our FinTMMBench.

Related papers

FinDER: Financial Dataset for Question Answering and Evaluating Retrieval-Augmented Generation [63.55583665003167]
We present FinDER, an expert-generated dataset tailored for Retrieval-Augmented Generation (RAG) in finance.<n>FinDER focuses on annotating search-relevant evidence by domain experts, offering 5,703 query-evidence-answer triplets.<n>By challenging models to retrieve relevant information from large corpora, FinDER offers a more realistic benchmark for evaluating RAG systems.
arXiv Detail & Related papers (2025-04-22T11:30:13Z)
MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering [21.064096256892686]
Multimodal time-series datasets fall short in evaluating cross-modal reasoning and complex question answering. We introduce Multimodal Time Series Benchmark (MTBench), a benchmark to evaluate large language models (LLMs) on time series and text understanding. We evaluate state-of-the-art LLMs on MTbench, analyzing their effectiveness in modeling the complex relationships between news narratives and temporal patterns.
arXiv Detail & Related papers (2025-03-21T05:04:53Z)
FinTSB: A Comprehensive and Practical Benchmark for Financial Time Series Forecasting [58.70072722290475]
Financial time series (FinTS) record the behavior of human-brain-augmented decision-making.<n>FinTSB is a comprehensive and practical benchmark for financial time series forecasting.
arXiv Detail & Related papers (2025-02-26T05:19:16Z)
Multimodal Stock Price Prediction [0.0]
It has become increasingly critical to carefully integrate diverse data sources with machine learning for accurate stock price prediction.<n>This paper explores a multimodal machine learning approach for stock price prediction by combining data from diverse sources, including traditional financial metrics, tweets, and news articles.
arXiv Detail & Related papers (2025-01-23T16:38:46Z)
AlphaFin: Benchmarking Financial Analysis with Retrieval-Augmented Stock-Chain Framework [48.3060010653088]
We release AlphaFin datasets, combining traditional research datasets, real-time financial data, and handwritten chain-of-thought (CoT) data. We then use AlphaFin datasets to benchmark a state-of-the-art method, called Stock-Chain, for effectively tackling the financial analysis task.
arXiv Detail & Related papers (2024-03-19T09:45:33Z)
FinBen: A Holistic Financial Benchmark for Large Language Models [75.09474986283394]
FinBen is the first extensive open-source evaluation benchmark, including 36 datasets spanning 24 financial tasks. FinBen offers several key innovations: a broader range of tasks and datasets, the first evaluation of stock trading, novel agent and Retrieval-Augmented Generation (RAG) evaluation, and three novel open-source evaluation datasets for text summarization, question answering, and stock trading.
arXiv Detail & Related papers (2024-02-20T02:16:16Z)
FinDABench: Benchmarking Financial Data Analysis Ability of Large Language Models [26.99936434072108]
textttFinDABench is a benchmark designed to evaluate the financial data analysis capabilities of Large Language Models. textttFinDABench aims to provide a measure for in-depth analysis of LLM abilities.
arXiv Detail & Related papers (2024-01-01T15:26:23Z)
FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis [15.20897845057384]
FinVis-GPT is a novel multimodal large language model (LLM) specifically designed for financial chart analysis. The proposed FinVis-GPT serves as a pioneering effort in utilizing multimodal LLMs in the finance domain.
arXiv Detail & Related papers (2023-07-31T07:44:15Z)
PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance [63.51545277822702]
PIXIU is a comprehensive framework including the first financial large language model (LLMs) based on fine-tuning LLaMA with instruction data. We propose FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks. We conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks.
arXiv Detail & Related papers (2023-06-08T14:20:29Z)
StockEmotions: Discover Investor Emotions for Financial Sentiment Analysis and Multivariate Time Series [5.892675412951627]
This paper introduces StockEmotions, a new dataset for detecting emotions in the stock market. It consists of 10,000 English comments collected from StockTwits, a financial social media platform. Unlike existing financial sentiment datasets, StockEmotions presents granular features such as investor sentiment classes, fine-grained emotions, emojis, and time series data.
arXiv Detail & Related papers (2023-01-23T05:32:42Z)
FinQA: A Dataset of Numerical Reasoning over Financial Data [52.7249610894623]
We focus on answering deep questions over financial data, aiming to automate the analysis of a large corpus of financial documents. We propose a new large-scale dataset, FinQA, with Question-Answering pairs over Financial reports, written by financial experts. The results demonstrate that popular, large, pre-trained models fall far short of expert humans in acquiring finance knowledge.
arXiv Detail & Related papers (2021-09-01T00:08:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.