MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering
- URL: http://arxiv.org/abs/2503.16858v1
- Date: Fri, 21 Mar 2025 05:04:53 GMT
- Title: MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering
- Authors: Jialin Chen, Aosong Feng, Ziyu Zhao, Juan Garza, Gaukhar Nurbek, Cheng Qin, Ali Maatouk, Leandros Tassiulas, Yifeng Gao, Rex Ying
- Abstract summary: Multimodal time-series datasets fall short in evaluating cross-modal reasoning and complex question answering. We introduce the Multimodal Time Series Benchmark (MTBench), a benchmark to evaluate large language models (LLMs) on time series and text understanding. We evaluate state-of-the-art LLMs on MTBench, analyzing their effectiveness in modeling the complex relationships between news narratives and temporal patterns.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Understanding the relationship between textual news and time-series evolution is a critical yet under-explored challenge in applied data science. While multimodal learning has gained traction, existing multimodal time-series datasets fall short in evaluating cross-modal reasoning and complex question answering, which are essential for capturing complex interactions between narrative information and temporal patterns. To bridge this gap, we introduce the Multimodal Time Series Benchmark (MTBench), a large-scale benchmark designed to evaluate large language models (LLMs) on time series and text understanding across financial and weather domains. MTBench comprises paired time series and textual data, including financial news with corresponding stock price movements and weather reports aligned with historical temperature records. Unlike existing benchmarks that focus on isolated modalities, MTBench provides a comprehensive testbed for models to jointly reason over structured numerical trends and unstructured textual narratives. The richness of MTBench enables the formulation of diverse tasks that require a deep understanding of both text and time-series data, including time-series forecasting, semantic and technical trend analysis, and news-driven question answering (QA). These tasks target the model's ability to capture temporal dependencies, extract key insights from textual context, and integrate cross-modal information. We evaluate state-of-the-art LLMs on MTBench, analyzing their effectiveness in modeling the complex relationships between news narratives and temporal patterns. Our findings reveal significant challenges in current models, including difficulties in capturing long-term dependencies, interpreting causality in financial and weather trends, and effectively fusing multimodal information.
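As a rough illustration of the paired structure the abstract describes, the sketch below aligns each news item with the window of series values that follows it. The schema, field names, and seven-day horizon are assumptions made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class PairedSample:
    """One hypothetical MTBench-style sample: a news item plus the
    aligned slice of the numeric series that follows it."""
    news_time: datetime
    news_text: str
    series_times: list    # timestamps of the aligned window
    series_values: list   # e.g. closing prices or temperatures

def align_news_to_series(news, series, horizon_days=7):
    """Pair each news item with the series points that fall inside a
    fixed horizon after publication (a simplifying assumption)."""
    samples = []
    for t_news, text in news:
        window = [(t, v) for t, v in series
                  if t_news <= t <= t_news + timedelta(days=horizon_days)]
        if window:
            times, values = zip(*window)
            samples.append(PairedSample(t_news, text, list(times), list(values)))
    return samples

# Toy usage with made-up data
news = [(datetime(2024, 1, 2), "Company X beats earnings expectations.")]
series = [(datetime(2024, 1, 2) + timedelta(days=d), 100.0 + d) for d in range(10)]
print(align_news_to_series(news, series)[0].series_values)
```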
Related papers
- Multi-modal Time Series Analysis: A Tutorial and Survey
Multi-modal time series analysis has emerged as a prominent research area in data mining. However, effective analysis of multi-modal time series is hindered by data heterogeneity, modality gap, misalignment, and inherent noise. Recent advancements in multi-modal time series methods have exploited the multi-modal context via cross-modal interactions.
arXiv Detail & Related papers (2025-03-17T20:30:02Z)
- Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data
Time-series analysis is critical for a wide range of fields such as healthcare, finance, transportation, and energy. Current time-series models are limited in their ability to perform reasoning that involves both time series and their textual content. Chat-TS integrates time-series tokens into LLMs' vocabulary, enhancing their reasoning ability over both modalities.
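One common way to realize "time-series tokens in the LLM vocabulary" (not necessarily how Chat-TS does it) is to discretize normalized values into bins and map each bin to a new token string; a minimal sketch:

```python
import numpy as np

def series_to_tokens(values, n_bins=100):
    """Discretize a series into vocabulary tokens <ts_0>..<ts_{n_bins-1}>.
    Min-max scaling and uniform bins are simplifying assumptions."""
    v = np.asarray(values, dtype=float)
    v = (v - v.min()) / (v.max() - v.min() + 1e-8)       # scale to [0, 1]
    bins = np.minimum((v * n_bins).astype(int), n_bins - 1)
    return [f"<ts_{b}>" for b in bins]

tokens = series_to_tokens([101.2, 101.9, 103.5, 102.8])
print(tokens)  # ['<ts_0>', '<ts_30>', '<ts_99>', '<ts_69>']
# These strings would then be added to the tokenizer's vocabulary and the
# model's embedding table resized accordingly.
```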
arXiv Detail & Related papers (2025-03-13T21:05:11Z)
- TimesBERT: A BERT-Style Foundation Model for Time Series Understanding
GPT-style models have been positioned as foundation models for time series forecasting.
The BERT-style architecture has not been fully unlocked for time series understanding.
We design TimesBERT to learn generic representations of time series.
Our model is pre-trained on 260 billion time points across diverse domains.
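A minimal sketch of what BERT-style pretraining on time series might look like: mask random patches and train an encoder to reconstruct them. The patch length, mask ratio, and zero-valued placeholder below are assumptions, not TimesBERT's actual recipe.

```python
import numpy as np

def mask_patches(series, patch_len=4, mask_ratio=0.25, rng=None):
    """Split a series into patches and hide a random subset, returning
    the masked input, the original patches, and the mask of hidden ones."""
    rng = rng or np.random.default_rng(0)
    n = len(series) // patch_len
    patches = np.asarray(series[: n * patch_len], float).reshape(n, patch_len)
    hidden = rng.random(n) < mask_ratio
    masked = patches.copy()
    masked[hidden] = 0.0   # zero stands in for a [MASK] embedding
    return masked, patches, hidden

masked, target, hidden = mask_patches(np.arange(16.0))
# A BERT-style encoder would be trained to reconstruct target[hidden]
# from masked; the reconstruction error (e.g. MSE) is the pretraining loss.
print(hidden)
```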
arXiv Detail & Related papers (2025-02-28T17:14:44Z)
- Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement
Time Series Multi-Task Question Answering (Time-MQA) is a unified framework that enables natural language queries across multiple time series tasks. Central to Time-MQA is the TSQA dataset, a large-scale dataset containing roughly 200k question-answer pairs.
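A hypothetical layout for one TSQA-style record, illustrating how a single schema could cover multiple task types behind natural-language queries; all field names here are assumptions, not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class TSQARecord:
    """Hypothetical layout of one question-answer pair over a series."""
    series: list   # raw numeric values
    task: str      # e.g. "forecasting", "anomaly", "open_qa"
    question: str
    answer: str

rec = TSQARecord(
    series=[12.1, 12.4, 13.0, 15.8],
    task="open_qa",
    question="Is the series trending upward over the last four points?",
    answer="Yes, values rise monotonically from 12.1 to 15.8.",
)
print(rec.task, "->", rec.question)
```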
arXiv Detail & Related papers (2025-02-26T13:47:13Z)
- Language in the Flow of Time: Time-Series-Paired Texts Weaved into a Unified Temporal Narrative
Texts as Time Series (TaTS) considers the time-series-paired texts to be auxiliary variables of the time series. TaTS can be plugged into any existing numerical-only time series model, enabling it to handle time series data with paired texts effectively.
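Treating paired texts as auxiliary variables suggests encoding each text into an extra numeric channel and concatenating it with the series, so a numerical-only model can consume the result. The toy "encoder" below (text length) is a stand-in for a real embedding model:

```python
import numpy as np

def text_channel(texts, dim=1):
    """Stand-in text encoder: map each text to a crude numeric feature
    (its length); a real system would use a learned embedding model."""
    return np.array([[len(t)] * dim for t in texts], dtype=float)

def fuse(series, texts):
    """Concatenate the text-derived channel onto the numeric series,
    yielding a multivariate input for any numerical-only forecaster."""
    x = np.asarray(series, dtype=float).reshape(-1, 1)
    return np.concatenate([x, text_channel(texts)], axis=1)

fused = fuse([1.0, 1.2, 0.9],
             ["rates held", "surprise rate cut announced", "calm day"])
print(fused.shape)  # (3, 2): original channel + text channel
```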
arXiv Detail & Related papers (2025-02-13T03:43:27Z)
- TempoGPT: Enhancing Time Series Reasoning via Quantizing Embedding
We propose a multi-modal time series data construction approach and a multi-modal time series language model (TLM), TempoGPT. We construct multi-modal data for complex reasoning tasks by analyzing the variable-system relationships within a white-box system. Extensive experiments demonstrate that TempoGPT accurately perceives temporal information, logically infers conclusions, and achieves state-of-the-art results on the constructed complex time series reasoning tasks.
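"Quantizing embedding" suggests vector-quantizing time-series embeddings against a codebook so a language model can consume discrete codes; a minimal nearest-neighbor quantizer with random stand-in values (toy sizes, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))   # 16 codes, 4-dim embeddings (toy sizes)

def quantize(embeddings, codebook):
    """Replace each embedding with its nearest codebook vector and
    return the discrete code indices, as in VQ-style tokenization."""
    d = ((embeddings[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

emb = rng.normal(size=(5, 4))         # stand-in time-series embeddings
_, codes = quantize(emb, codebook)
print(codes)  # discrete tokens a language model can consume
```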
arXiv Detail & Related papers (2025-01-13T13:47:05Z)
- Text2Freq: Learning Series Patterns from Text via Frequency Domain
Text2Freq is a cross-modality model that integrates text and time series data via the frequency domain.
Our experiments on paired datasets of real-world stock prices and synthetic texts show that Text2Freq achieves state-of-the-art performance.
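Integrating the two modalities "via the frequency domain" might look like scaling the series' FFT spectrum with text-derived weights; the elementwise gating below is an assumption, not Text2Freq's actual mechanism:

```python
import numpy as np

def frequency_fuse(series, text_gate):
    """Transform the series to the frequency domain, scale each
    frequency bin by a text-derived gate, and transform back."""
    spec = np.fft.rfft(np.asarray(series, dtype=float))
    gate = np.asarray(text_gate, dtype=float)  # one weight per frequency bin
    return np.fft.irfft(spec * gate, n=len(series))

series = np.sin(np.linspace(0, 4 * np.pi, 32))
gate = np.ones(17)          # rfft of a length-32 input yields 17 bins
gate[5:] = 0.2              # e.g. text implies damping high frequencies
print(frequency_fuse(series, gate).shape)  # (32,)
```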
arXiv Detail & Related papers (2024-11-01T16:11:02Z)
- Context is Key: A Benchmark for Forecasting with Essential Textual Information
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context.<n>We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters.<n>We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z)
- Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding
We refer to a complex event composed of many news articles over an extended period as a Temporal Complex Event (TCE).
This paper proposes a novel approach using Large Language Models (LLMs) to systematically extract and analyze the event chain within a TCE.
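Event-chain extraction over a TCE could be organized as grouping articles by date and querying an LLM once per date; `call_llm` below is a placeholder rather than a real API, and the one-event-per-date assumption is illustrative:

```python
from collections import defaultdict

def call_llm(prompt):
    """Placeholder for a real LLM call."""
    return "summarized event for: " + prompt[:40]

def extract_event_chain(articles):
    """Group articles by date and extract one salient event per date,
    yielding a chronological event chain (a simplifying assumption)."""
    by_date = defaultdict(list)
    for date, text in articles:
        by_date[date].append(text)
    chain = []
    for date in sorted(by_date):
        chain.append((date, call_llm(" ".join(by_date[date]))))
    return chain

chain = extract_event_chain([("2024-01-01", "Storm hits coast."),
                             ("2024-01-02", "Cleanup begins.")])
print(chain)
```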
arXiv Detail & Related papers (2024-06-04T16:42:17Z)
- Modality-aware Transformer for Financial Time Series Forecasting
We introduce a novel multimodal transformer-based model named the Modality-aware Transformer.
Our model leverages both categorical text and numerical time series to forecast the target time series effectively.
Our experiments on financial datasets demonstrate that the Modality-aware Transformer outperforms existing methods.
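A modality-aware design plausibly embeds categorical text tokens and numeric values into a shared space before a standard transformer attends across both; the toy projections below are illustrative, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
text_vocab, d = 50, 8
text_emb = rng.normal(size=(text_vocab, d))  # table for categorical text tokens
value_proj = rng.normal(size=(1, d))         # linear map for numeric values

def embed_multimodal(token_ids, values):
    """Embed categorical text tokens and numeric series values into a
    shared d-dim space and concatenate along the sequence axis, so a
    standard transformer encoder can attend across both modalities."""
    t = text_emb[np.asarray(token_ids)]                        # (n_text, d)
    v = np.asarray(values, float).reshape(-1, 1) @ value_proj  # (n_vals, d)
    return np.concatenate([t, v], axis=0)

seq = embed_multimodal([3, 17, 42], [101.5, 102.1])
print(seq.shape)  # (5, 8): 3 text tokens + 2 series points
```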
arXiv Detail & Related papers (2023-10-02T14:22:41Z)
- Continual Multimodal Knowledge Graph Construction
Current Multimodal Knowledge Graph Construction (MKGC) models struggle with the real-world dynamism of continuously emerging entities and relations.
This study introduces benchmarks aimed at fostering the development of the continual MKGC domain.
We introduce the MSPT framework, designed to surmount the shortcomings of existing MKGC approaches during multimedia data processing.
arXiv Detail & Related papers (2023-05-15T14:58:28Z)
- Multi-scale Attention Flow for Probabilistic Time Series Forecasting
We propose a novel non-autoregressive deep learning model called Multi-scale Attention Normalizing Flow (MANF).
Our model avoids the influence of cumulative error and does not increase the time complexity.
Our model achieves state-of-the-art performance on many popular multivariate datasets.
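The abstract gives no architectural detail; as background, normalizing-flow forecasters are typically built from invertible affine coupling layers like the one below, with toy conditioners standing in for the paper's multi-scale attention networks:

```python
import numpy as np

def affine_coupling(x, shift, log_scale):
    """One affine coupling step: transform the second half of x
    conditioned on the first half; invertible with a tractable Jacobian."""
    d = x.shape[-1] // 2
    x1, x2 = x[..., :d], x[..., d:]
    y2 = x2 * np.exp(log_scale(x1)) + shift(x1)
    return np.concatenate([x1, y2], axis=-1)

# Toy conditioners standing in for learned (e.g. attention) networks
shift = lambda h: 0.1 * h
log_scale = lambda h: 0.05 * h

x = np.random.default_rng(0).normal(size=(3, 4))
y = affine_coupling(x, shift, log_scale)
print(y.shape)  # (3, 4); samples are produced in one pass, non-autoregressively
```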
arXiv Detail & Related papers (2022-05-16T07:53:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.