What If TSF: A Benchmark for Reframing Forecasting as Scenario-Guided Multimodal Forecasting
- URL: http://arxiv.org/abs/2601.08509v1
- Date: Tue, 13 Jan 2026 12:47:43 GMT
- Title: What If TSF: A Benchmark for Reframing Forecasting as Scenario-Guided Multimodal Forecasting
- Authors: Jinkwan Jang, Hyunbin Jin, Hyungjin Park, Kyubyung Chae, Taesup Kim
- Abstract summary: What If TSF (WIT) is a benchmark designed to evaluate whether models can condition their forecasts on contextual text. WIT offers a rigorous testbed for scenario-guided multimodal forecasting.
- Score: 8.593646221015264
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Time series forecasting is critical to real-world decision making, yet most existing approaches remain unimodal and rely on extrapolating historical patterns. While recent progress in large language models (LLMs) highlights the potential for multimodal forecasting, existing benchmarks largely provide retrospective or misaligned raw context, making it unclear whether such models meaningfully leverage textual inputs. In practice, human experts incorporate what-if scenarios with historical evidence, often producing distinct forecasts from the same observations under different scenarios. Inspired by this, we introduce What If TSF (WIT), a multimodal forecasting benchmark designed to evaluate whether models can condition their forecasts on contextual text, especially future scenarios. By providing expert-crafted plausible or counterfactual scenarios, WIT offers a rigorous testbed for scenario-guided multimodal forecasting. The benchmark is available at https://github.com/jinkwan1115/WhatIfTSF.
Related papers
- Position: Beyond Model-Centric Prediction -- Agentic Time Series Forecasting [49.05788441962762]
We argue for agentic time series forecasting (ATSF), which reframes forecasting as an agentic process composed of perception, planning, action, reflection, and memory. We outline three representative implementation paradigms -- workflow-based design, agentic reinforcement learning, and a hybrid agentic workflow paradigm -- and discuss the opportunities and challenges that arise when shifting from model-centric prediction to agentic forecasting.
arXiv Detail & Related papers (2026-02-02T08:01:11Z) - When Does Multimodality Lead to Better Time Series Forecasting? [96.26052272121615]
We investigate whether and under what conditions such multimodal integration consistently yields gains. Our findings reveal that the benefits of multimodality are highly condition-dependent. Our study offers a rigorous, quantitative foundation for understanding when multimodality can be expected to aid forecasting tasks.
arXiv Detail & Related papers (2025-06-20T23:55:56Z) - Realistic Test-Time Adaptation of Vision-Language Models [23.972884634610413]
Vision-Language Models (VLMs) have been widely leveraged to improve predictive performance. Previous works on transductive or test-time adaptation (TTA) often make strong assumptions about the data distribution. Our work challenges these favorable deployment scenarios and introduces a more realistic evaluation framework.
arXiv Detail & Related papers (2025-01-07T12:17:25Z) - Scenario-Wise Rec: A Multi-Scenario Recommendation Benchmark [65.13288661320364]
We introduce our benchmark, Scenario-Wise Rec, which comprises 6 public datasets and 12 benchmark models, along with a training and evaluation pipeline. We aim for this benchmark to offer researchers valuable insights from prior work, enabling the development of novel models.
arXiv Detail & Related papers (2024-12-23T08:15:34Z) - Context is Key: A Benchmark for Forecasting with Essential Textual Information [87.3175915185287]
"Context is Key" (CiK) is a forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters. We propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark.
arXiv Detail & Related papers (2024-10-24T17:56:08Z) - Forecasting with Deep Learning: Beyond Average of Average of Average Performance [0.393259574660092]
Current practices for evaluating and comparing forecasting models focus on summarising performance into a single score.
We propose a novel framework for evaluating models from multiple perspectives.
We show the advantages of this framework by comparing a state-of-the-art deep learning approach with classical forecasting techniques.
arXiv Detail & Related papers (2024-06-24T12:28:22Z) - HoTPP Benchmark: Are We Good at the Long Horizon Events Forecasting? [1.3654846342364308]
We introduce HoTPP, the first benchmark specifically designed to rigorously evaluate long-horizon predictions. We identify shortcomings in widely used evaluation metrics, propose a theoretically grounded T-mAP metric, and offer efficient implementations of popular models. We analyze the impact of autoregression and intensity-based losses on prediction quality, and outline promising directions for future research.
arXiv Detail & Related papers (2024-06-20T14:09:00Z) - Benchmarking Sequential Visual Input Reasoning and Prediction in Multimodal Large Language Models [21.438427686724932]
We introduce a novel benchmark that assesses the predictive reasoning capabilities of MLLMs across diverse scenarios.
Our benchmark targets three important domains: abstract pattern reasoning, human activity prediction, and physical interaction prediction.
Empirical experiments confirm the soundness of the proposed benchmark and evaluation methods.
arXiv Detail & Related papers (2023-10-20T13:14:38Z) - Learning Interpretable Deep State Space Model for Probabilistic Time Series Forecasting [98.57851612518758]
Probabilistic time series forecasting involves estimating the distribution of future values based on their history.
We propose a deep state space model for probabilistic time series forecasting in which the non-linear emission model and transition model are parameterized by neural networks.
We show in experiments that our model produces accurate and sharp probabilistic forecasts.
arXiv Detail & Related papers (2021-01-31T06:49:33Z) - Forecast with Forecasts: Diversity Matters [9.66075743192747]
In recent years, the idea of using time series features to construct forecast combination models has flourished in the forecasting area.
In this work, we suggest a change of focus from the historical data to the produced forecasts to extract features.
We calculate the diversity of a pool of models based on the corresponding forecasts as a decisive feature and use meta-learning to construct diversity-based forecast combination models.
arXiv Detail & Related papers (2020-12-03T02:14:02Z) - Ambiguity in Sequential Data: Predicting Uncertain Futures with Recurrent Models [110.82452096672182]
We propose an extension of the Multiple Hypothesis Prediction (MHP) model to handle ambiguous predictions with sequential data.
We also introduce a novel metric for ambiguous problems, which is better suited to account for uncertainties.
arXiv Detail & Related papers (2020-03-10T09:15:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.