Related papers: It's TIME: Towards the Next Generation of Time Series Forecasting Benchmarks

URL: http://arxiv.org/abs/2602.12147v1
Date: Thu, 12 Feb 2026 16:31:01 GMT
Title: It's TIME: Towards the Next Generation of Time Series Forecasting Benchmarks
Authors: Zhongzheng Qiao, Sheng Pan, Anni Wang, Viktoriya Zhukova, Yong Liu, Xudong Jiang, Qingsong Wen, Mingsheng Long, Ming Jin, Chenghao Liu
Abstract summary: Time series foundation models (TSFMs) are revolutionizing the forecasting landscape from specific dataset modeling to generalizable task evaluation. We introduce TIME, a next-generation task-centric benchmark comprising 50 fresh datasets and 98 forecasting tasks. We propose a novel pattern-level evaluation perspective that moves beyond traditional dataset-level evaluations based on static meta labels.
Score: 87.7937890373758
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Time series foundation models (TSFMs) are revolutionizing the forecasting landscape from specific dataset modeling to generalizable task evaluation. However, we contend that existing benchmarks exhibit common limitations in four dimensions: constrained data composition dominated by reused legacy sources, compromised data integrity lacking rigorous quality assurance, misaligned task formulations detached from real-world contexts, and rigid analysis perspectives that obscure generalizable insights. To bridge these gaps, we introduce TIME, a next-generation task-centric benchmark comprising 50 fresh datasets and 98 forecasting tasks, tailored for strict zero-shot TSFM evaluation free from data leakage. Integrating large language models and human expertise, we establish a rigorous human-in-the-loop benchmark construction pipeline to ensure high data integrity and redefine task formulation by aligning forecasting configurations with real-world operational requirements and variate predictability. Furthermore, we propose a novel pattern-level evaluation perspective that moves beyond traditional dataset-level evaluations based on static meta labels. By leveraging structural time series features to characterize intrinsic temporal properties, this approach offers generalizable insights into model capabilities across diverse patterns. We evaluate 12 representative TSFMs and establish a multi-granular leaderboard to facilitate in-depth analysis and visualized inspection. The leaderboard is available at https://huggingface.co/spaces/Real-TSF/TIME-leaderboard.
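
The pattern-level evaluation idea in the abstract can be illustrated with a minimal sketch. It is not the TIME pipeline: the feature (lag autocorrelation as a crude seasonality proxy), the 0.5 threshold, the two labels, and every function name below are assumptions chosen for illustration. The sketch groups forecast errors by a property computed from each series rather than by a dataset-level label.

```python
from collections import defaultdict
from statistics import mean


def autocorr(series, lag):
    """Sample autocorrelation of `series` at `lag` (0.0 if undefined)."""
    mu = mean(series)
    var = sum((x - mu) ** 2 for x in series)
    if var == 0 or lag >= len(series):
        return 0.0
    cov = sum(
        (series[t] - mu) * (series[t - lag] - mu)
        for t in range(lag, len(series))
    )
    return cov / var


def pattern_label(history, period, threshold=0.5):
    """Hypothetical structural feature: strong autocorrelation at `period`."""
    if autocorr(history, period) >= threshold:
        return "seasonal"
    return "non-seasonal"


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return mean([abs(a - p) for a, p in zip(actual, predicted)])


def pattern_level_scores(tasks, period):
    """Average MAE per pattern label.

    `tasks` is a list of (history, future, forecast) tuples; the forecast
    can come from any model, since only the grouping is shown here.
    """
    buckets = defaultdict(list)
    for history, future, forecast in tasks:
        buckets[pattern_label(history, period)].append(mae(future, forecast))
    return {label: mean(errors) for label, errors in buckets.items()}


# Toy data (made up): one series that repeats every 4 steps, one that does not.
tasks = [
    ([0, 1, 0, -1] * 10, [0, 1, 0, -1], [0, 1, 0, -1]),
    ([1, 1, 1, 1, -1, -1, -1, -1] * 5, [1, 1, 1, 1], [0, 0, 0, 0]),
]
scores = pattern_level_scores(tasks, period=4)
print(scores)
```

A real benchmark would presumably use a richer feature set and continuous groupings; the point here is only that scores are aggregated by series properties, so the same model can be compared across patterns regardless of which dataset a series came from.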

Related papers

Universal Redundancies in Time Series Foundation Models [3.8551402560229806]
Time Series Foundation Models (TSFMs) leverage extensive pretraining to accurately predict unseen time series during inference. We introduce a set of tools for mechanistic interpretability of TSFMs, including ablations of specific components and direct logit attribution on the residual stream.
arXiv Detail & Related papers (2026-02-02T03:53:46Z)
Time Series Foundation Models: Benchmarking Challenges and Requirements [0.0]
Time Series Foundation Models (TSFMs) represent a new paradigm for time series forecasting. Evaluating TSFMs is tricky: with ever more extensive training sets, it becomes more challenging to ensure the integrity of benchmarking data.
arXiv Detail & Related papers (2025-10-15T15:15:45Z)
Estimating Time Series Foundation Model Transferability via In-Context Learning [74.65355820906355]
Time series foundation models (TSFMs) offer strong zero-shot forecasting via large-scale pre-training. Fine-tuning remains critical for boosting performance in domains with limited public data. We introduce TimeTic, a transferability estimation framework that recasts model selection as an in-context-learning problem.
arXiv Detail & Related papers (2025-09-28T07:07:13Z)
Time-RA: Towards Time Series Reasoning for Anomaly with LLM Feedback [55.284574165467525]
Time-series Reasoning for Anomaly (Time-RA) transforms classical time series anomaly detection into a generative, reasoning-intensive task. Also, we introduce the first real-world multimodal benchmark dataset, RATs40K, explicitly annotated for anomaly reasoning.
arXiv Detail & Related papers (2025-07-20T18:02:50Z)
TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster [14.512119661418522]
We present TS-RAG, a retrieval-augmented generation framework for time series forecasting. Specifically, TS-RAG leverages pre-trained time series encoders to retrieve semantically relevant segments from a dedicated knowledge base. We show that TS-RAG achieves state-of-the-art zero-shot forecasting performance, outperforming the existing TSFMs by up to 6.84% across diverse domains.
arXiv Detail & Related papers (2025-03-06T16:48:48Z)
Not All Data are Good Labels: On the Self-supervised Labeling for Time Series Forecasting [37.189362258417624]
This paper explores a novel self-supervised approach to re-label time series datasets by inherently constructing candidate datasets. During the optimization of a simple reconstruction network, intermediates are used as pseudo labels in a self-supervised paradigm. Experiments on eleven real-world datasets demonstrate that SCAM consistently improves the performance of various backbone models.
arXiv Detail & Related papers (2025-02-20T16:29:37Z)
GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation [90.53485251837235]
Time series foundation models excel in zero-shot forecasting, handling diverse tasks without explicit training. GIFT-Eval is a pioneering benchmark aimed at promoting evaluation across diverse datasets. GIFT-Eval encompasses 23 datasets over 144,000 time series and 177 million data points.
arXiv Detail & Related papers (2024-10-14T11:29:38Z)
Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift [51.01356105618118]
Time series often exhibit complex non-uniform distribution with varying patterns across segments, such as season, operating condition, or semantic meaning. Existing approaches, which typically train a single model to capture all these diverse patterns, often struggle with the pattern drifts between patches. We propose TFPS, a novel architecture that leverages pattern-specific experts for more accurate and adaptable time series forecasting.
arXiv Detail & Related papers (2024-10-13T13:35:29Z)
DAM: Towards A Foundation Model for Time Series Forecasting [0.8231118867997028]
We propose a neural model that takes randomly sampled histories and outputs an adjustable basis composition as a continuous function of time. It involves three key components: (1) a flexible approach for using randomly sampled histories from a long-tail distribution; (2) a transformer backbone that is trained on these actively sampled histories to produce, as representational output, (3) the basis coefficients of a continuous function of time.
arXiv Detail & Related papers (2024-07-25T08:48:07Z)
GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models [56.63218531256961]
We introduce GenBench, a benchmarking suite specifically tailored for evaluating the efficacy of Genomic Foundation Models. GenBench offers a modular and expandable framework that encapsulates a variety of state-of-the-art methodologies. We provide a nuanced analysis of the interplay between model architecture and dataset characteristics on task-specific performance.
arXiv Detail & Related papers (2024-06-01T08:01:05Z)
Unified Long-Term Time-Series Forecasting Benchmark [0.6526824510982802]
We present a comprehensive dataset designed explicitly for long-term time-series forecasting. We incorporate a collection of datasets obtained from diverse, dynamic systems and real-life records. To determine the most effective model in diverse scenarios, we conduct an extensive benchmarking analysis using classical and state-of-the-art models. Our findings reveal intriguing performance comparisons among these models, highlighting the dataset-dependent nature of model effectiveness.
arXiv Detail & Related papers (2023-09-27T18:59:00Z)

This list is automatically generated from the titles and abstracts of the papers on this site.