Pre-trained Forecasting Models: Strong Zero-Shot Feature Extractors for Time Series Classification
- URL: http://arxiv.org/abs/2510.26777v1
- Date: Thu, 30 Oct 2025 17:55:23 GMT
- Title: Pre-trained Forecasting Models: Strong Zero-Shot Feature Extractors for Time Series Classification
- Authors: Andreas Auer, Daniel Klotz, Sebastian Böck, Sepp Hochreiter
- Abstract summary: We show that the best forecasting models achieve classification accuracy that matches or even surpasses that of state-of-the-art models pre-trained specifically for classification. These findings challenge the assumption that task-specific pre-training is necessary, and suggest that learning to forecast may provide a powerful route toward constructing general-purpose time series foundation models.
- Score: 19.714904955821623
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent research on time series foundation models has primarily focused on forecasting, leaving it unclear how generalizable their learned representations are. In this study, we examine whether frozen pre-trained forecasting models can provide effective representations for classification. To this end, we compare different representation extraction strategies and introduce two model-agnostic embedding augmentations. Our experiments show that the best forecasting models achieve classification accuracy that matches or even surpasses that of state-of-the-art models pre-trained specifically for classification. Moreover, we observe a positive correlation between forecasting and classification performance. These findings challenge the assumption that task-specific pre-training is necessary, and suggest that learning to forecast may provide a powerful route toward constructing general-purpose time series foundation models.
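The frozen-feature-extractor setup the abstract describes can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `frozen_encoder` is a hypothetical stand-in (a fixed random patch projection) for a real pre-trained forecasting backbone, mean pooling is just one possible representation extraction strategy, and the ridge classifier is an arbitrary lightweight head. The paper's two embedding augmentations are not reproduced.

```python
import numpy as np


def frozen_encoder(x, patch_len=8, dim=16, seed=0):
    """Hypothetical stand-in for a frozen pre-trained forecasting backbone.

    Splits each series into non-overlapping patches and applies a fixed
    random projection followed by tanh. The weights depend only on `seed`
    and are never updated, which mimics a frozen model.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((patch_len, dim)) / np.sqrt(patch_len)
    n, t = x.shape
    n_patches = t // patch_len
    patches = x[:, : n_patches * patch_len].reshape(n, n_patches, patch_len)
    return np.tanh(patches @ w)  # shape: (n, n_patches, dim)


def embed(x):
    """Mean-pool the per-patch outputs into one vector per series."""
    return frozen_encoder(x).mean(axis=1)  # shape: (n, dim)


def fit_ridge_classifier(z, y, n_classes, lam=1e-2):
    """Fit a ridge-regression head on one-hot targets (closed form)."""
    zb = np.hstack([z, np.ones((len(z), 1))])  # append a bias column
    targets = np.eye(n_classes)[y]
    a = zb.T @ zb + lam * np.eye(zb.shape[1])
    return np.linalg.solve(a, zb.T @ targets)


def predict(w, z):
    """Predict the class with the largest regression output."""
    zb = np.hstack([z, np.ones((len(z), 1))])
    return (zb @ w).argmax(axis=1)


def make_toy_data(n, length=64, seed=0):
    """Toy two-class data: classes differ only in their mean level."""
    rng = np.random.default_rng(seed)
    y = np.arange(n) % 2  # balanced labels 0, 1, 0, 1, ...
    level = (2.0 * y - 1.0)[:, None] * 2.0  # -2 for class 0, +2 for class 1
    x = level + 0.1 * rng.standard_normal((n, length))
    return x, y


if __name__ == "__main__":
    x_train, y_train = make_toy_data(40, seed=1)
    x_test, y_test = make_toy_data(20, seed=2)
    # Only the classifier head is trained; the encoder stays frozen.
    head = fit_ridge_classifier(embed(x_train), y_train, n_classes=2)
    accuracy = (predict(head, embed(x_test)) == y_test).mean()
    print(f"test accuracy: {accuracy:.2f}")
```

In a real experiment the stand-in encoder would be replaced by the hidden states of an actual pre-trained forecasting model, and the toy data by a classification benchmark; only the small classifier on top would be fitted.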
Related papers
- Position: Beyond Model-Centric Prediction -- Agentic Time Series Forecasting [49.05788441962762]
We argue for agentic time series forecasting (ATSF), which reframes forecasting as an agentic process composed of perception, planning, action, reflection, and memory. We outline three representative implementation paradigms -- workflow-based design, agentic reinforcement learning, and a hybrid agentic workflow paradigm -- and discuss the opportunities and challenges that arise when shifting from model-centric prediction to agentic forecasting.
arXiv Detail & Related papers (2026-02-02T08:01:11Z) - Modèles de Fondation et Ajustement : Vers une Nouvelle Génération de Modèles pour la Prévision des Séries Temporelles [26.28141834580785]
Foundation models have been developed for zero-shot time series forecasting. These models learn generalizable representations for both point and probabilistic forecasting. We study the effect of fine-tuning after pretraining to enhance their performance on specific datasets.
arXiv Detail & Related papers (2025-11-27T18:19:20Z) - Accuracy Law for the Future of Deep Time Series Forecasting [65.46625911002202]
Time series forecasting inherently faces a non-zero error lower bound due to its partially observable and uncertain nature. This paper focuses on a fundamental question: how to estimate the performance upper bound of deep time series forecasting. Based on rigorous statistical tests of over 2,800 newly trained deep forecasters, we discover a significant exponential relationship between the minimum forecasting error of deep models and the complexity of window-wise series patterns.
arXiv Detail & Related papers (2025-10-03T05:18:47Z) - ChronosX: Adapting Pretrained Time Series Models with Exogenous Variables [30.679739751673655]
This paper introduces a new method to incorporate covariates into pretrained time series forecasting models. Our proposed approach incorporates covariate information into pretrained forecasting models through modular blocks. In evaluations on both synthetic and real datasets, our approach effectively incorporates covariate information into pretrained models, outperforming existing baselines.
arXiv Detail & Related papers (2025-03-15T12:34:19Z) - ReAugment: Model Zoo-Guided RL for Few-Shot Time Series Augmentation and Forecasting [74.00765474305288]
We present a pilot study on using reinforcement learning (RL) for time series data augmentation. Our method, ReAugment, tackles three critical questions: which parts of the training set should be augmented, how the augmentation should be performed, and what advantages RL brings to the process.
arXiv Detail & Related papers (2024-09-10T07:34:19Z) - Forecasting with Deep Learning: Beyond Average of Average of Average Performance [0.393259574660092]
Current practices for evaluating and comparing forecasting models focus on summarising performance into a single score.
We propose a novel framework for evaluating models from multiple perspectives.
We show the advantages of this framework by comparing a state-of-the-art deep learning approach with classical forecasting techniques.
arXiv Detail & Related papers (2024-06-24T12:28:22Z) - Predictive Churn with the Set of Good Models [61.00058053669447]
This paper explores connections between two seemingly unrelated concepts of predictive inconsistency. The first, known as predictive multiplicity, occurs when models that perform similarly produce conflicting predictions for individual samples. The second concept, predictive churn, examines the differences in individual predictions before and after model updates.
arXiv Detail & Related papers (2024-02-12T16:15:25Z) - Counterfactual Explanations for Time Series Forecasting [14.03870816983583]
We formulate the novel problem of counterfactual generation for time series forecasting, and propose an algorithm, called ForecastCF.
ForecastCF solves the problem by applying gradient-based perturbations to the original time series.
Our results show that ForecastCF outperforms the baseline in terms of counterfactual validity and data manifold closeness.
arXiv Detail & Related papers (2023-10-12T08:51:59Z) - Representer Point Selection for Explaining Regularized High-dimensional Models [105.75758452952357]
We introduce a class of sample-based explanations we term high-dimensional representers.
Our workhorse is a novel representer theorem for general regularized high-dimensional models.
We study the empirical performance of our proposed methods on three real-world binary classification datasets and two recommender system datasets.
arXiv Detail & Related papers (2023-05-31T16:23:58Z) - Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models with few examples show strong prediction bias across labels.
Although few-shot fine-tuning can mitigate the prediction bias, our analysis shows models gain performance improvement by capturing non-task-related features.
These observations alert that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z) - Feature-weighted Stacking for Nonseasonal Time Series Forecasts: A Case Study of the COVID-19 Epidemic Curves [0.0]
We investigate ensembling techniques in forecasting and examine their potential for use in nonseasonal time-series.
We propose using late data fusion, using a stacked ensemble of two forecasting models and two meta-features that prove their predictive power during a preliminary forecasting stage.
arXiv Detail & Related papers (2021-08-19T14:44:46Z) - Models, Pixels, and Rewards: Evaluating Design Trade-offs in Visual Model-Based Reinforcement Learning [109.74041512359476]
We study a number of design decisions for the predictive model in visual MBRL algorithms.
We find that a range of design decisions that are often considered crucial, such as the use of latent spaces, have little effect on task performance.
We show how this phenomenon is related to exploration and how some of the lower-scoring models on standard benchmarks will perform the same as the best-performing models when trained on the same training data.
arXiv Detail & Related papers (2020-12-08T18:03:21Z) - Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.