Related papers: Scaling Transformers for Time Series Forecasting: Do Pretrained Large Models Outperform Small-Scale Alternatives?

Scaling Transformers for Time Series Forecasting: Do Pretrained Large Models Outperform Small-Scale Alternatives?

URL: http://arxiv.org/abs/2507.02907v1
Date: Tue, 24 Jun 2025 11:54:10 GMT
Title: Scaling Transformers for Time Series Forecasting: Do Pretrained Large Models Outperform Small-Scale Alternatives?
Authors: Sanjay Chakraborty, Ibrahim Delibasoglu, Fredrik Heintz,
Abstract summary: This work examines whether pre-trained large-scale time series models (LSTSMs) can outperform traditional non-pretrained small-scale transformers in forecasting tasks.<n>We analyze state-of-the-art (SOTA) pre-trained universal time series models (e.g., Moirai, TimeGPT) alongside conventional transformers.<n>Our findings reveal the strengths and limitations of pre-trained LSTSMs, providing insights into their suitability for time series tasks.
Score: 4.075971633195745
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large pre-trained models have demonstrated remarkable capabilities across domains, but their effectiveness in time series forecasting remains understudied. This work empirically examines whether pre-trained large-scale time series models (LSTSMs) trained on diverse datasets can outperform traditional non-pretrained small-scale transformers in forecasting tasks. We analyze state-of-the-art (SOTA) pre-trained universal time series models (e.g., Moirai, TimeGPT) alongside conventional transformers, evaluating accuracy, computational efficiency, and interpretability across multiple benchmarks. Our findings reveal the strengths and limitations of pre-trained LSTSMs, providing insights into their suitability for time series tasks compared to task-specific small-scale architectures. The results highlight scenarios where pretraining offers advantages and where simpler models remain competitive.

Related papers

Fine-Tuning Pre-trained Large Time Series Models for Prediction of Wind Turbine SCADA Data [6.296946118570559]
This paper explores the application of a pre-trained large time series model, Timer, in the prediction of SCADA data collected from wind turbines.<n>The model was fine-tuned on datasets sourced from two wind farms, which exhibited differing characteristics.
arXiv Detail & Related papers (2024-11-30T09:00:24Z)
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts [25.503695417712997]
Time-MoE is a scalable and unified architecture designed to pre-train larger, more capable forecasting foundation models.<n>Time-MoE enhances computational efficiency by activating only a subset of networks for each prediction.<n>For the first time, we scaled a time series foundation model up to 2.4 billion parameters, achieving significantly improved forecasting precision.
arXiv Detail & Related papers (2024-09-24T12:42:18Z)
ReAugment: Model Zoo-Guided RL for Few-Shot Time Series Augmentation and Forecasting [74.00765474305288]
We present a pilot study on using reinforcement learning (RL) for time series data augmentation.<n>Our method, ReAugment, tackles three critical questions: which parts of the training set should be augmented, how the augmentation should be performed, and what advantages RL brings to the process.
arXiv Detail & Related papers (2024-09-10T07:34:19Z)
LTSM-Bundle: A Toolbox and Benchmark on Large Language Models for Time Series Forecasting [69.33802286580786]
We introduce LTSM-Bundle, a comprehensive toolbox, and benchmark for training LTSMs.<n>It modularized and benchmarked LTSMs from multiple dimensions, encompassing prompting strategies, tokenization approaches, base model selection, data quantity, and dataset diversity.<n> Empirical results demonstrate that this combination achieves superior zero-shot and few-shot performances compared to state-of-the-art LTSMs and traditional TSF methods.
arXiv Detail & Related papers (2024-06-20T07:09:19Z)
Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a Masked-based Universal Time Series Forecasting Transformer (Moirai) Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains. Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z)
Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM) During pre-training, we curate large-scale datasets with up to 1 billion time points. To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z)
Understanding the Role of Textual Prompts in LLM for Time Series Forecasting: an Adapter View [21.710722062737577]
In the burgeoning domain of Large Language Models (LLMs), there is a growing interest in applying LLM to time series forecasting. This study aims to understand how and why the integration of textual prompts into LLM can effectively improve the prediction accuracy of time series.
arXiv Detail & Related papers (2023-11-24T16:32:47Z)
Large Pre-trained time series models for cross-domain Time series analysis tasks [20.228846068418765]
Large Pre-trained Time-series Models (LPTM) is a novel method of adaptive segmentation that automatically identifies optimal dataset-specific segmentation strategy during pre-training.<n> LPTM achieves superior forecasting and time-series classification results taking up to 40% less data and 50% less training time compared to state-of-art baselines.
arXiv Detail & Related papers (2023-11-19T20:16:16Z)
TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series [57.4208255711412]
Building on copula theory, we propose a simplified objective for the recently-introduced transformer-based attentional copulas (TACTiS) We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks.
arXiv Detail & Related papers (2023-10-02T16:45:19Z)
Examining the Effect of Pre-training on Time Series Classification [21.38211396933795]
This study investigates the impact of pre-training followed by fine-tuning on the fine-tuning process. We conducted a thorough examination of 150 classification datasets. We find that pre-training can only help improve the optimization process for models that fit the data poorly. Adding more pre-training data does not improve generalization, but it can strengthen the advantage of pre-training on the original data volume.
arXiv Detail & Related papers (2023-09-11T06:26:57Z)
Self-Supervised Pretraining Improves Self-Supervised Pretraining [83.1423204498361]
Self-supervised pretraining requires expensive and lengthy computation, large amounts of data, and is sensitive to data augmentation. This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model. We show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.
arXiv Detail & Related papers (2021-03-23T17:37:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.