SEMPO: Lightweight Foundation Models for Time Series Forecasting
- URL: http://arxiv.org/abs/2510.19710v1
- Date: Wed, 22 Oct 2025 15:58:44 GMT
- Title: SEMPO: Lightweight Foundation Models for Time Series Forecasting
- Authors: Hui He, Kun Yi, Yuanchi Ma, Qi Zhang, Zhendong Niu, Guansong Pang
- Abstract summary: SEMPO is a lightweight foundation model that requires pre-training on only relatively small-scale data, yet exhibits strong general time series forecasting performance. SEMPO comprises two key modules: 1) an energy-aware SpEctral decomposition module that substantially improves the utilization of pre-training data, and 2) a Mixture-of-PrOmpts enabled Transformer for parameter-efficient adaptation across datasets and domains. Experiments on two large-scale benchmarks covering 16 datasets demonstrate the superior performance of SEMPO in both zero-shot and few-shot forecasting scenarios.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent boom in large pre-trained models has brought remarkable success in developing foundation models (FMs) for time series forecasting. Despite impressive performance across diverse downstream forecasting tasks, existing time series FMs possess massive network architectures and require substantial pre-training on large-scale datasets, which significantly hinders their deployment in resource-constrained environments. In response to this growing tension between versatility and affordability, we propose SEMPO, a novel lightweight foundation model that requires pre-training on only relatively small-scale data, yet exhibits strong general time series forecasting performance. Concretely, SEMPO comprises two key modules: 1) an energy-aware SpEctral decomposition module that substantially improves the utilization of pre-training data by modeling not only the high-energy frequency signals but also the low-energy yet informative frequency signals ignored in current methods; and 2) a Mixture-of-PrOmpts enabled Transformer that learns heterogeneous temporal patterns through small dataset-specific prompts and adaptively routes time series tokens to prompt-based experts for parameter-efficient model adaptation across different datasets and domains. Equipped with these modules, SEMPO significantly reduces both pre-training data scale and model size while achieving strong generalization. Extensive experiments on two large-scale benchmarks covering 16 datasets demonstrate the superior performance of SEMPO in both zero-shot and few-shot forecasting scenarios compared with state-of-the-art methods. Code and data are available at https://github.com/mala-lab/SEMPO.
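To make the first module concrete, below is a minimal sketch of the energy-aware spectral split the abstract describes: rank frequencies by spectral energy, keep the top share in a high-energy branch, and retain the remainder as a separate low-energy branch rather than discarding it. The function name, the 0.9 energy ratio, and the NumPy implementation are illustrative assumptions, not SEMPO's actual code.

```python
import numpy as np

def energy_split(x: np.ndarray, energy_ratio: float = 0.9):
    """Split a series into high-energy and low-energy frequency components.

    Hypothetical sketch of an energy-aware spectral decomposition:
    frequencies holding the top `energy_ratio` share of spectral energy go
    to the high-energy branch; the remainder (often discarded by other
    methods) goes to the low-energy branch.
    """
    spec = np.fft.rfft(x)
    energy = np.abs(spec) ** 2

    # Rank frequencies by energy, then find the smallest set of bins
    # whose cumulative energy covers the requested ratio.
    order = np.argsort(energy)[::-1]
    cum = np.cumsum(energy[order]) / energy.sum()
    k = int(np.searchsorted(cum, energy_ratio)) + 1
    high_idx = order[:k]

    high_spec = np.zeros_like(spec)
    high_spec[high_idx] = spec[high_idx]
    low_spec = spec - high_spec

    # Inverse transform each branch back to the time domain.
    high = np.fft.irfft(high_spec, n=len(x))
    low = np.fft.irfft(low_spec, n=len(x))
    return high, low

# Example: a strong seasonal signal plus a weak but informative component.
t = np.arange(256)
x = np.sin(2 * np.pi * t / 24) + 0.05 * np.sin(2 * np.pi * t / 96)
high, low = energy_split(x)
assert np.allclose(high + low, x)  # the split is lossless
```

Because the split is lossless, the low-energy branch can be modeled explicitly instead of being thrown away, which is the data-utilization point the abstract makes.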
Related papers
- Enhancing few-shot time series forecasting with LLM-guided diffusion
Time series forecasting in specialized domains is often constrained by limited data availability. We propose LTSM-DIFF, a novel learning framework that integrates the expressive power of large language models with the generative capability of diffusion models. Our work establishes a new paradigm for time series analysis under data scarcity.
arXiv Detail & Related papers (2026-01-19T06:30:05Z)
- In-Context and Few-Shots Learning for Forecasting Time Series Data based on Large Language Models
This paper investigates the performance of using LLMs for time series data prediction. We evaluate LLMs through in-context, zero-shot, and few-shot learning, forecasting time series data with OpenAI o4-mini and Gemini 2.5 Flash Lite. The findings indicate that TimesFM has the best overall performance, with the lowest RMSE value (0.3023) and a competitive inference time (266 seconds).
arXiv Detail & Related papers (2025-12-08T16:52:46Z)
- Time Series Foundation Models for Process Model Forecasting
Process Model Forecasting (PMF) aims to predict how the control-flow structure of a process evolves over time. Machine learning and deep learning models provide only modest gains over statistical baselines. We investigate Time Series Foundation Models (TSFMs) as an alternative for PMF.
arXiv Detail & Related papers (2025-12-08T15:08:50Z)
- A Unified Frequency Domain Decomposition Framework for Interpretable and Robust Time Series Forecasting
Current approaches to time series forecasting, whether in the time or frequency domain, predominantly use deep learning models based on linear layers or Transformers. We propose FIRE, a unified frequency domain decomposition framework that provides a mathematical abstraction for diverse types of time series. FIRE consistently outperforms state-of-the-art models on long-term forecasting benchmarks.
arXiv Detail & Related papers (2025-10-11T09:59:25Z)
- MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models
MoFE-Time integrates time and frequency domain features within a Mixture of Experts (MoE) network. It achieves new state-of-the-art performance, reducing MSE and MAE by 6.95% and 6.02% compared to the representative method Time-MoE. The method also achieves outstanding results on a commercial dataset, underscoring the effectiveness of MoFE-Time in practical applications.
arXiv Detail & Related papers (2025-07-09T03:00:56Z)
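As a rough illustration of the expert-routing idea shared by MoFE-Time's MoE network and SEMPO's mixture-of-prompts, the sketch below routes each time series token to a single expert via a learned gate. All names, the top-1 routing rule, and the expert architecture are assumptions for illustration; neither paper's implementation is shown.

```python
import torch
import torch.nn as nn

class TopOneMoE(nn.Module):
    """Hypothetical top-1 router: each time-series token picks one expert."""

    def __init__(self, d_model: int, n_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_model), nn.GELU(),
                          nn.Linear(d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq, d_model)
        logits = self.router(tokens)                 # (B, S, n_experts)
        weight, choice = logits.softmax(-1).max(-1)  # top-1 expert per token
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = choice == e                       # tokens routed to expert e
            if mask.any():
                out[mask] = weight[mask].unsqueeze(-1) * expert(tokens[mask])
        return out

moe = TopOneMoE(d_model=64, n_experts=4)
y = moe(torch.randn(2, 96, 64))  # route 96 tokens per series to 4 experts
```

Since only one small expert runs per token, compute scales with the active expert rather than the full parameter count, which is what makes such adaptation parameter-efficient.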
- Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization
We develop a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time-localized frequencies. Our method first scales and decomposes the input time series, then thresholds and quantizes the wavelet coefficients, and finally pre-trains an autoregressive model to forecast coefficients for the forecast horizon.
arXiv Detail & Related papers (2024-12-06T18:22:59Z)
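The tokenization pipeline described in the wavelet paper's summary (scale, decompose, threshold, quantize) can be sketched with the standard PyWavelets package as follows; the Daubechies-4 wavelet, threshold value, and 256-bin quantizer are illustrative assumptions rather than the paper's settings, and the de-tokenization step is omitted.

```python
import numpy as np
import pywt

def wavelet_tokenize(x: np.ndarray, n_bins: int = 256, level: int = 3):
    """Sketch of a wavelet tokenization pipeline: scale the series, take a
    discrete wavelet transform, soft-threshold small coefficients, then
    quantize coefficients into integer tokens an autoregressive model can
    be trained on. Wavelet, level, and bin count are assumptions."""
    # 1) Scale to zero mean / unit variance.
    x = (x - x.mean()) / (x.std() + 1e-8)
    # 2) Multi-level discrete wavelet decomposition (flattened coefficients).
    coeffs = np.concatenate(pywt.wavedec(x, "db4", level=level))
    # 3) Soft-threshold to suppress near-zero detail coefficients.
    coeffs = pywt.threshold(coeffs, value=0.1, mode="soft")
    # 4) Uniform quantization into n_bins integer tokens.
    lo, hi = coeffs.min(), coeffs.max()
    tokens = np.floor((coeffs - lo) / (hi - lo + 1e-8) * (n_bins - 1))
    return tokens.astype(np.int64)

tokens = wavelet_tokenize(np.sin(np.linspace(0, 20, 512)))
```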
- Tackling Data Heterogeneity in Federated Time Series Forecasting
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting.
Most existing methods rely on a centralized training paradigm, where large amounts of data are transferred from distributed devices to a central cloud server.
We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z)
- Generative Pretrained Hierarchical Transformer for Time Series Forecasting
We propose a novel generative pretrained hierarchical transformer architecture for forecasting, named GPHT.
We conduct extensive experiments on eight datasets, comparing against mainstream self-supervised pre-training models and supervised models.
The results demonstrate that GPHT surpasses the baseline models across various fine-tuning and zero/few-shot learning settings on the traditional long-term forecasting task.
arXiv Detail & Related papers (2024-02-26T11:54:54Z)
- Unified Training of Universal Time Series Forecasting Transformers
We present Moirai, a masked encoder-based universal time series forecasting Transformer.
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z)
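A masked encoder like Moirai is pre-trained to reconstruct hidden portions of a series. The toy sketch below shows the general masked-patch objective only; the patch length and mask ratio are arbitrary choices, and this is not Moirai's implementation.

```python
import torch

def mask_patches(series: torch.Tensor, patch_len: int = 16,
                 mask_ratio: float = 0.3):
    """Illustrative masked-patch pretraining setup: split each series into
    patches (length assumed to divide the series), zero out a random
    subset, and train a model to reconstruct the hidden patches."""
    B, T = series.shape
    patches = series.reshape(B, T // patch_len, patch_len)
    mask = torch.rand(B, patches.shape[1]) < mask_ratio  # True = hidden
    masked = patches.masked_fill(mask[..., None], 0.0)
    return masked, patches, mask

x = torch.randn(8, 96)  # 8 series of length 96 -> 6 patches of 16 each
inp, target, mask = mask_patches(x)
# Training would minimize reconstruction error on the hidden patches only:
# loss = ((model(inp) - target)[mask] ** 2).mean()
```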
- Timer: Generative Pre-trained Transformers Are Large Time Series Models
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z)
- Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series
We introduce Tiny Time Mixers (TTM), a compact model with effective transfer learning capabilities, trained exclusively on public TS datasets.
TTM incorporates innovations like adaptive patching, diverse resolution sampling, and resolution prefix tuning to handle pre-training on varied dataset resolutions.
It outperforms popular existing benchmark models in zero/few-shot forecasting by 4-40% while significantly reducing computational requirements.
arXiv Detail & Related papers (2024-01-08T15:21:21Z)
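The adaptive patching idea named in the TTM summary (choosing patch length from a dataset's sampling resolution so patches cover comparable time spans) can be sketched as follows. The resolution-to-patch-length mapping and all names here are made-up illustrations, not TTM's actual rule.

```python
import numpy as np

def adaptive_patch(x: np.ndarray, points_per_day: int) -> np.ndarray:
    """Sketch of adaptive patching: pick the patch length from the series'
    sampling resolution so one patch spans a comparable time window across
    datasets. The clamped mapping below is an illustrative assumption."""
    # Coarser resolutions get shorter patches, finer ones longer patches.
    patch_len = max(4, min(64, points_per_day // 4))
    usable = (len(x) // patch_len) * patch_len   # drop the trailing remainder
    return x[:usable].reshape(-1, patch_len)     # (n_patches, patch_len)

hourly = adaptive_patch(np.random.randn(512), points_per_day=24)    # patch_len 6
minute = adaptive_patch(np.random.randn(512), points_per_day=1440)  # patch_len 64
```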