Related papers: Tiny-TSM: Efficiently Training a Lightweight SOTA Time Series Foundation Model

Tiny-TSM: Efficiently Training a Lightweight SOTA Time Series Foundation Model

URL: http://arxiv.org/abs/2511.19272v1
Date: Mon, 24 Nov 2025 16:22:05 GMT
Title: Tiny-TSM: Efficiently Training a Lightweight SOTA Time Series Foundation Model
Authors: Felix Birkel,
Abstract summary: We present Tiny-TSM, a time series foundation model characterized by small scale, economical training, and state-of-the-art performance.<n>It comprises 23M total parameters, trained on a single A100 GPU in less than a week using a new synthetic data generation and data augmentation pipeline.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present Tiny-TSM, a time series foundation model characterized by small scale, economical training, and state-of-the-art performance. It comprises 23M total parameters, trained on a single A100 GPU in less than a week using a new synthetic data generation and data augmentation pipeline (SynthTS). Without any neural architecture search, hyperparameter tuning, or scaling up model size, Tiny-TSM achieves state-of-the-art performance on a wide range of time series benchmark datasets, often outperforming much larger models and even matching the performance of much larger, industrial-scale, likely highly tuned foundation models. Specifically, Tiny-TSM outperforms all other time series foundation models we evaluated on medium- and long-term forecasting tasks under MSE loss, while short-term accuracy is still competitive with state-of-the-art models. We also introduce a causal input normalization scheme that enables time series models to be trained with dense next-token prediction loss, significantly accelerating convergence speed and reducing training time. All experiments were conducted on a single A100 GPU, illustrating the practicality of the proposed approach in a resource-constrained setting.

Related papers

In-Context and Few-Shots Learning for Forecasting Time Series Data based on Large Language Models [0.0]
This paper investigates the performance of using LLM models for time series data prediction.<n>We train LLMs through in-context, zero-shot and few-shot learning and forecasting time series data with OpenAI tt o4-mini and Gemini 2.5 Flash Lite.<n>The findings indicate that TimesFM has the best overall performance with the lowest RMSE value (0.3023) and the competitive inference time (266 seconds)
arXiv Detail & Related papers (2025-12-08T16:52:46Z)
SEMPO: Lightweight Foundation Models for Time Series Forecasting [45.456949943052116]
SEMPO is a lightweight foundation model that requires pretraining on relatively small-scale data, yet exhibits strong general time series forecasting.<n> SEMPO comprises two key modules: 1) energy-aware SpEctral decomposition module, that substantially improves the utilization of pre-training data.<n>Experiments on two large-scale benchmarks covering 16 datasets demonstrate the superior performance of SEMPO in both zero-shot and few-shot forecasting scenarios.
arXiv Detail & Related papers (2025-10-22T15:58:44Z)
SVTime: Small Time Series Forecasting Models Informed by "Physics" of Large Vision Model Forecasters [86.38433605933515]
Time series AI is crucial for analyzing dynamic web content.<n>Given their energy-intensive training, inference, and hardware demands, using large models as a one-fits-all solution raises serious concerns about carbon footprint and sustainability.<n>This paper introduces SVTime, a novel Small model inspired by large Vision model (LVM) forecasters for long-term Time series forecasting (LTSF)
arXiv Detail & Related papers (2025-10-10T18:42:23Z)
Output Scaling: YingLong-Delayed Chain of Thought in a Large Pretrained Time Series Forecasting Model [55.25659103706409]
This framework achieves state-of-the-art performance for our designed foundation model, YingLong.<n>YingLong is a non-causal, bidirectional attention encoder-only transformer trained through masked token recovery.<n>We release four foundation models ranging from 6M to 300M parameters, demonstrating superior results in zero-shot tasks.
arXiv Detail & Related papers (2025-05-20T14:31:06Z)
TSPulse: Dual Space Tiny Pre-Trained Models for Rapid Time-Series Analysis [12.034816114258803]
TSPulse is an ultra-compact time-series pre-trained model with only 1M parameters.<n>It performs strongly across classification, anomaly detection, imputation, and retrieval tasks.<n>Results are achieved with just 1M parameters (10-100X smaller than existing SOTA models)
arXiv Detail & Related papers (2025-05-19T12:18:53Z)
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts [25.503695417712997]
Time-MoE is a scalable and unified architecture designed to pre-train larger, more capable forecasting foundation models.<n>Time-MoE enhances computational efficiency by activating only a subset of networks for each prediction.<n>For the first time, we scaled a time series foundation model up to 2.4 billion parameters, achieving significantly improved forecasting precision.
arXiv Detail & Related papers (2024-09-24T12:42:18Z)
EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) shows outstanding performance compared to existing merging methods. EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM) During pre-training, we curate large-scale datasets with up to 1 billion time points. To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z)
Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series [11.635608108358575]
We introduce Tiny Time Mixers (TTM), a compact model with effective transfer learning capabilities, trained exclusively on public TS datasets. TTM incorporates innovations like adaptive patching, diverse resolution sampling, and resolution prefix tuning to handle pre-training on varied dataset resolutions. It outperforms existing popular benchmarks in zero/few-shot forecasting by (4-40%), while reducing computational requirements significantly.
arXiv Detail & Related papers (2024-01-08T15:21:21Z)
Reusing Pretrained Models by Multi-linear Operators for Efficient Training [65.64075958382034]
Training large models from scratch usually costs a substantial amount of resources. Recent studies such as bert2BERT and LiGO have reused small pretrained models to initialize a large model. We propose a method that linearly correlates each weight of the target model to all the weights of the pretrained model.
arXiv Detail & Related papers (2023-10-16T06:16:47Z)
Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence. This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time. Our results achieve state-of-the-art performance-art in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.