TempoPFN: Synthetic Pre-training of Linear RNNs for Zero-shot Time Series Forecasting
- URL: http://arxiv.org/abs/2510.25502v2
- Date: Fri, 31 Oct 2025 17:01:54 GMT
- Title: TempoPFN: Synthetic Pre-training of Linear RNNs for Zero-shot Time Series Forecasting
- Authors: Vladyslav Moroshan, Julien Siems, Arber Zela, Timur Carstensen, Frank Hutter
- Abstract summary: We present TempoPFN, a time series foundation model based on linear Recurrent Neural Networks (RNNs) pre-trained exclusively on synthetic data. The model uses a GatedDeltaProduct architecture with state-weaving for fully parallelizable training across sequence lengths.
- Score: 42.2854432715079
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models for zero-shot time series forecasting face challenges in efficient long-horizon prediction and reproducibility, with existing synthetic-only approaches underperforming on challenging benchmarks. This paper presents TempoPFN, a univariate time series foundation model based on linear Recurrent Neural Networks (RNNs) pre-trained exclusively on synthetic data. The model uses a GatedDeltaProduct architecture with state-weaving for fully parallelizable training across sequence lengths, eliminating the need for windowing or summarization techniques while maintaining robust temporal state-tracking. Our comprehensive synthetic data pipeline unifies diverse generators, including stochastic differential equations, Gaussian processes, and audio synthesis, with novel augmentations. In zero-shot evaluations on the Gift-Eval benchmark, TempoPFN achieves top-tier performance, outperforming all existing synthetic-only approaches and surpassing the vast majority of models trained on real-world data, while being more efficient than existing baselines by leveraging fully parallelizable training and inference. We open-source our complete data generation pipeline and training code, providing a reproducible foundation for future research.
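The abstract names the generator families the synthetic pipeline unifies; as a rough illustration, two of them (a stochastic differential equation and a Gaussian process) plus a simple augmentation might be sampled as below. All function names and parameters here are illustrative assumptions, not the authors' released pipeline.

```python
import numpy as np

def sample_ou_sde(n_steps=512, theta=0.7, mu=0.0, sigma=0.3, dt=0.01, rng=None):
    """Euler-Maruyama path of an Ornstein-Uhlenbeck SDE:
    dx = theta * (mu - x) * dt + sigma * dW."""
    if rng is None:
        rng = np.random.default_rng()
    x = np.empty(n_steps)
    x[0] = rng.normal(mu, sigma)
    for t in range(1, n_steps):
        x[t] = x[t - 1] + theta * (mu - x[t - 1]) * dt + sigma * rng.normal(0.0, np.sqrt(dt))
    return x

def sample_gp_rbf(n_steps=512, length_scale=30.0, rng=None):
    """One draw from a zero-mean Gaussian process with an RBF kernel."""
    if rng is None:
        rng = np.random.default_rng()
    t = np.arange(n_steps, dtype=float)[:, None]
    cov = np.exp(-0.5 * ((t - t.T) / length_scale) ** 2)
    return rng.multivariate_normal(np.zeros(n_steps), cov + 1e-6 * np.eye(n_steps))

def augment(series, rng=None):
    """Amplitude scaling plus a random linear trend (illustrative augmentations)."""
    if rng is None:
        rng = np.random.default_rng()
    scale = rng.uniform(0.5, 2.0)
    trend = rng.uniform(-0.01, 0.01) * np.arange(len(series))
    return scale * series + trend

# A pre-training batch would interleave samples from all generator families.
batch = np.stack([augment(gen()) for gen in (sample_ou_sde, sample_gp_rbf)])
```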
Related papers
- OATS: Online Data Augmentation for Time Series Foundation Models [49.1394215208561]
Time Series Foundation Models (TSFMs) are a powerful paradigm for time series analysis and are often enhanced by synthetic data augmentation to improve training data quality. We propose OATS (Online Data Augmentation for Time Series Foundation Models), a principled strategy that generates synthetic data tailored to different training steps.
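The step-tailored generation described above can be pictured as a generator whose output depends on training progress; a hypothetical sketch of that idea, not the OATS method itself:

```python
import numpy as np

def synth_batch(step, total_steps, batch_size=32, length=256):
    """Synthetic series tailored to the training step: early batches are
    smooth, later ones add richer frequencies and heavier noise.
    Purely illustrative of step-dependent augmentation."""
    rng = np.random.default_rng(step)
    progress = step / total_steps                  # 0.0 -> 1.0 over training
    t = np.linspace(0.0, 1.0, length)
    freqs = rng.uniform(1.0, 2.0 + 10.0 * progress, batch_size)
    noise = rng.normal(0.0, 0.05 + 0.3 * progress, (batch_size, length))
    return np.sin(2 * np.pi * freqs[:, None] * t) + noise
```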
arXiv Detail & Related papers (2026-01-26T23:51:03Z)
- TIMED: Adversarial and Autoregressive Refinement of Diffusion-Based Time Series Generation [0.31498833540989407]
TIMED is a unified generative framework that captures global structure via a forward-reverse diffusion process. To further align the real and synthetic distributions in feature space, TIMED incorporates a Maximum Mean Discrepancy (MMD) loss. We show that TIMED generates more realistic and temporally coherent sequences than state-of-the-art generative models.
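The MMD term mentioned above is a standard two-sample statistic; a biased RBF-kernel estimator, written independently of TIMED's code, might look like:

```python
import numpy as np

def rbf_kernel(a, b, bandwidth=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)) for all pairs."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy between
    samples x of shape (n, d) and y of shape (m, d)."""
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2 * rbf_kernel(x, y, bandwidth).mean())

real = np.random.default_rng(0).normal(0.0, 1.0, (128, 16))
fake = np.random.default_rng(1).normal(0.5, 1.0, (128, 16))
print(mmd2(real, fake))  # larger value = distributions further apart
```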
arXiv Detail & Related papers (2025-09-23T23:05:40Z)
- SynDelay: A Synthetic Dataset for Delivery Delay Prediction [50.56729406793283]
We present SynDelay, a synthetic dataset designed for delivery delay prediction. It is publicly available through the Supply Chain Data Hub, an open initiative promoting dataset sharing and benchmarking in supply chain AI.
arXiv Detail & Related papers (2025-08-30T21:54:37Z)
- Scaling Laws of Synthetic Data for Language Models [125.41600201811417]
We introduce SynthLLM, a scalable framework that transforms pre-training corpora into diverse, high-quality synthetic datasets. Our approach achieves this by automatically extracting and recombining high-level concepts across multiple documents using a graph algorithm.
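The concept-recombination idea can be sketched as sampling connected concept sets from a co-occurrence graph; a toy version in which the documents, concepts, and walk strategy are all my assumptions rather than SynthLLM's algorithm:

```python
import itertools
import random

docs = [{"gradient", "optimizer", "loss"},
        {"loss", "regularization"},
        {"optimizer", "learning rate"}]

# Co-occurrence graph: concepts are nodes; pairs sharing a document are edges.
adj = {}
for concepts in docs:
    for a, b in itertools.combinations(sorted(concepts), 2):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

# Recombine: start at a random concept and pull in neighbors, yielding a
# concept set that no single source document contains verbatim.
random.seed(0)
seed = random.choice(sorted(adj))
combo = {seed} | set(random.sample(sorted(adj[seed]), k=min(2, len(adj[seed]))))
print(combo)  # prompt an LLM with this concept set to synthesize a new document
```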
arXiv Detail & Related papers (2025-03-25T11:07:12Z)
- Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting.
Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server.
We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z)
- Recurrent Neural Goodness-of-Fit Test for Time Series [8.22915954499148]
Time series data are crucial across diverse domains such as finance and healthcare. Traditional evaluation metrics fall short due to the temporal dependencies and potential high dimensionality of the features. We propose the REcurrent NeurAL (RENAL) Goodness-of-Fit test, a novel and statistically rigorous framework for evaluating generative time series models.
arXiv Detail & Related papers (2024-10-17T19:32:25Z)
- Online Data Augmentation for Forecasting with Deep Learning [0.33554367023486936]
This work introduces an online data augmentation framework that generates synthetic samples during the training of neural networks. We maintain a balanced representation between real and synthetic data throughout the training process. Experiments suggest that online data augmentation leads to better forecasting performance than offline data augmentation or no augmentation.
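The "balanced representation" described above can be read as mixing a fixed share of synthetic windows into every mini-batch; a hypothetical sketch, where the 50/50 ratio and the scale-and-jitter augmentation are my assumptions:

```python
import numpy as np

def mixed_batch(real_windows, batch_size=64, synth_ratio=0.5, rng=None):
    """Build a mini-batch with a fixed fraction of synthetic windows,
    produced here by scaling and jittering real windows (a classic
    time series augmentation). real_windows: array (num_windows, length)."""
    if rng is None:
        rng = np.random.default_rng()
    n_synth = int(batch_size * synth_ratio)
    real = real_windows[rng.choice(len(real_windows), batch_size - n_synth)]
    base = real_windows[rng.choice(len(real_windows), n_synth)]
    synth = base * rng.uniform(0.8, 1.2, (n_synth, 1)) \
            + rng.normal(0.0, 0.02, base.shape)        # scale + jitter
    return np.concatenate([real, synth])
```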
arXiv Detail & Related papers (2024-04-25T17:16:13Z)
- Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
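The kernel density estimation result invites a tiny simulation; the following sketch (my construction, not the paper's analysis) retrains a KDE on a mix of real and self-generated data and tracks how a simple statistic drifts across generations:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 1000)        # ground truth: variance 1.0
data = real.copy()

for gen in range(5):
    kde = gaussian_kde(data)             # fit on the current training data
    synthetic = kde.resample(1000, seed=gen)[0]
    # Mixed-data training: the next generation sees half real, half synthetic.
    data = np.concatenate([real[:500], synthetic[:500]])
    # KDE sampling adds bandwidth noise each round; the real half anchors it.
    print(f"generation {gen}: variance = {data.var():.3f}")
```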
arXiv Detail & Related papers (2024-02-19T02:08:09Z)
- Are Synthetic Time-series Data Really not as Good as Real Data? [29.852306720544224]
Time-series data presents limitations stemming from data quality issues, bias and vulnerabilities, and generalization problems.
We introduce InfoBoost -- a highly versatile cross-domain data synthesizing framework with time series representation learning capability.
We have developed a method based on synthetic data that enables model training without the need for real data, surpassing the performance of models trained with real data.
arXiv Detail & Related papers (2024-02-01T13:59:04Z)
- Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs [50.25683648762602]
We introduce Koopman VAE, a new generative framework that is based on a novel design for the model prior.
Inspired by Koopman theory, we represent the latent conditional prior dynamics using a linear map.
KoVAE outperforms state-of-the-art GAN and VAE methods across several challenging synthetic and real-world time series generation benchmarks.
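The linear latent prior can be pictured as z_{t+1} = A z_t plus noise inside a VAE's prior; a minimal rollout sketch, with the matrices here being random stand-ins rather than KoVAE's learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim, horizon = 8, 100

# Linear (Koopman-style) latent dynamics: z_{t+1} = A @ z_t + noise.
# Rescaling A to spectral radius < 1 keeps the rollout stable.
A = rng.normal(0.0, 1.0, (latent_dim, latent_dim))
A *= 0.95 / np.abs(np.linalg.eigvals(A)).max()

decoder = rng.normal(0.0, 1.0, latent_dim)  # stand-in for a learned decoder
z = rng.normal(0.0, 1.0, latent_dim)        # initial latent draw

series = []
for _ in range(horizon):
    z = A @ z + 0.05 * rng.normal(0.0, 1.0, latent_dim)
    series.append(float(decoder @ z))       # decode latent state to observation
```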
arXiv Detail & Related papers (2023-10-04T07:14:43Z) - Online Evolutionary Neural Architecture Search for Multivariate
Non-Stationary Time Series Forecasting [72.89994745876086]
This work presents the Online Neuro-Evolution-based Neural Architecture Search (ONE-NAS) algorithm.
ONE-NAS is a novel neural architecture search method capable of automatically designing and dynamically training recurrent neural networks (RNNs) for online forecasting tasks.
Results demonstrate that ONE-NAS outperforms traditional statistical time series forecasting methods.
arXiv Detail & Related papers (2023-02-20T22:25:47Z)
- STAN: Synthetic Network Traffic Generation with Generative Neural Models [10.54843182184416]
This paper presents STAN (Synthetic network Traffic generation with Autoregressive Neural models), a tool to generate realistic synthetic network traffic datasets.
Our novel neural architecture captures both temporal dependencies and dependence between attributes at any given time.
We evaluate the performance of STAN in terms of the quality of the generated data by training it on both a simulated dataset and a real network traffic dataset.
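Autoregressive generation with cross-attribute dependence can be sketched as sampling each time step conditioned on a window of history; a toy Gaussian version, not STAN's neural architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n_attrs, horizon, window = 3, 200, 5

# Correlated noise couples attributes at each step; conditioning on a
# rolling window of history supplies the temporal dependence.
corr = 0.6 * np.ones((n_attrs, n_attrs)) + 0.4 * np.eye(n_attrs)
chol = np.linalg.cholesky(corr)

traffic = [rng.normal(0.0, 1.0, n_attrs) for _ in range(window)]
for _ in range(horizon):
    mean = 0.8 * np.mean(traffic[-window:], axis=0)   # autoregressive mean
    traffic.append(mean + chol @ rng.normal(0.0, 0.5, n_attrs))
traffic = np.stack(traffic)                           # (window + horizon, n_attrs)
```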
arXiv Detail & Related papers (2020-09-27T04:20:02Z)