OATS: Online Data Augmentation for Time Series Foundation Models
- URL: http://arxiv.org/abs/2601.19040v1
- Date: Mon, 26 Jan 2026 23:51:03 GMT
- Title: OATS: Online Data Augmentation for Time Series Foundation Models
- Authors: Junwei Deng, Chang Xu, Jiaqi W. Ma, Ming Jin, Chenghao Liu, Jiang Bian
- Abstract summary: Time Series Foundation Models (TSFMs) are a powerful paradigm for time series analysis and are often enhanced by synthetic data augmentation to improve the training data quality. We propose OATS (Online Data Augmentation for Time Series Foundation Models), a principled strategy that generates synthetic data tailored to different training steps.
- Score: 49.1394215208561
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Time Series Foundation Models (TSFMs) are a powerful paradigm for time series analysis and are often enhanced by synthetic data augmentation to improve the training data quality. Existing augmentation methods, however, typically rely on heuristics and static paradigms. Motivated by dynamic data optimization, which shows that the contribution of samples varies across training stages, we propose OATS (Online Data Augmentation for Time Series Foundation Models), a principled strategy that generates synthetic data tailored to different training steps. OATS leverages valuable training samples as principled guiding signals and dynamically generates high-quality synthetic data conditioned on them. We further design a diffusion-based framework to produce realistic time series and introduce an explore-exploit mechanism to balance efficiency and effectiveness. Experiments on TSFMs demonstrate that OATS consistently outperforms regular training and yields substantial performance gains over static data augmentation baselines across six validation datasets and two TSFM architectures. The code is available at https://github.com/microsoft/TimeCraft.
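The abstract names three moving parts: scoring which training samples are currently valuable, generating synthetic series conditioned on them, and an explore-exploit schedule. Below is a minimal sketch of how such an online augmentation loop could be wired together. It is illustrative only, not the authors' released implementation (see the linked repository for that): the per-sample-loss value proxy, the `generator.sample(condition=...)` interface, and the epsilon-greedy explore-exploit rule are all assumptions, and `loss_fn` is assumed to behave like `torch.nn.functional.mse_loss` (i.e., to accept a `reduction` keyword).

```python
import random
import torch

def train_with_online_augmentation(model, generator, loader, optimizer,
                                   loss_fn, steps, explore_prob=0.2):
    """Train `model` while injecting synthetic batches generated online."""
    data_iter = iter(loader)
    for _ in range(steps):
        try:
            series, target = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            series, target = next(data_iter)

        # Hypothetical value proxy: per-sample loss under the current model.
        with torch.no_grad():
            per_sample = loss_fn(model(series), target, reduction="none")
            value = per_sample.reshape(len(series), -1).mean(dim=1)

        k = max(1, len(series) // 2)
        if random.random() < explore_prob:
            # Explore: condition generation on randomly chosen samples.
            idx = torch.randperm(len(series))[:k]
        else:
            # Exploit: condition on the currently most valuable samples.
            idx = value.topk(k).indices

        # Hypothetical conditional-generator API (e.g., a diffusion model)
        # producing synthetic series guided by the selected samples.
        synthetic = generator.sample(condition=series[idx])

        inputs = torch.cat([series, synthetic], dim=0)
        targets = torch.cat([target, target[idx]], dim=0)

        optimizer.zero_grad()
        loss_fn(model(inputs), targets).backward()
        optimizer.step()
```

The key design point this sketch tries to capture is that the conditioning set is recomputed every step, so the synthetic data tracks what the model currently finds hard rather than being fixed before training.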
Related papers
- Meta-learning to Address Data Shift in Time Series Classification [0.0]
Traditional deep learning (TDL) models perform well when training and test data share the same distribution. The dynamic nature of real-world data renders TDL models prone to rapid performance degradation, requiring costly relabeling and inefficient retraining. Here, we systematically compare TDL with fine-tuning and optimization-based meta-learning algorithms to assess their ability to address data shift.
arXiv Detail & Related papers (2026-01-13T22:38:43Z) - Robust Tabular Foundation Models [0.7539295827164078]
A key finding is that TFMs can be pretrained entirely on synthetic datasets. We introduce an optimality gap measure, given by the difference between TFM performance and the best achievable performance. These results highlight a promising new dataset for targeted adversarial training and fine-tuning of TFMs using synthetic data alone.
arXiv Detail & Related papers (2025-12-02T23:40:39Z) - Estimating Time Series Foundation Model Transferability via In-Context Learning [74.65355820906355]
Time series foundation models (TSFMs) offer strong zero-shot forecasting via large-scale pre-training. Fine-tuning remains critical for boosting performance in domains with limited public data. We introduce TimeTic, a transferability estimation framework that recasts model selection as an in-context-learning problem.
arXiv Detail & Related papers (2025-09-28T07:07:13Z) - Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning [44.53583316198435]
Supervised Fine-Tuning (SFT) of Large Language Models relies on high-quality training data. We introduce Middo, a self-evolving, model-informed dynamic data optimization framework. We show that Middo consistently enhances the quality of seed data and boosts LLM performance, improving accuracy by 7.15% on average.
arXiv Detail & Related papers (2025-08-29T12:47:27Z) - A Time-Series Data Augmentation Model through Diffusion and Transformer Integration [0.6437284704257459]
Deep neural networks typically require large volumes of data for training. We propose a simple and effective method that combines the Diffusion and Transformer models. Using the model's performance improvement after training on augmented data as a benchmark, this approach demonstrates its capability to produce high-quality augmented data.
arXiv Detail & Related papers (2025-05-01T09:40:45Z) - Scaling Laws of Synthetic Data for Language Models [125.41600201811417]
We introduce SynthLLM, a scalable framework that transforms pre-training corpora into diverse, high-quality synthetic datasets. Our approach achieves this by automatically extracting and recombining high-level concepts across multiple documents using a graph algorithm.
arXiv Detail & Related papers (2025-03-25T11:07:12Z) - Multi-Armed Bandit Approach for Optimizing Training on Synthetic Data [7.603659241572307]
We propose a novel UCB-based training procedure combined with a dynamic usability metric. Our proposed metric integrates low-level and high-level information from synthetic images and their corresponding real and synthetic datasets. We show that our metric is an effective way to rank synthetic images based on their usability (a minimal sketch of such a UCB selection rule appears after this list).
arXiv Detail & Related papers (2024-12-06T23:36:36Z) - Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting.
Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server.
We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z) - Data Augmentation for Traffic Classification [54.92823760790628]
Data Augmentation (DA) is a technique widely adopted in Computer Vision (CV) and Natural Language Processing (NLP) tasks.
DA has struggled to gain traction in networking contexts, particularly in Traffic Classification (TC) tasks.
arXiv Detail & Related papers (2024-01-19T15:25:09Z) - Regularizing Generative Adversarial Networks under Limited Data [88.57330330305535]
This work proposes a regularization approach for training robust GAN models on limited data.
We show a connection between the regularized loss and an f-divergence called LeCam-divergence, which we find is more robust under limited training data (a hedged sketch of the regularizer appears after this list).
arXiv Detail & Related papers (2021-04-07T17:59:06Z)