Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models
- URL: http://arxiv.org/abs/2503.11411v1
- Date: Fri, 14 Mar 2025 13:53:46 GMT
- Title: Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models
- Authors: Xu Liu, Taha Aksu, Juncheng Liu, Qingsong Wen, Yuxuan Liang, Caiming Xiong, Silvio Savarese, Doyen Sahoo, Junnan Li, Chenghao Liu,
- Abstract summary: Time series analysis is crucial for understanding dynamics of complex systems.<n>Recent advances in foundation models have led to task-agnostic Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs)<n>Their success depends on large, diverse, and high-quality datasets, which are challenging to build due to regulatory, diversity, quality, and quantity constraints.<n>This survey provides a comprehensive review of synthetic data for TSFMs and TSLLMs, analyzing data generation strategies, their role in model pretraining, fine-tuning, and evaluation, and identifying future research directions.
- Score: 104.17057231661371
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Time series analysis is crucial for understanding dynamics of complex systems. Recent advances in foundation models have led to task-agnostic Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs), enabling generalized learning and integrating contextual information. However, their success depends on large, diverse, and high-quality datasets, which are challenging to build due to regulatory, diversity, quality, and quantity constraints. Synthetic data emerge as a viable solution, addressing these challenges by offering scalable, unbiased, and high-quality alternatives. This survey provides a comprehensive review of synthetic data for TSFMs and TSLLMs, analyzing data generation strategies, their role in model pretraining, fine-tuning, and evaluation, and identifying future research directions.
Related papers
- Mitigating Data Scarcity in Time Series Analysis: A Foundation Model with Series-Symbol Data Generation [34.04850897522787]
Foundation models for time series analysis (TSA) have attracted significant attention.<n> challenges such as data scarcity and data imbalance continue to hinder their development.<n>We introduce a series-symbol (S2) dual-modulity data generation mechanism, enabling the unrestricted creation of high-quality time series data.
arXiv Detail & Related papers (2025-02-21T13:43:24Z) - Harnessing Vision Models for Time Series Analysis: A Survey [72.09716244582684]
This survey discusses the advantages of vision models over LLMs in time series analysis.<n>It provides a comprehensive and in-depth overview of the existing methods, with dual views of detailed taxonomy.<n>We address the challenges in the pre- and post-processing steps involved in this framework.
arXiv Detail & Related papers (2025-02-13T00:42:11Z) - Recent Trends in Modelling the Continuous Time Series using Deep Learning: A Survey [0.18434042562191813]
Continuous-time series is essential for different modern application areas, e.g. healthcare, automobile, energy, finance, Internet of things (IoT)
This paper has described the general problem domain of time series and reviewed the challenges of modelling the continuous time series.
arXiv Detail & Related papers (2024-09-13T14:19:44Z) - Deep Time Series Models: A Comprehensive Survey and Benchmark [74.28364194333447]
Time series data is of great significance in real-world scenarios.
Recent years have witnessed remarkable breakthroughs in the time series community.
We release Time Series Library (TSLib) as a fair benchmark of deep time series models for diverse analysis tasks.
arXiv Detail & Related papers (2024-07-18T08:31:55Z) - LTSM-Bundle: A Toolbox and Benchmark on Large Language Models for Time Series Forecasting [69.33802286580786]
We introduce LTSM-Bundle, a comprehensive toolbox, and benchmark for training LTSMs.<n>It modularized and benchmarked LTSMs from multiple dimensions, encompassing prompting strategies, tokenization approaches, base model selection, data quantity, and dataset diversity.<n> Empirical results demonstrate that this combination achieves superior zero-shot and few-shot performances compared to state-of-the-art LTSMs and traditional TSF methods.
arXiv Detail & Related papers (2024-06-20T07:09:19Z) - A Survey on Diffusion Models for Time Series and Spatio-Temporal Data [92.1255811066468]
We review the use of diffusion models in time series and S-temporal data, categorizing them by model, task type, data modality, and practical application domain.
We categorize diffusion models into unconditioned and conditioned types discuss time series and S-temporal data separately.
Our survey covers their application extensively in various fields including healthcare, recommendation, climate, energy, audio, and transportation.
arXiv Detail & Related papers (2024-04-29T17:19:40Z) - Review of Data-centric Time Series Analysis from Sample, Feature, and Period [37.33135447969283]
A good time-series dataset is advantageous for the model's accuracy, robustness, and convergence.
The emergence of data-centric AI represents a shift in the landscape from model refinement to prioritizing data quality.
We systematically review different data-centric methods in time series analysis, covering a wide range of research topics.
arXiv Detail & Related papers (2024-04-24T00:34:44Z) - Comprehensive Exploration of Synthetic Data Generation: A Survey [4.485401662312072]
This work surveys 417 Synthetic Data Generation models over the last decade.
The findings reveal increased model performance and complexity, with neural network-based approaches prevailing.
Computer vision dominates, with GANs as primary generative models, while diffusion models, transformers, and RNNs compete.
arXiv Detail & Related papers (2024-01-04T20:23:51Z) - Exploring Progress in Multivariate Time Series Forecasting: Comprehensive Benchmarking and Heterogeneity Analysis [70.78170766633039]
We address the need for means of assessing MTS forecasting proposals reliably and fairly.
BasicTS+ is a benchmark designed to enable fair, comprehensive, and reproducible comparison of MTS forecasting solutions.
We apply BasicTS+ along with rich datasets to assess the capabilities of more than 45 MTS forecasting solutions.
arXiv Detail & Related papers (2023-10-09T19:52:22Z) - Can LMs Generalize to Future Data? An Empirical Analysis on Text
Summarization [50.20034493626049]
Recent pre-trained language models (PLMs) achieve promising results in existing abstractive summarization datasets.
Existing summarization benchmarks overlap in time with the standard pre-training corpora and finetuning datasets.
We show that parametric knowledge stored in summarization models significantly affects the faithfulness of the generated summaries on future data.
arXiv Detail & Related papers (2023-05-03T08:08:07Z) - PIETS: Parallelised Irregularity Encoders for Forecasting with
Heterogeneous Time-Series [5.911865723926626]
Heterogeneity and irregularity of multi-source data sets present a significant challenge to time-series analysis.
In this work, we design a novel architecture, PIETS, to model heterogeneous time-series.
We show that PIETS is able to effectively model heterogeneous temporal data and outperforms other state-of-the-art approaches in the prediction task.
arXiv Detail & Related papers (2021-09-30T20:01:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.