Towards Foundation Time Series Model: To Synthesize Or Not To Synthesize?
- URL: http://arxiv.org/abs/2403.02534v1
- Date: Mon, 4 Mar 2024 23:03:17 GMT
- Title: Towards Foundation Time Series Model: To Synthesize Or Not To Synthesize?
- Authors: Kseniia Kuvshinova, Olga Tsymboi, Alina Kostromina, Dmitry Simakov, Elizaveta Kovtun
- Abstract summary: We consider the essential question of whether it is advantageous to train a foundation model on synthetic data or better to utilize only a limited number of real-life examples.
Our experiments are conducted only for regular time series and speak in favor of leveraging solely the real time series.
- Score: 2.8707270250981094
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Industry is rich in cases where we need to forecast large numbers of time series at once, yet we may not be able to afford to train a separate model for each of them. This issue in time series modeling has not received due attention. The remedy for this setting is the establishment of a foundation model, which is expected to work in zero-shot and few-shot regimes. However, what should we take as a training dataset for such a model?
Witnessing the benefits of enriching NLP datasets with artificially generated data, we might want to adopt this experience for time series. In contrast to natural language, generating synthetic time series data is even more favorable because it provides full control over series patterns, time horizons, and the number of samples. In this work, we consider the essential question of whether it is advantageous to train a foundation model on synthetic data or better to utilize only a limited number of real-life examples. Our experiments are conducted only for regular time series and speak in favor of leveraging solely the real time series. Moreover, the choice of the proper source dataset strongly influences the performance during inference. When provided with access to even a limited quantity of short time series data, employing it within a supervised framework yields more favorable results than training on a larger volume of synthetic data. The code for our experiments is publicly available on GitHub: https://github.com/sb-ai-lab/synthesize_or_not.
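To illustrate the "full control" that synthetic generation offers, here is a minimal sketch (not the generator used in the paper; the function name and parameters are illustrative, and the assumed pattern is simple trend plus seasonality plus noise) that produces a collection of series with a chosen pattern, horizon, and number of samples.

```python
# Illustrative sketch of synthetic time series generation (not the paper's
# generator): pattern, horizon, and number of samples are explicit parameters.
import numpy as np


def make_synthetic_series(n_samples: int = 100,
                          horizon: int = 256,
                          season_period: int = 24,
                          trend_slope: float = 0.05,
                          noise_std: float = 0.1,
                          seed: int = 0) -> np.ndarray:
    """Return an array of shape (n_samples, horizon): trend + seasonality + noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(horizon)
    # A random phase and amplitude per sample keep the collection diverse.
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, 1))
    amplitudes = rng.uniform(0.5, 2.0, size=(n_samples, 1))
    seasonal = amplitudes * np.sin(2.0 * np.pi * t / season_period + phases)
    trend = trend_slope * t
    noise = rng.normal(0.0, noise_std, size=(n_samples, horizon))
    return trend + seasonal + noise


# Example: 10 series of length 128 with a period-24 seasonal component.
series = make_synthetic_series(n_samples=10, horizon=128, season_period=24)
print(series.shape)  # (10, 128)
```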
Related papers
- Beyond Data Scarcity: A Frequency-Driven Framework for Zero-Shot Forecasting [15.431513584239047]
Time series forecasting is critical in numerous real-world applications.
Traditional forecasting techniques struggle when data is scarce or not available at all.
Recent advancements often leverage large-scale foundation models for such tasks.
arXiv Detail & Related papers (2024-11-24T07:44:39Z)
- Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts [103.725112190618]
This paper introduces Moirai-MoE, using a single input/output projection layer while delegating the modeling of diverse time series patterns to the sparse mixture of experts.
Extensive experiments on 39 datasets demonstrate the superiority of Moirai-MoE over existing foundation models in both in-distribution and zero-shot scenarios.
arXiv Detail & Related papers (2024-10-14T13:01:11Z)
- Leveraging Priors via Diffusion Bridge for Time Series Generation [3.2066708654182743]
Time series generation is widely used in real-world applications such as simulation, data augmentation, and hypothesis testing.
Diffusion models have emerged as the de facto approach for time series generation.
TimeBridge is a framework that enables flexible synthesis by leveraging diffusion bridges to learn the transport between chosen prior and data distributions.
arXiv Detail & Related papers (2024-08-13T06:47:59Z)
- Time Series Data Augmentation as an Imbalanced Learning Problem [2.5536554335016417]
We use oversampling strategies to create synthetic time series observations and improve the accuracy of forecasting models.
We carried out experiments using 7 different databases that contain a total of 5502 univariate time series.
We found that the proposed solution outperforms both a global and a local model, thus providing a better trade-off between these two approaches.
arXiv Detail & Related papers (2024-04-29T09:27:15Z)
- Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai).
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z)
- Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z)
- Time Series Synthesis Using the Matrix Profile for Anonymization [32.22243483781984]
Many researchers cannot release their data due to privacy regulations or fear of leaking confidential business information.
We propose the Time Series Synthesis Using the Matrix Profile (TSSUMP) method, where synthesized time series can be released in lieu of the original data.
We test our method on a case study of ECG and gender masking prediction.
arXiv Detail & Related papers (2023-11-05T04:27:24Z)
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting [54.04430089029033]
We present Lag-Llama, a general-purpose foundation model for time series forecasting based on a decoder-only transformer architecture.
Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities.
When fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-10-12T12:29:32Z)
- Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show it is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size.
Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z)