Mitigating Data Scarcity in Time Series Analysis: A Foundation Model with Series-Symbol Data Generation
- URL: http://arxiv.org/abs/2502.15466v1
- Date: Fri, 21 Feb 2025 13:43:24 GMT
- Title: Mitigating Data Scarcity in Time Series Analysis: A Foundation Model with Series-Symbol Data Generation
- Authors: Wenxuan Wang, Kai Wu, Yujian Betterest Li, Dan Wang, Xiaoyu Zhang, Jing Liu
- Abstract summary: Foundation models for time series analysis (TSA) have attracted significant attention. However, challenges such as data scarcity and data imbalance continue to hinder their development. We introduce a series-symbol (S2) dual-modality data generation mechanism, enabling the unrestricted creation of high-quality time series data.
- Score: 34.04850897522787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models for time series analysis (TSA) have attracted significant attention. However, challenges such as data scarcity and data imbalance continue to hinder their development. To address this, we consider modeling complex systems through symbolic expressions that serve as semantic descriptors of time series. Building on this concept, we introduce a series-symbol (S2) dual-modality data generation mechanism, enabling the unrestricted creation of high-quality time series data paired with corresponding symbolic representations. Leveraging the S2 dataset, we develop SymTime, a pre-trained foundation model for TSA. SymTime demonstrates competitive performance across five major TSA tasks when fine-tuned on downstream tasks, rivaling foundation models pre-trained on real-world datasets. This approach underscores the potential of dual-modality data generation and pretraining mechanisms in overcoming data scarcity and enhancing task performance.
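The abstract's core idea of pairing time series with symbolic descriptors can be sketched as follows. This is a minimal illustrative assumption, not the paper's actual generation mechanism: the operator pool, expression depth, and noise model are all placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny pool of unary operators used to build random symbolic expressions.
# The operator names and sampling scheme are illustrative, not the paper's grammar.
OPS = {
    "sin": np.sin,
    "cos": np.cos,
    "exp": lambda x: np.exp(-np.abs(x)),
    "square": np.square,
}

def sample_expression(depth=2):
    """Sample a random composition of operators as a (name-list, function) pair."""
    names = [str(n) for n in rng.choice(list(OPS), size=depth)]
    def f(t):
        y = t
        for name in names:
            y = OPS[name](y)
        return y
    return names, f

def generate_pair(n_points=256):
    """Return one (symbolic description, time series) training pair."""
    names, f = sample_expression()
    t = np.linspace(0, 4 * np.pi, n_points)
    series = f(t) + 0.01 * rng.standard_normal(n_points)  # light observation noise
    symbol = "(".join(names) + "(t" + ")" * len(names)    # e.g. "sin(cos(t))"
    return symbol, series

symbol, series = generate_pair()
```

Because the generator is purely synthetic, arbitrarily many such pairs can be produced, which is the property the abstract highlights for overcoming data scarcity.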
Related papers
- Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models [104.17057231661371]
Time series analysis is crucial for understanding dynamics of complex systems.
Recent advances in foundation models have led to task-agnostic Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs).
Their success depends on large, diverse, and high-quality datasets, which are challenging to build due to regulatory, diversity, quality, and quantity constraints.
This survey provides a comprehensive review of synthetic data for TSFMs and TSLLMs, analyzing data generation strategies, their role in model pretraining, fine-tuning, and evaluation, and identifying future research directions.
arXiv Detail & Related papers (2025-03-14T13:53:46Z) - Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting.
Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server.
We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z) - Meta-DiffuB: A Contextualized Sequence-to-Sequence Text Diffusion Model with Meta-Exploration [53.63593099509471]
We propose a scheduler-exploiter S2S-Diffusion paradigm designed to overcome the limitations of existing S2S-Diffusion models.
We employ Meta-Exploration to train an additional scheduler model dedicated to scheduling contextualized noise for each sentence.
Our exploiter model, an S2S-Diffusion model, leverages the noise scheduled by our scheduler model for updating and generation.
arXiv Detail & Related papers (2024-10-17T04:06:02Z) - MTSA-SNN: A Multi-modal Time Series Analysis Model Based on Spiking Neural Network [23.303230721723278]
We propose a Multi-modal Time Series Analysis Model Based on Spiking Neural Network (MTSA-SNN). The pulse encoder unifies temporal images and sequential information in a common pulse-based representation.
We incorporate wavelet transform operations to enhance the model's ability to analyze and evaluate temporal information.
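To illustrate how a wavelet transform exposes multi-scale temporal structure, here is a single-level Haar decomposition in NumPy. This is a minimal sketch of the general technique; the abstract does not specify which wavelet operations MTSA-SNN actually uses.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar discrete wavelet transform.

    Splits a series into approximation (smoothed trend) and detail
    (local fluctuation) coefficients, halving the temporal resolution.
    """
    x = np.asarray(x, dtype=float)
    assert len(x) % 2 == 0, "length must be even"
    pairs = x.reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)  # low-frequency trend
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)  # high-frequency detail
    return approx, detail

t = np.linspace(0, 2 * np.pi, 64)
signal = np.sin(t) + 0.1 * np.sin(16 * t)  # slow trend + fast oscillation
approx, detail = haar_dwt(signal)
```

With the 1/sqrt(2) normalization the transform is orthogonal, so the total energy of the signal is preserved across the two coefficient bands.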
arXiv Detail & Related papers (2024-02-08T05:39:11Z) - Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
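The idea of casting forecasting, imputation, and anomaly detection as one generative task can be sketched as masked reconstruction, where only the mask pattern differs per task. This framing is an illustrative assumption, not Timer's exact formulation; the split ratios and task names are placeholders.

```python
import numpy as np

def task_mask(n, task, rng):
    """Build a boolean mask (True = observed) for a length-n series.

    Forecasting hides the tail, imputation hides random points, and
    anomaly detection observes everything and scores reconstruction error.
    """
    mask = np.ones(n, dtype=bool)
    if task == "forecast":
        mask[int(0.8 * n):] = False          # hide the last 20%
    elif task == "imputation":
        mask[rng.random(n) < 0.3] = False    # hide ~30% at random
    elif task == "anomaly":
        pass                                 # fully observed; compare reconstruction
    return mask

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 8, 100))
masks = {t: task_mask(len(series), t, rng)
         for t in ["forecast", "imputation", "anomaly"]}
```

A single generative model then only ever sees "reconstruct the unobserved positions," which is what makes one pre-training objective serve all three applications.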
arXiv Detail & Related papers (2024-02-04T06:55:55Z) - Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting [46.63798583414426]
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis.
Our study demonstrates, through both analytical and empirical evidence, that decomposition is key to containing excessive model inflation.
Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks.
arXiv Detail & Related papers (2024-01-22T13:15:40Z) - FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving the classification capacity for the multivariate time series classification task.
It exhibits three aspects of merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z) - Time-series Transformer Generative Adversarial Networks [5.254093731341154]
We consider limitations posed specifically on time-series data and present a model that can generate synthetic time-series.
A model that generates synthetic time-series data has two objectives: 1) to capture the stepwise conditional distribution of real sequences, and 2) to faithfully model the joint distribution of entire real sequences.
We present TsT-GAN, a framework that capitalises on the Transformer architecture to satisfy the desiderata and compare its performance against five state-of-the-art models on five datasets.
arXiv Detail & Related papers (2022-05-23T10:04:21Z) - DAMNETS: A Deep Autoregressive Model for Generating Markovian Network Time Series [6.834250594353335]
Generative models for network time series (also known as dynamic graphs) have tremendous potential in fields such as epidemiology, biology and economics.
Here we introduce DAMNETS, a scalable deep generative model for network time series.
arXiv Detail & Related papers (2022-03-28T18:14:04Z) - PIETS: Parallelised Irregularity Encoders for Forecasting with Heterogeneous Time-Series [5.911865723926626]
Heterogeneity and irregularity of multi-source data sets present a significant challenge to time-series analysis.
In this work, we design a novel architecture, PIETS, to model heterogeneous time-series.
We show that PIETS is able to effectively model heterogeneous temporal data and outperforms other state-of-the-art approaches in the prediction task.
arXiv Detail & Related papers (2021-09-30T20:01:19Z) - SeDyT: A General Framework for Multi-Step Event Forecasting via Sequence Modeling on Dynamic Entity Embeddings [6.314274045636102]
Event forecasting is a critical and challenging task in Temporal Knowledge Graph reasoning.
We propose SeDyT, a discriminative framework that performs sequence modeling on the dynamic entity embeddings.
By combining temporal Graph Neural Network models and sequence models, SeDyT achieves an average of 2.4% MRR improvement.
arXiv Detail & Related papers (2021-09-09T20:32:48Z) - Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.