Related papers: Class-Based Time Series Data Augmentation to Mitigate Extreme Class Imbalance for Solar Flare Prediction

Class-Based Time Series Data Augmentation to Mitigate Extreme Class Imbalance for Solar Flare Prediction

URL: http://arxiv.org/abs/2405.20590v1
Date: Fri, 31 May 2024 03:03:19 GMT
Title: Class-Based Time Series Data Augmentation to Mitigate Extreme Class Imbalance for Solar Flare Prediction
Authors: Junzhi Wen, Rafal A. Angryk,
Abstract summary: Time series data plays a crucial role across various domains, making it valuable for decision-making and predictive modeling. Machine learning (ML) and deep learning (DL) have shown promise in this regard, yet their performance hinges on data quality and quantity. Data augmentation techniques offer a potential solution to address these challenges, yet their effectiveness on multivariate time series datasets remains underexplored.
Score: 1.4272411349249625
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Time series data plays a crucial role across various domains, making it valuable for decision-making and predictive modeling. Machine learning (ML) and deep learning (DL) have shown promise in this regard, yet their performance hinges on data quality and quantity, often constrained by data scarcity and class imbalance, particularly for rare events like solar flares. Data augmentation techniques offer a potential solution to address these challenges, yet their effectiveness on multivariate time series datasets remains underexplored. In this study, we propose a novel data augmentation method for time series data named Mean Gaussian Noise (MGN). We investigate the performance of MGN compared to eight existing basic data augmentation methods on a multivariate time series dataset for solar flare prediction, SWAN-SF, using a ML algorithm for time series data, TimeSeriesSVC. The results demonstrate the efficacy of MGN and highlight its potential for improving classification performance in scenarios with extremely imbalanced data. Our time complexity analysis shows that MGN also has a competitive computational cost compared to the investigated alternative methods.

Related papers

A Review of the Long Horizon Forecasting Problem in Time Series Analysis [0.0]
The long horizon forecasting (LHF) problem has come up in the time series literature for over the last 35 years or so.<n>Deep learning has incorporated variants of trend, seasonality, fourier and wavelet transforms, misspecification bias reduction and bandpass filters.<n>We highlight time series decomposition techniques, input data preprocessing and dataset windowing schemes that improve performance.
arXiv Detail & Related papers (2025-06-15T10:49:50Z)
Empowering Time Series Analysis with Synthetic Data: A Survey and Outlook in the Era of Foundation Models [104.17057231661371]
Time series analysis is crucial for understanding dynamics of complex systems. Recent advances in foundation models have led to task-agnostic Time Series Foundation Models (TSFMs) and Large Language Model-based Time Series Models (TSLLMs) Their success depends on large, diverse, and high-quality datasets, which are challenging to build due to regulatory, diversity, quality, and quantity constraints. This survey provides a comprehensive review of synthetic data for TSFMs and TSLLMs, analyzing data generation strategies, their role in model pretraining, fine-tuning, and evaluation, and identifying future research directions.
arXiv Detail & Related papers (2025-03-14T13:53:46Z)
Large Language Models are Few-shot Multivariate Time Series Classifiers [23.045734479292356]
Large Language Models (LLMs) have been extensively applied in time series analysis. Yet, their utility in the few-shot classification (i.e., a crucial training scenario) is underexplored. We aim to leverage the extensive pre-trained knowledge in LLMs to overcome the data scarcity problem.
arXiv Detail & Related papers (2025-01-30T03:59:59Z)
Tailored Forecasting from Short Time Series via Meta-learning [0.0]
We introduce Meta-learning for Tailored Forecasting from Related Time Series (METAFORS) By leveraging a library of models trained on related systems, METAFORS builds tailored models to forecast system evolution with limited data. We demonstrate METAFORS' ability to predict both short-term dynamics and long-term statistics, even when test and related systems exhibit significantly different behaviors.
arXiv Detail & Related papers (2025-01-27T18:58:04Z)
Are Large Language Models Useful for Time Series Data Analysis? [3.44393516559102]
Time series data plays a critical role across diverse domains such as healthcare, energy, and finance. This study investigates whether large language models (LLMs) are effective for time series data analysis.
arXiv Detail & Related papers (2024-12-16T02:47:44Z)
Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting. Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server. We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z)
Contrastive Representation Learning for Predicting Solar Flares from Extremely Imbalanced Multivariate Time Series Data [1.024113475677323]
Major solar flares are abrupt surges in the Sun's magnetic flux, presenting significant risks to technological infrastructure. In this paper, we introduce CONTREX, a novel contrastive representation learning approach for multivariate time series data. Our approach shows promising solar flare prediction results on the Space Weather Analytics for Solar Flares (SWAN-SF) multivariate time series benchmark dataset.
arXiv Detail & Related papers (2024-10-01T01:20:47Z)
MGCP: A Multi-Grained Correlation based Prediction Network for Multivariate Time Series [54.91026286579748]
We propose a Multi-Grained Correlations-based Prediction Network. It simultaneously considers correlations at three levels to enhance prediction performance. It employs adversarial training with an attention mechanism-based predictor and conditional discriminator to optimize prediction results at coarse-grained level.
arXiv Detail & Related papers (2024-05-30T03:32:44Z)
Improving age prediction: Utilizing LSTM-based dynamic forecasting for data augmentation in multivariate time series analysis [16.91773394335563]
We propose a data augmentation and validation framework that utilizes dynamic forecasting with Long Short-Term Memory (LSTM) networks to enrich datasets. The effectiveness of these augmented datasets was then compared with the original data using various deep learning models designed for chronological age prediction tasks.
arXiv Detail & Related papers (2023-12-11T22:47:26Z)
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs. We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting. Second, we examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
Grouped self-attention mechanism for a memory-efficient Transformer [64.0125322353281]
Real-world tasks such as forecasting weather, electricity consumption, and stock market involve predicting data that vary over time. Time-series data are generally recorded over a long period of observation with long sequences owing to their periodic characteristics and long-range dependencies over time. We propose two novel modules, Grouped Self-Attention (GSA) and Compressed Cross-Attention (CCA) Our proposed model efficiently exhibited reduced computational complexity and performance comparable to or better than existing methods.
arXiv Detail & Related papers (2022-10-02T06:58:49Z)
Time Series Classification Using Convolutional Neural Network On Imbalanced Datasets [0.0]
Time Series Classification (TSC) has drawn a lot of attention in literature because of its broad range of applications. This paper uses both sampling-based and algorithmic approaches to address the imbalance problem. Despite having a high imbalance ratio, the result showed that F score could be as high as 97.6% for the simulated TwoPatterns dataset.
arXiv Detail & Related papers (2021-10-10T10:02:14Z)
Towards Synthetic Multivariate Time Series Generation for Flare Forecasting [5.098461305284216]
One of the limiting factors in training data-driven, rare-event prediction algorithms is the scarcity of the events of interest. In this study, we explore the usefulness of the conditional generative adversarial network (CGAN) as a means to perform data-informed oversampling.
arXiv Detail & Related papers (2021-05-16T22:23:23Z)
Learning summary features of time series for likelihood free inference [93.08098361687722]
We present a data-driven strategy for automatically learning summary features from time series data. Our results indicate that learning summary features from data can compete and even outperform LFI methods based on hand-crafted values.
arXiv Detail & Related papers (2020-12-04T19:21:37Z)
Improving the Accuracy of Global Forecasting Models using Time Series Data Augmentation [7.38079566297881]
Forecasting models that are trained across sets of many time series, known as Global Forecasting Models (GFM), have shown promising results in forecasting competitions and real-world applications. We propose a novel, data augmentation based forecasting framework that is capable of improving the baseline accuracy of GFM models in less data-abundant settings.
arXiv Detail & Related papers (2020-08-06T13:52:20Z)
Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies. THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin. We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.