Class-Based Time Series Data Augmentation to Mitigate Extreme Class Imbalance for Solar Flare Prediction
- URL: http://arxiv.org/abs/2405.20590v1
- Date: Fri, 31 May 2024 03:03:19 GMT
- Title: Class-Based Time Series Data Augmentation to Mitigate Extreme Class Imbalance for Solar Flare Prediction
- Authors: Junzhi Wen, Rafal A. Angryk,
- Abstract summary: Time series data plays a crucial role across various domains, making it valuable for decision-making and predictive modeling.
Machine learning (ML) and deep learning (DL) have shown promise in this regard, yet their performance hinges on data quality and quantity.
Data augmentation techniques offer a potential solution to address these challenges, yet their effectiveness on multivariate time series datasets remains underexplored.
- Score: 1.4272411349249625
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Time series data plays a crucial role across various domains, making it valuable for decision-making and predictive modeling. Machine learning (ML) and deep learning (DL) have shown promise in this regard, yet their performance hinges on data quality and quantity, often constrained by data scarcity and class imbalance, particularly for rare events like solar flares. Data augmentation techniques offer a potential solution to address these challenges, yet their effectiveness on multivariate time series datasets remains underexplored. In this study, we propose a novel data augmentation method for time series data named Mean Gaussian Noise (MGN). We investigate the performance of MGN compared to eight existing basic data augmentation methods on a multivariate time series dataset for solar flare prediction, SWAN-SF, using a ML algorithm for time series data, TimeSeriesSVC. The results demonstrate the efficacy of MGN and highlight its potential for improving classification performance in scenarios with extremely imbalanced data. Our time complexity analysis shows that MGN also has a competitive computational cost compared to the investigated alternative methods.
Related papers
- Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting.
Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server.
We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z) - Contrastive Representation Learning for Predicting Solar Flares from Extremely Imbalanced Multivariate Time Series Data [1.024113475677323]
Major solar flares are abrupt surges in the Sun's magnetic flux, presenting significant risks to technological infrastructure.
In this paper, we introduce CONTREX, a novel contrastive representation learning approach for multivariate time series data.
Our approach shows promising solar flare prediction results on the Space Weather Analytics for Solar Flares (SWAN-SF) multivariate time series benchmark dataset.
arXiv Detail & Related papers (2024-10-01T01:20:47Z) - MGCP: A Multi-Grained Correlation based Prediction Network for Multivariate Time Series [54.91026286579748]
We propose a Multi-Grained Correlations-based Prediction Network.
It simultaneously considers correlations at three levels to enhance prediction performance.
It employs adversarial training with an attention mechanism-based predictor and conditional discriminator to optimize prediction results at coarse-grained level.
arXiv Detail & Related papers (2024-05-30T03:32:44Z) - Improving age prediction: Utilizing LSTM-based dynamic forecasting for
data augmentation in multivariate time series analysis [16.91773394335563]
We propose a data augmentation and validation framework that utilizes dynamic forecasting with Long Short-Term Memory (LSTM) networks to enrich datasets.
The effectiveness of these augmented datasets was then compared with the original data using various deep learning models designed for chronological age prediction tasks.
arXiv Detail & Related papers (2023-12-11T22:47:26Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
Second, we examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z) - Grouped self-attention mechanism for a memory-efficient Transformer [64.0125322353281]
Real-world tasks such as forecasting weather, electricity consumption, and stock market involve predicting data that vary over time.
Time-series data are generally recorded over a long period of observation with long sequences owing to their periodic characteristics and long-range dependencies over time.
We propose two novel modules, Grouped Self-Attention (GSA) and Compressed Cross-Attention (CCA)
Our proposed model efficiently exhibited reduced computational complexity and performance comparable to or better than existing methods.
arXiv Detail & Related papers (2022-10-02T06:58:49Z) - Time Series Classification Using Convolutional Neural Network On
Imbalanced Datasets [0.0]
Time Series Classification (TSC) has drawn a lot of attention in literature because of its broad range of applications.
This paper uses both sampling-based and algorithmic approaches to address the imbalance problem.
Despite having a high imbalance ratio, the result showed that F score could be as high as 97.6% for the simulated TwoPatterns dataset.
arXiv Detail & Related papers (2021-10-10T10:02:14Z) - Towards Synthetic Multivariate Time Series Generation for Flare
Forecasting [5.098461305284216]
One of the limiting factors in training data-driven, rare-event prediction algorithms is the scarcity of the events of interest.
In this study, we explore the usefulness of the conditional generative adversarial network (CGAN) as a means to perform data-informed oversampling.
arXiv Detail & Related papers (2021-05-16T22:23:23Z) - Learning summary features of time series for likelihood free inference [93.08098361687722]
We present a data-driven strategy for automatically learning summary features from time series data.
Our results indicate that learning summary features from data can compete and even outperform LFI methods based on hand-crafted values.
arXiv Detail & Related papers (2020-12-04T19:21:37Z) - Improving the Accuracy of Global Forecasting Models using Time Series
Data Augmentation [7.38079566297881]
Forecasting models that are trained across sets of many time series, known as Global Forecasting Models (GFM), have shown promising results in forecasting competitions and real-world applications.
We propose a novel, data augmentation based forecasting framework that is capable of improving the baseline accuracy of GFM models in less data-abundant settings.
arXiv Detail & Related papers (2020-08-06T13:52:20Z) - Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.