Minority Oversampling for Imbalanced Time Series Classification
- URL: http://arxiv.org/abs/2004.06373v5
- Date: Fri, 20 Aug 2021 15:26:49 GMT
- Title: Minority Oversampling for Imbalanced Time Series Classification
- Authors: Tuanfei Zhu, Cheng Luo, Jing Li, Siqi Ren and Zhihong Zhang
- Abstract summary: This paper proposes OHIT, a structure-preserving oversampling method for high-dimensional imbalanced time-series classification.
Experimental results on several publicly available time-series datasets demonstrate the superiority of OHIT over state-of-the-art oversampling algorithms.
- Score: 7.695093197007146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many important real-world applications involve time-series data
with skewed class distributions. Compared to conventional imbalanced learning
problems, the classification of imbalanced time-series data is more
challenging due to high dimensionality and high inter-variable correlation.
This paper proposes a structure-preserving Oversampling method for
High-dimensional Imbalanced Time-series classification (OHIT). OHIT first
leverages a density-ratio based shared nearest neighbor clustering algorithm
to capture the modes of the minority class in high-dimensional space. For
each mode, it then applies a shrinkage technique for large-dimensional
covariance matrices to obtain an accurate and reliable covariance structure.
Finally, OHIT generates structure-preserving synthetic samples from
multivariate Gaussian distributions using the estimated covariance matrices.
Experimental results on several publicly available time-series datasets
(including unimodal and multimodal ones) demonstrate the superiority of OHIT
over state-of-the-art oversampling algorithms in terms of F1, G-mean, and
AUC.
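As a rough illustration of the second and third steps described in the abstract, the sketch below is a minimal, hypothetical example rather than the authors' code: it uses scikit-learn's LedoitWolf estimator as a stand-in for the paper's covariance shrinkage and then draws synthetic minority samples from a multivariate Gaussian. The DRSNN clustering step is assumed to have already assigned minority samples to a mode, and all function and variable names are illustrative.

```python
# Minimal sketch, not the authors' implementation: oversample one minority
# mode by pairing a shrinkage covariance estimate with multivariate
# Gaussian sampling. LedoitWolf stands in for the paper's shrinkage step.
import numpy as np
from sklearn.covariance import LedoitWolf

def sample_mode(mode_samples: np.ndarray, n_new: int, seed: int = 0) -> np.ndarray:
    """Generate n_new synthetic samples for one minority mode (cluster)."""
    rng = np.random.default_rng(seed)
    mean = mode_samples.mean(axis=0)
    # Shrinkage keeps the covariance estimate well-conditioned when the
    # series length (dimensionality) exceeds the number of minority samples.
    cov = LedoitWolf().fit(mode_samples).covariance_
    return rng.multivariate_normal(mean, cov, size=n_new)

# Toy usage: 20 minority series of length 100, oversampled to 50 new series.
rng = np.random.default_rng(42)
minority_mode = rng.standard_normal((20, 100)).cumsum(axis=1)
print(sample_mode(minority_mode, n_new=50).shape)  # (50, 100)
```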
Related papers
- MGCP: A Multi-Grained Correlation based Prediction Network for Multivariate Time Series [54.91026286579748]
We propose a Multi-Grained Correlations-based Prediction Network.
It simultaneously considers correlations at three levels to enhance prediction performance.
It employs adversarial training, with an attention mechanism-based predictor and a conditional discriminator, to optimize prediction results at the coarse-grained level.
arXiv Detail & Related papers (2024-05-30T03:32:44Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes (a minimal sketch of this mixing idea appears after this list).
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Generalized Oversampling for Learning from Imbalanced datasets and Associated Theory [0.0]
In supervised learning, real imbalanced datasets are frequently encountered.
We propose a data augmentation procedure, the GOLIATH algorithm, based on kernel density estimates.
We evaluate the performance of the GOLIATH algorithm in imbalanced regression situations.
arXiv Detail & Related papers (2023-08-05T23:08:08Z)
- Large-scale Fully-Unsupervised Re-Identification [78.47108158030213]
We propose two strategies to learn from large-scale unlabeled data.
The first strategy performs a local neighborhood sampling to reduce the dataset size in each iteration without violating neighborhood relationships.
A second strategy leverages a novel Re-Ranking technique, which has a lower upper-bound time complexity and reduces the memory complexity from O(n^2) to O(kn) with k ≪ n.
arXiv Detail & Related papers (2023-07-26T16:19:19Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Likelihood Adjusted Semidefinite Programs for Clustering Heterogeneous Data [16.153709556346417]
Clustering is a widely deployed learning tool.
iLA-SDP is less sensitive to initialization than EM and more stable on high-dimensional data.
arXiv Detail & Related papers (2022-09-29T21:03:13Z)
- Imputing Missing Observations with Time Sliced Synthetic Minority Oversampling Technique [0.3973560285628012]
We present a simple yet novel time-series imputation technique whose goal is to turn irregularly observed time series into a representation that is uniform across every sample in a data set.
We fix a grid defined by the midpoints of non-overlapping bins (dubbed "slices") of observation times and ensure that each sample has values for all of the features at that given time.
This allows one both to impute fully missing observations, enabling uniform time-series classification across the entire data set, and, in special cases, to impute individually missing features (see the slicing sketch after this list).
arXiv Detail & Related papers (2022-01-14T19:23:24Z)
- A Unified Framework for Multi-distribution Density Ratio Estimation [101.67420298343512]
Binary density ratio estimation (DRE) provides the foundation for many state-of-the-art machine learning algorithms.
We develop a general framework from the perspective of Bregman divergence minimization.
We show that our framework leads to methods that strictly generalize their counterparts in binary DRE.
arXiv Detail & Related papers (2021-12-07T01:23:20Z)
- Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions [79.35722941720734]
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks.
We prove exact asymptotics characterising the estimator in high dimensions via empirical risk minimisation.
We discuss how our theory can be applied beyond the scope of synthetic data.
arXiv Detail & Related papers (2021-06-07T16:53:56Z)
- Statistical power for cluster analysis [0.0]
Cluster algorithms are increasingly popular in biomedical research.
We estimate power and accuracy for common analyses through simulation.
We recommend that researchers only apply cluster analysis when large subgroup separation is expected.
arXiv Detail & Related papers (2020-03-01T02:43:15Z)
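As referenced in the "Tackling Diverse Minorities in Imbalanced Classification" entry above, here is a minimal sketch of the general idea of mixing minority and majority samples to create synthetic minority data; the convex-interpolation scheme and all names are illustrative assumptions, not that paper's actual algorithm.

```python
# Illustrative sketch only: create synthetic minority samples by convexly
# mixing randomly drawn minority samples with randomly drawn majority
# samples, keeping the mixing weight close to the minority side.
import numpy as np

def mix_oversample(X_min, X_maj, n_new, w_low=0.7, w_high=1.0, seed=0):
    rng = np.random.default_rng(seed)
    minority = X_min[rng.integers(len(X_min), size=n_new)]
    majority = X_maj[rng.integers(len(X_maj), size=n_new)]
    lam = rng.uniform(w_low, w_high, size=(n_new, 1))  # weight on minority
    return lam * minority + (1.0 - lam) * majority
```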
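Similarly, the "Imputing Missing Observations with Time Sliced Synthetic Minority Oversampling Technique" entry describes placing irregular observations onto a grid of slice midpoints; the sketch below illustrates that binning idea with simplified fill rules (per-slice means, then forward/backward fill), which are assumptions rather than the paper's exact procedure.

```python
# Illustrative sketch: map one irregularly observed feature onto a common
# grid of slice midpoints (per-slice mean, then forward/backward fill for
# slices that received no observations).
import numpy as np
import pandas as pd

def time_slice(times, values, t_min, t_max, n_slices):
    edges = np.linspace(t_min, t_max, n_slices + 1)
    midpoints = 0.5 * (edges[:-1] + edges[1:])
    # Assign each observation to a slice and average within each slice.
    idx = np.clip(np.digitize(times, edges) - 1, 0, n_slices - 1)
    per_slice = pd.Series(values).groupby(idx).mean()
    grid = pd.Series(np.nan, index=range(n_slices), dtype=float)
    grid.loc[per_slice.index] = per_slice.values
    return midpoints, grid.ffill().bfill().to_numpy()

# Toy usage: six irregular observations mapped onto five uniform slices.
t = np.array([0.1, 0.2, 1.7, 3.4, 3.5, 4.9])
v = np.array([1.0, 1.2, 2.0, 3.1, 3.0, 4.2])
print(time_slice(t, v, 0.0, 5.0, 5))
```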
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.