INGB: Informed Nonlinear Granular Ball Oversampling Framework for Noisy
Imbalanced Classification
- URL: http://arxiv.org/abs/2307.01224v1
- Date: Mon, 3 Jul 2023 01:55:20 GMT
- Title: INGB: Informed Nonlinear Granular Ball Oversampling Framework for Noisy
Imbalanced Classification
- Authors: Min Li, Hao Zhou, Qun Liu, Yabin Shao, and Guoying Wang
- Abstract summary: In classification problems, the datasets are usually imbalanced, noisy or complex.
This paper proposes INGB, an informed nonlinear oversampling framework based on granular balls, as a new direction for oversampling.
- Score: 23.9207014576848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In classification problems, datasets are often imbalanced, noisy, or
complex. Most sampling algorithms only refine the linear interpolation
mechanism of the synthetic minority oversampling technique (SMOTE). Linear
oversampling, however, has several unavoidable drawbacks: it is prone to
overfitting, and the synthetic samples lack diversity and rarely reflect the
distribution characteristics of the original data. This paper proposes an
informed nonlinear oversampling framework based on granular balls (INGB) as a
new direction for oversampling. It uses granular balls to model the spatial
distribution of the dataset, and informed entropy is used to further optimize
the granular-ball space. Nonlinear oversampling is then performed by exploiting
high-dimensional sparsity and sampling from an isotropic Gaussian
distribution. Furthermore, INGB has good compatibility: not only can it be
combined with most SMOTE-based sampling algorithms to improve their
performance, but it can also be easily extended to noisy imbalanced
multi-class classification problems. The mathematical model and theoretical
proof of INGB are given in this work. Extensive experiments demonstrate that
INGB outperforms traditional linear sampling frameworks and algorithms when
oversampling complex datasets.
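To make the contrast concrete, the sketch below sets SMOTE's linear interpolation next to a simplified version of the nonlinear, granular-ball-style sampling described in the abstract. It is a minimal illustration, not the authors' INGB implementation: plain k-means stands in for granular-ball construction, the informed-entropy optimization and noise filtering are omitted, and the function names (smote_linear, gaussian_ball_oversample) are hypothetical.

import numpy as np
from sklearn.cluster import KMeans

def smote_linear(x_i, x_nn, rng):
    # SMOTE's linear mechanism: a random point on the segment from x_i
    # to one of its minority-class neighbors x_nn.
    lam = rng.uniform()
    return x_i + lam * (x_nn - x_i)

def gaussian_ball_oversample(X_min, n_new, n_balls=5, seed=0):
    # Nonlinear alternative (simplified): cover the minority class with
    # "balls" (k-means clusters here), then draw synthetic samples from
    # an isotropic Gaussian centred on each ball, scaled by its radius.
    rng = np.random.default_rng(seed)
    km = KMeans(n_clusters=n_balls, n_init=10, random_state=seed).fit(X_min)
    centers = km.cluster_centers_
    # Ball radius: mean distance of member points to the ball center.
    radii = np.array([
        np.linalg.norm(X_min[km.labels_ == k] - centers[k], axis=1).mean()
        for k in range(n_balls)
    ])
    d = X_min.shape[1]
    counts = np.bincount(km.labels_, minlength=n_balls)
    # Sample balls in proportion to their population.
    picks = rng.choice(n_balls, size=n_new, p=counts / counts.sum())
    # Isotropic Gaussian noise; dividing by sqrt(d) keeps the expected
    # norm of the perturbation near the ball radius in high dimensions.
    noise = rng.standard_normal((n_new, d))
    return centers[picks] + noise * (radii[picks] / np.sqrt(d))[:, None]

# Toy usage: a 40-point minority class in 10-D, oversampled to 100 extras.
rng = np.random.default_rng(0)
X_min = rng.normal(size=(40, 10))
synthetic = gaussian_ball_oversample(X_min, n_new=100)

The key difference is visible in the return lines: SMOTE can only place synthetic points on segments between existing minority samples, whereas the Gaussian ball sampler can populate the full neighborhood of each ball, which is what gives the nonlinear approach its diversity on complex distributions.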
Related papers
- Fast Semisupervised Unmixing Using Nonconvex Optimization [80.11512905623417]
We introduce a novel convex model for semi/library-based unmixing.
We demonstrate the efficacy of alternating methods for sparse unsupervised unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z) - BSGAN: A Novel Oversampling Technique for Imbalanced Pattern
Recognitions [0.0]
Class imbalance problems (CIP) are among the key challenges in developing unbiased machine learning (ML) models for prediction.
CIP occurs when data samples are not equally distributed between two or more classes.
We propose a hybrid oversampling technique that combines the power of borderline SMOTE and a Generative Adversarial Network (GAN) to generate more diverse data.
arXiv Detail & Related papers (2023-05-16T20:02:39Z) - Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data.
We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z) - Towards Automated Imbalanced Learning with Deep Hierarchical
Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning by generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z) - A Novel Hybrid Sampling Framework for Imbalanced Learning [0.0]
"SMOTE-RUS-NC" has been compared with other state-of-the-art sampling techniques.
Rigorous experimentation has been conducted on 26 imbalanced datasets.
arXiv Detail & Related papers (2022-08-20T07:04:00Z) - A Robust and Flexible EM Algorithm for Mixtures of Elliptical
Distributions with Missing Data [71.9573352891936]
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data.
A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data.
Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
arXiv Detail & Related papers (2022-01-28T10:01:37Z) - Stable and Compact Face Recognition via Unlabeled Data Driven Sparse
Representation-Based Classification [39.398339531136344]
An unlabeled data driven inverse projection pseudo-full-space representation-based classification model is proposed.
The proposed model aims to mine the hidden semantic information and intrinsic structure information of all available data.
Experiments on three public datasets show that the proposed LR-S-PFSRC model achieves stable results.
arXiv Detail & Related papers (2021-11-04T13:19:38Z) - A multi-schematic classifier-independent oversampling approach for
imbalanced datasets [0.0]
It is evident from previous studies that different oversampling algorithms have different degrees of efficiency with different classifiers.
Here, we overcome this problem with a multi-schematic and classifier-independent oversampling approach: ProWRAS.
ProWRAS integrates the Localized Random Affine Shadowsampling (LoRAS) algorithm and the Proximity Weighted Synthetic oversampling (ProWSyn) algorithm.
arXiv Detail & Related papers (2021-07-15T14:03:24Z) - GMOTE: Gaussian based minority oversampling technique for imbalanced
classification adapting tail probability of outliers [0.0]
Data-level approaches mainly use oversampling methods to address the problem, such as the synthetic minority oversampling technique (SMOTE).
In this paper, we propose a Gaussian-based minority oversampling technique (GMOTE) with a statistical perspective for imbalanced datasets.
When GMOTE is combined with classification and regression trees (CART) or support vector machines (SVM), it shows better accuracy and F1-score.
arXiv Detail & Related papers (2021-05-09T07:04:37Z) - Bandit Samplers for Training Graph Neural Networks [63.17765191700203]
Several sampling algorithms with variance reduction have been proposed for accelerating the training of Graph Convolutional Networks (GCNs).
These sampling algorithms are not applicable to more general graph neural networks (GNNs) whose message aggregators contain learned weights rather than fixed weights, such as Graph Attention Networks (GAT).
arXiv Detail & Related papers (2020-06-10T12:48:37Z) - Non-Adaptive Adaptive Sampling on Turnstile Streams [57.619901304728366]
We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume maximization on turnstile streams that use space sublinear in $n$.
Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model.
arXiv Detail & Related papers (2020-04-23T05:00:21Z)