RB-CCR: Radial-Based Combined Cleaning and Resampling algorithm for
imbalanced data classification
- URL: http://arxiv.org/abs/2105.04009v1
- Date: Sun, 9 May 2021 19:47:45 GMT
- Title: RB-CCR: Radial-Based Combined Cleaning and Resampling algorithm for
imbalanced data classification
- Authors: Michał Koziarski, Colin Bellinger, Michał Woźniak
- Abstract summary: Resampling the training data is the standard approach to improving classification performance on imbalanced binary data.
RB-CCR exploits the class potential to accurately locate sub-regions of the data-space for synthetic oversampling.
Our results show that RB-CCR achieves a better precision-recall trade-off than CCR and generally outperforms the state-of-the-art resampling methods in terms of AUC and G-mean.
- Score: 5.448684866061922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-world classification domains, such as medicine, health and safety, and
finance, often exhibit imbalanced class priors and have asymmetric
misclassification costs. In such cases, the classification model must achieve a
high recall without significantly impacting precision. Resampling the training
data is the standard approach to improving classification performance on
imbalanced binary data. However, the state-of-the-art methods ignore the local
joint distribution of the data or correct it as a post-processing step. This
can cause sub-optimal shifts in the training distribution, particularly when
the target data distribution is complex. In this paper, we propose Radial-Based
Combined Cleaning and Resampling (RB-CCR). RB-CCR utilizes the concept of class
potential to refine the energy-based resampling approach of CCR. In particular,
RB-CCR exploits the class potential to accurately locate sub-regions of the
data-space for synthetic oversampling. The category sub-region for oversampling
can be specified as an input parameter to meet domain-specific needs or be
automatically selected via cross-validation. Our $5\times2$ cross-validated
results on 57 benchmark binary datasets with 9 classifiers show that RB-CCR
achieves a better precision-recall trade-off than CCR and generally
outperforms the state-of-the-art resampling methods in terms of AUC and
G-mean.
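The class-potential mechanism described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the Gaussian kernel, the `gamma` value, and the equal-thirds split into low/medium/high-potential regions are all assumptions made for this sketch.

```python
import numpy as np

def class_potential(x, class_samples, gamma=0.25):
    """Potential of point x with respect to one class: the sum of
    radial basis function contributions from every sample of that
    class (a Gaussian RBF is assumed here)."""
    dists = np.linalg.norm(class_samples - x, axis=1)
    return float(np.sum(np.exp(-((dists / gamma) ** 2))))

def categorize_candidates(candidates, class_samples, region="L", gamma=0.25):
    """Score candidate synthetic points by class potential, split them
    into low/medium/high-potential thirds, and keep the requested
    sub-region ('L', 'M', or 'H'). The equal-thirds split is an
    illustrative choice."""
    potentials = np.array([class_potential(c, class_samples, gamma)
                           for c in candidates])
    order = np.argsort(potentials)           # ascending potential
    thirds = np.array_split(order, 3)        # low, medium, high
    idx = {"L": thirds[0], "M": thirds[1], "H": thirds[2]}[region]
    return candidates[idx]
```

In this sketch, the chosen sub-region (passed as an input parameter, mirroring the abstract) determines which candidates survive as synthetic samples; selecting it via cross-validation would simply mean evaluating each of 'L', 'M', 'H' on held-out folds.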
Related papers
- Asymptotic Normality of Infinite Centered Random Forests - Application to Imbalanced Classification [6.5160087003642]
In this paper, we theoretically study such a procedure when the classifier is a Centered Random Forest (CRF). We prove that the CRF trained on the rebalanced dataset exhibits a bias, which can be removed with appropriate techniques. For high imbalance settings, we prove that the IS-ICRF estimator enjoys a variance reduction compared to the ICRF trained on the original data.
arXiv Detail & Related papers (2025-06-10T08:14:28Z)
- Energy Score-based Pseudo-Label Filtering and Adaptive Loss for Imbalanced Semi-supervised SAR target recognition [1.2035771704626825]
Existing semi-supervised SAR ATR algorithms show low recognition accuracy in the case of class imbalance.
This work offers a non-balanced semi-supervised SAR target recognition approach using dynamic energy scores and adaptive loss.
arXiv Detail & Related papers (2024-11-06T14:45:16Z)
- Risk-based Calibration for Generative Classifiers [4.792851066169872]
We propose a learning procedure called risk-based calibration (RC)
RC iteratively refines the generative classifier by adjusting its joint probability distribution according to the 0-1 loss in training samples.
RC significantly outperforms closed-form learning procedures in terms of both training error and generalization error.
arXiv Detail & Related papers (2024-09-05T14:06:56Z)
- Coordinated Sparse Recovery of Label Noise [2.9495895055806804]
This study focuses on robust classification tasks where the label noise is instance-dependent.
We propose a method called Coordinated Sparse Recovery (CSR)
CSR introduces a collaboration matrix and confidence weights to coordinate model predictions and noise recovery, reducing error leakage.
Based on CSR, this study designs a joint sample selection strategy and constructs a comprehensive and powerful learning framework called CSR+.
arXiv Detail & Related papers (2024-04-07T03:41:45Z)
- Latent Enhancing AutoEncoder for Occluded Image Classification [2.6217304977339473]
We introduce LEARN (Latent Enhancing feAture Reconstruction Network), an auto-encoder-based network that can be incorporated into the classification model before its head.
On the OccludedPASCAL3D+ dataset, the proposed LEARN outperforms standard classification models.
arXiv Detail & Related papers (2024-02-10T12:22:31Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach [72.19525160912943]
We first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation.
We then analyze the sufficient conditions to guarantee fairness for the target dataset.
Motivated by these sufficient conditions, we propose robust fairness regularization (RFR)
arXiv Detail & Related papers (2023-03-06T17:19:23Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning [122.62311703151215]
Divide and Contrast (DaC) aims to connect the good ends of both worlds while bypassing their limitations.
DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals.
We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch.
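The MMD alignment mentioned above can be sketched with a generic biased estimator of squared Maximum Mean Discrepancy. The Gaussian kernel and its bandwidth are assumptions made for this sketch, and the paper's memory-bank bookkeeping is omitted:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # Pairwise squared Euclidean distances -> Gaussian kernel matrix.
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def mmd2(x, y, gamma=1.0):
    """Biased estimator of squared MMD between two sample sets;
    smaller values indicate better-aligned distributions, so it can
    serve as a loss term for reducing distribution mismatch."""
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2.0 * rbf_kernel(x, y, gamma).mean())
```

In the DaC setting, `x` and `y` would stand in for features of the source-like and target-specific samples respectively, with `y` drawn from a memory bank rather than recomputed each step.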
arXiv Detail & Related papers (2022-11-12T09:21:49Z)
- Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none of the above' decision is highly correlated with its accuracy on the closed-set classes.
We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy.
We also construct new benchmarks which better respect the task of detecting semantic novelty.
arXiv Detail & Related papers (2021-10-12T17:58:59Z)
- Solving Long-tailed Recognition with Deep Realistic Taxonomic Classifier [68.38233199030908]
Long-tail recognition tackles the natural non-uniformly distributed data in real-world scenarios.
While modern classifiers perform well on populated classes, their performance degrades significantly on tail classes.
Deep-RTC is proposed as a new solution to the long-tail problem, combining realism with hierarchical predictions.
arXiv Detail & Related papers (2020-07-20T05:57:42Z)
- Improved Design of Quadratic Discriminant Analysis Classifier in Unbalanced Settings [19.763768111774134]
The use of quadratic discriminant analysis (QDA) or its regularized version (R-QDA) for classification is often not recommended in unbalanced settings.
We propose an improved R-QDA that is based on the use of two regularization parameters and a modified bias.
arXiv Detail & Related papers (2020-06-11T12:17:05Z)
- On Positive-Unlabeled Classification in GAN [130.43248168149432]
This paper defines a positive and unlabeled classification problem for standard GANs.
It then leads to a novel technique to stabilize the training of the discriminator in GANs.
arXiv Detail & Related papers (2020-02-04T05:59:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.