Confronting Discrimination in Classification: Smote Based on
Marginalized Minorities in the Kernel Space for Imbalanced Data
- URL: http://arxiv.org/abs/2402.08202v1
- Date: Tue, 13 Feb 2024 04:03:09 GMT
- Title: Confronting Discrimination in Classification: Smote Based on
Marginalized Minorities in the Kernel Space for Imbalanced Data
- Authors: Lingyun Zhong
- Abstract summary: We propose a novel classification oversampling approach based on the decision boundary and sample proximity relationships.
We test the proposed method on a classic financial fraud dataset.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Financial fraud detection poses a typical challenge characterized by class
imbalance, where instances of fraud are extremely rare but can lead to
unpredictable economic losses if misidentified. Precisely classifying these
critical minority samples is a challenging task. The primary difficulty
arises from mainstream classifiers, which often exhibit "implicit
discrimination" against minority samples under common evaluation metrics,
resulting in frequent misclassifications; the crux of the problem lies in
the overlap between the feature spaces of majority and minority samples. To
address these challenges, oversampling is a feasible
solution, yet current classical oversampling methods often lack the necessary
caution in sample selection, exacerbating feature space overlap. In response,
we propose a novel classification oversampling approach based on the decision
boundary and sample proximity relationships. This method carefully considers
the distance between critical samples and the decision hyperplane, as well as
the density of surrounding samples, resulting in an adaptive oversampling
strategy in the kernel space. Finally, we test the proposed method on a classic
financial fraud dataset, and the results show that our proposed method provides
an effective and robust solution that can improve the classification accuracy
of minority samples.
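The core idea of the abstract, oversampling minority samples adaptively according to their distance to a kernel-space decision hyperplane and their local density, can be illustrated with a minimal SMOTE-style sketch. This is an illustration of the general technique only, not the authors' implementation; the weighting scheme, parameters, and function name are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC


def boundary_weighted_smote(X, y, minority=1, n_new=100, k=5, seed=0):
    """SMOTE-style oversampling that favors minority samples lying near
    the decision boundary of an RBF-kernel SVM (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    X_min = X[y == minority]

    # Distance of each minority sample to the decision hyperplane
    # induced in the kernel space by an RBF SVM.
    svm = SVC(kernel="rbf", gamma="scale").fit(X, y)
    margin = np.abs(svm.decision_function(X_min))

    # Closer to the boundary -> larger sampling weight.
    weights = 1.0 / (margin + 1e-6)
    weights /= weights.sum()

    # Classic SMOTE step: interpolate between a chosen sample and one
    # of its k nearest minority-class neighbors.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(len(X_min), p=weights)
        j = rng.choice(idx[i][1:])  # skip the sample itself
        lam = rng.random()
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))

    X_new = np.vstack([X, synthetic])
    y_new = np.concatenate([y, np.full(n_new, minority)])
    return X_new, y_new


# Toy imbalanced dataset (~5% minority class).
X, y = make_classification(n_samples=500, weights=[0.95, 0.05], random_state=0)
X_res, y_res = boundary_weighted_smote(X, y)
```

Plain SMOTE picks seed samples uniformly, which can worsen class overlap by generating points deep inside the majority region; weighting by inverse margin concentrates new samples around the critical boundary cases the abstract emphasizes.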
Related papers
- Adversarial Reweighting Guided by Wasserstein Distance for Bias
Mitigation [24.160692009892088]
Under-representation of minorities in the data makes the disparate treatment of subpopulations difficult to deal with during learning.
We propose a novel adversarial reweighting method to address such representation bias.
arXiv Detail & Related papers (2023-11-21T15:46:11Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Shrinking Class Space for Enhanced Certainty in Semi-Supervised Learning [59.44422468242455]
We propose a novel method dubbed ShrinkMatch to learn uncertain samples.
For each uncertain sample, it adaptively seeks a shrunk class space, which merely contains the original top-1 class.
We then impose a consistency regularization between a pair of strongly and weakly augmented samples in the shrunk space to strive for discriminative representations.
arXiv Detail & Related papers (2023-08-13T14:05:24Z)
- Centrality and Consistency: Two-Stage Clean Samples Identification for Learning with Instance-Dependent Noisy Labels [87.48541631675889]
We propose a two-stage clean samples identification method.
First, we employ a class-level feature clustering procedure for the early identification of clean samples.
Second, for the remaining clean samples that are close to the ground truth class boundary, we propose a novel consistency-based classification method.
arXiv Detail & Related papers (2022-07-29T04:54:57Z)
- Local overlap reduction procedure for dynamic ensemble selection [13.304462985219237]
Class imbalance is a characteristic known for making learning more challenging for classification models.
We propose a DS technique which attempts to minimize the effects of the local class overlap during the classification procedure.
Experimental results show that the proposed technique can significantly outperform the baseline.
arXiv Detail & Related papers (2022-06-16T21:31:05Z)
- Holistic Approach to Measure Sample-level Adversarial Vulnerability and its Utility in Building Trustworthy Systems [17.707594255626216]
Adversarial attack perturbs an image with an imperceptible noise, leading to incorrect model prediction.
We propose a holistic approach for quantifying adversarial vulnerability of a sample by combining different perspectives.
We demonstrate that by reliably estimating adversarial vulnerability at the sample level, it is possible to develop a trustworthy system.
arXiv Detail & Related papers (2022-05-05T12:36:17Z)
- Few-shot Forgery Detection via Guided Adversarial Interpolation [56.59499187594308]
Existing forgery detection methods suffer from significant performance drops when applied to unseen novel forgery approaches.
We propose Guided Adversarial Interpolation (GAI) to overcome the few-shot forgery detection problem.
Our method is validated to be robust to choices of majority and minority forgery approaches.
arXiv Detail & Related papers (2022-04-12T16:05:10Z)
- Minimax Active Learning [61.729667575374606]
Active learning aims to develop label-efficient algorithms by querying the most representative samples to be labeled by a human annotator.
Current active learning techniques either rely on model uncertainty to select the most uncertain samples or use clustering or reconstruction to choose the most diverse set of unlabeled examples.
We develop a semi-supervised minimax entropy-based active learning algorithm that leverages both uncertainty and diversity in an adversarial manner.
arXiv Detail & Related papers (2020-12-18T19:03:40Z)
- Counterfactual-based minority oversampling for imbalanced classification [11.140929092818235]
A key challenge of oversampling in imbalanced classification is that the generation of new minority samples often neglects the usage of majority classes.
We present a new oversampling framework based on the counterfactual theory.
arXiv Detail & Related papers (2020-08-21T14:13:15Z)
- Compressing Large Sample Data for Discriminant Analysis [78.12073412066698]
We consider the computational issues due to large sample size within the discriminant analysis framework.
We propose a new compression approach for reducing the number of training samples for linear and quadratic discriminant analysis.
arXiv Detail & Related papers (2020-05-08T05:09:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.