Solving the Class Imbalance Problem Using a Counterfactual Method for Data Augmentation
- URL: http://arxiv.org/abs/2111.03516v1
- Date: Fri, 5 Nov 2021 14:14:06 GMT
- Title: Solving the Class Imbalance Problem Using a Counterfactual Method for Data Augmentation
- Authors: Mohammed Temraz and Mark T. Keane
- Abstract summary: Learning from class imbalanced datasets poses challenges for machine learning algorithms.
We advance a novel data augmentation method (adapted from eXplainable AI) that generates synthetic, counterfactual instances in the minority class.
Several experiments using four different classifiers and 25 datasets are reported, which show that this Counterfactual Augmentation method (CFA) generates useful synthetic data points in the minority class.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning from class imbalanced datasets poses challenges for many machine
learning algorithms. Many real-world domains are, by definition, class
imbalanced by virtue of having a majority class that naturally has many more
instances than its minority class (e.g. genuine bank transactions occur much
more often than fraudulent ones). Many methods have been proposed to solve the
class imbalance problem, among the most popular being oversampling techniques
(such as SMOTE). These methods generate synthetic instances in the minority
class, to balance the dataset, performing data augmentations that improve the
performance of predictive machine learning (ML) models. In this paper we
advance a novel data augmentation method (adapted from eXplainable AI), that
generates synthetic, counterfactual instances in the minority class. Unlike
other oversampling techniques, this method adaptively combines existing
instances from the dataset, using actual feature-values rather than
interpolating values between instances. Several experiments using four
different classifiers and 25 datasets are reported, which show that this
Counterfactual Augmentation method (CFA) generates useful synthetic data points
in the minority class. The experiments also show that CFA is competitive with
many other oversampling methods, many of which are variants of SMOTE. The basis
for CFA's performance is discussed, along with the conditions under which it is
likely to perform better or worse in future tests.
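The distinction the abstract draws — interpolating new feature values (SMOTE) versus recombining actual feature values from existing instances (CFA) — can be illustrated with a short sketch. This is an illustrative toy example of the general idea, not the authors' exact algorithm; the pairing strategy and feature-swap mask below are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy minority-class data: rows are instances, columns are features.
minority = np.array([
    [1.0, 10.0, 0.5],
    [1.2, 12.0, 0.7],
    [0.9,  9.0, 0.4],
    [1.1, 11.0, 0.6],
])

def smote_like(X, n_new):
    """SMOTE-style oversampling: interpolate between two instances,
    producing feature values that need not occur anywhere in the
    original data."""
    samples = []
    for _ in range(n_new):
        i, j = rng.choice(len(X), size=2, replace=False)
        lam = rng.random()  # interpolation factor in [0, 1)
        samples.append(X[i] + lam * (X[j] - X[i]))
    return np.array(samples)

def counterfactual_like(X, n_new):
    """Counterfactual-style augmentation (a sketch of the idea behind
    CFA): build each new instance by recombining *actual* feature
    values from existing minority instances, so every value in a
    synthetic instance already occurs in the data."""
    samples = []
    for _ in range(n_new):
        i, j = rng.choice(len(X), size=2, replace=False)
        mask = rng.random(X.shape[1]) < 0.5  # which features come from X[j]
        samples.append(np.where(mask, X[j], X[i]))
    return np.array(samples)

synthetic = counterfactual_like(minority, 3)
# Every value in `synthetic` appears somewhere in `minority`;
# SMOTE-style samples generally do not have this property.
```

The key consequence is that counterfactual-style instances stay on the grid of observed feature values, which can matter for categorical or discrete features where interpolated values are meaningless.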
Related papers
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We reasonably estimate intra-class variations for every class and generate adaptive synthetic samples to support hard samples mining.
Our method significantly improves on and outperforms state-of-the-art methods in retrieval performance by 3%-6%.
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- HardVis: Visual Analytics to Handle Instance Hardness Using Undersampling and Oversampling Techniques [48.82319198853359]
HardVis is a visual analytics system designed to handle instance hardness mainly in imbalanced classification scenarios.
Users can explore subsets of the data from different perspectives to decide on these sampling parameters.
The efficacy and effectiveness of HardVis are demonstrated with a hypothetical usage scenario and a use case.
arXiv Detail & Related papers (2022-03-29T17:04:16Z)
- Imbalanced Classification via Explicit Gradient Learning From Augmented Data [0.0]
We propose a novel deep meta-learning technique to augment a given imbalanced dataset with new minority instances.
The advantage of the proposed method is demonstrated on synthetic and real-world datasets with various imbalance ratios.
arXiv Detail & Related papers (2022-02-21T22:16:50Z)
- SMOTified-GAN for class imbalanced pattern classification problems [0.41998444721319217]
We propose a novel two-phase oversampling approach that has the synergy of SMOTE and GAN.
The experimental results prove the sample quality of minority class(es) has been improved in a variety of tested benchmark datasets.
arXiv Detail & Related papers (2021-08-06T06:14:05Z)
- GMOTE: Gaussian based minority oversampling technique for imbalanced classification adapting tail probability of outliers [0.0]
Data-level approaches mainly use oversampling methods to solve the problem, such as the Synthetic Minority Oversampling Technique (SMOTE).
In this paper, we propose a Gaussian-based minority oversampling technique (GMOTE) that takes a statistical perspective on imbalanced datasets.
When GMOTE is combined with a classification and regression tree (CART) or support vector machine (SVM), it shows better accuracy and F1-score.
arXiv Detail & Related papers (2021-05-09T07:04:37Z)
- Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z)
- A Synthetic Over-sampling method with Minority and Majority classes for imbalance problems [0.0]
We propose a new method to generate synthetic instances using the Minority and Majority classes (SOMM).
SOMM generates synthetic instances diversely within the minority data space.
It updates the generated instances adaptively to the neighbourhood including both classes.
arXiv Detail & Related papers (2020-11-09T03:39:56Z)
- Conditional Wasserstein GAN-based Oversampling of Tabular Data for Imbalanced Learning [10.051309746913512]
We propose an oversampling method based on a conditional Wasserstein GAN.
We benchmark our method against standard oversampling methods and the imbalanced baseline on seven real-world datasets.
arXiv Detail & Related papers (2020-08-20T20:33:56Z)
- M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks struggle to generalize to a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes.
Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.