Solving the Class Imbalance Problem Using a Counterfactual Method for Data Augmentation
- URL: http://arxiv.org/abs/2111.03516v1
- Date: Fri, 5 Nov 2021 14:14:06 GMT
- Title: Solving the Class Imbalance Problem Using a Counterfactual Method for Data Augmentation
- Authors: Mohammed Temraz and Mark T. Keane
- Abstract summary: Learning from class imbalanced datasets poses challenges for machine learning algorithms.
We advance a novel data augmentation method (adapted from eXplainable AI) that generates synthetic, counterfactual instances in the minority class.
Several experiments using four different classifiers and 25 datasets are reported, which show that this Counterfactual Augmentation method (CFA) generates useful synthetic data points in the minority class.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning from class imbalanced datasets poses challenges for many machine
learning algorithms. Many real-world domains are, by definition, class
imbalanced by virtue of having a majority class that naturally has many more
instances than its minority class (e.g. genuine bank transactions occur much
more often than fraudulent ones). Many methods have been proposed to solve the
class imbalance problem, among the most popular being oversampling techniques
(such as SMOTE). These methods generate synthetic instances in the minority
class, to balance the dataset, performing data augmentations that improve the
performance of predictive machine learning (ML) models. In this paper we
advance a novel data augmentation method (adapted from eXplainable AI), that
generates synthetic, counterfactual instances in the minority class. Unlike
other oversampling techniques, this method adaptively combines existing
instances from the dataset, using actual feature-values rather than
interpolating values between instances. Several experiments using four
different classifiers and 25 datasets are reported, which show that this
Counterfactual Augmentation method (CFA) generates useful synthetic data points
in the minority class. The experiments also show that CFA is competitive with
many other oversampling methods, many of which are variants of SMOTE. The basis
for CFA's performance is discussed, along with the conditions under which it is
likely to perform better or worse in future tests.
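The distinction the abstract draws — interpolating new feature values (SMOTE) versus recombining actual feature values from existing instances (CFA) — can be illustrated with a short sketch. This is an illustrative toy example of the general idea, not the authors' exact algorithm; the pairing strategy and feature-swap mask below are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy minority-class data: rows are instances, columns are features.
minority = np.array([
    [1.0, 10.0, 0.5],
    [1.2, 12.0, 0.7],
    [0.9,  9.0, 0.4],
    [1.1, 11.0, 0.6],
])

def smote_like(X, n_new):
    """SMOTE-style oversampling: interpolate between two instances,
    producing feature values that need not occur anywhere in the
    original data."""
    samples = []
    for _ in range(n_new):
        i, j = rng.choice(len(X), size=2, replace=False)
        lam = rng.random()  # interpolation factor in [0, 1)
        samples.append(X[i] + lam * (X[j] - X[i]))
    return np.array(samples)

def counterfactual_like(X, n_new):
    """Counterfactual-style augmentation (a sketch of the idea behind
    CFA): build each new instance by recombining *actual* feature
    values from existing minority instances, so every value in a
    synthetic instance already occurs in the data."""
    samples = []
    for _ in range(n_new):
        i, j = rng.choice(len(X), size=2, replace=False)
        mask = rng.random(X.shape[1]) < 0.5  # which features come from X[j]
        samples.append(np.where(mask, X[j], X[i]))
    return np.array(samples)

synthetic = counterfactual_like(minority, 3)
# Every value in `synthetic` appears somewhere in `minority`;
# SMOTE-style samples generally do not have this property.
```

The key consequence is that counterfactual-style instances stay on the grid of observed feature values, which can matter for categorical or discrete features where interpolated values are meaningless.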
Related papers
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Intra-class Adaptive Augmentation with Neighbor Correction for Deep Metric Learning [99.14132861655223]
We propose a novel intra-class adaptive augmentation (IAA) framework for deep metric learning.
We reasonably estimate intra-class variations for every class and generate adaptive synthetic samples to support hard samples mining.
Our method significantly improves on and outperforms state-of-the-art methods in retrieval performance by 3%-6%.
arXiv Detail & Related papers (2022-11-29T14:52:38Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- HardVis: Visual Analytics to Handle Instance Hardness Using Undersampling and Oversampling Techniques [48.82319198853359]
HardVis is a visual analytics system designed to handle instance hardness mainly in imbalanced classification scenarios.
Users can explore subsets of the data from different perspectives to decide on these sampling parameters.
The efficacy and effectiveness of HardVis are demonstrated with a hypothetical usage scenario and a use case.
arXiv Detail & Related papers (2022-03-29T17:04:16Z)
- Imbalanced Classification via Explicit Gradient Learning From Augmented Data [0.0]
We propose a novel deep meta-learning technique to augment a given imbalanced dataset with new minority instances.
The advantage of the proposed method is demonstrated on synthetic and real-world datasets with various imbalance ratios.
arXiv Detail & Related papers (2022-02-21T22:16:50Z)
- SMOTified-GAN for class imbalanced pattern classification problems [0.41998444721319217]
We propose a novel two-phase oversampling approach that has the synergy of SMOTE and GAN.
The experimental results prove the sample quality of minority class(es) has been improved in a variety of tested benchmark datasets.
arXiv Detail & Related papers (2021-08-06T06:14:05Z)
- GMOTE: Gaussian based minority oversampling technique for imbalanced classification adapting tail probability of outliers [0.0]
Data-level approaches mainly use oversampling methods to solve the problem, such as the Synthetic Minority Oversampling Technique (SMOTE).
In this paper, we propose a Gaussian-based minority oversampling technique (GMOTE) that takes a statistical perspective on imbalanced datasets.
When GMOTE is combined with a classification and regression tree (CART) or support vector machine (SVM), it shows better accuracy and F1-score.
arXiv Detail & Related papers (2021-05-09T07:04:37Z)
- Improving Calibration for Long-Tailed Recognition [68.32848696795519]
We propose two methods to improve calibration and performance in such scenarios.
For dataset bias due to different samplers, we propose shifted batch normalization.
Our proposed methods set new records on multiple popular long-tailed recognition benchmark datasets.
arXiv Detail & Related papers (2021-04-01T13:55:21Z)
- A Synthetic Over-sampling method with Minority and Majority classes for imbalance problems [0.0]
We propose a new method to generate synthetic instances using the Minority and Majority classes (SOMM).
SOMM generates synthetic instances diversely within the minority data space.
It updates the generated instances adaptively to the neighbourhood including both classes.
arXiv Detail & Related papers (2020-11-09T03:39:56Z)
- Conditional Wasserstein GAN-based Oversampling of Tabular Data for Imbalanced Learning [10.051309746913512]
We propose an oversampling method based on a conditional Wasserstein GAN.
We benchmark our method against standard oversampling methods and the imbalanced baseline on seven real-world datasets.
arXiv Detail & Related papers (2020-08-20T20:33:56Z)
- M2m: Imbalanced Classification via Major-to-minor Translation [79.09018382489506]
In most real-world scenarios, labeled training datasets are highly class-imbalanced, and deep neural networks struggle to generalize to a balanced testing criterion.
In this paper, we explore a novel yet simple way to alleviate this issue by augmenting less-frequent classes via translating samples from more-frequent classes.
Our experimental results on a variety of class-imbalanced datasets show that the proposed method improves the generalization on minority classes significantly compared to other existing re-sampling or re-weighting methods.
arXiv Detail & Related papers (2020-04-01T13:21:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.