Tackling Diverse Minorities in Imbalanced Classification
- URL: http://arxiv.org/abs/2308.14838v1
- Date: Mon, 28 Aug 2023 18:48:34 GMT
- Title: Tackling Diverse Minorities in Imbalanced Classification
- Authors: Kwei-Herng Lai, Daochen Zha, Huiyuan Chen, Mangesh Bendre, Yuzhong
Chen, Mahashweta Das, Hao Yang, Xia Hu
- Abstract summary: Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
- Score: 80.78227787608714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imbalanced datasets are commonly observed in various real-world applications,
presenting significant challenges in training classifiers. When working with
large datasets, the imbalanced issue can be further exacerbated, making it
exceptionally difficult to train classifiers effectively. To address the
problem, over-sampling techniques have been developed to linearly interpolating
data instances between minorities and their neighbors. However, in many
real-world scenarios such as anomaly detection, minority instances are often
dispersed diversely in the feature space rather than clustered together.
Inspired by domain-agnostic data mix-up, we propose generating synthetic
samples iteratively by mixing data samples from both minority and majority
classes. Developing such a framework is non-trivial; the challenges include
source sample selection, mix-up strategy selection, and the coordination
between the underlying model and mix-up strategies. To tackle these challenges,
we formulate the problem of iterative data mix-up as a Markov decision process
(MDP) that maps data attributes onto an augmentation strategy. To solve the
MDP, we employ an actor-critic framework to adapt to the discrete-continuous
decision space. This framework is utilized to train a data augmentation policy
and design a reward signal that explores classifier uncertainty and encourages
performance improvement, irrespective of the classifier's convergence. We
demonstrate the effectiveness of our proposed framework through extensive
experiments conducted on seven publicly available benchmark datasets using
three different types of classifiers. The results of these experiments showcase
the potential and promise of our framework in addressing imbalanced datasets
with diverse minorities.
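To make the contrast with neighbor-based over-sampling concrete, here is a minimal NumPy sketch of the cross-class mix-up idea. It is our own illustration, not the authors' implementation: the partner class and the mixing coefficient are drawn at random here, whereas the paper learns both choices with the MDP policy described in the abstract.

```python
import numpy as np

def mixup_synthetic_minority(X_min, X_maj, n_samples, alpha=0.5, rng=None):
    """Generate synthetic minority samples by mixing minority seeds with
    partners drawn from BOTH classes (cross-class mix-up), rather than
    only from minority neighbors as in classic SMOTE.

    Illustrative sketch only: the partner pool and mixing coefficient
    are random here, not selected by a learned augmentation policy.
    """
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_samples):
        seed = X_min[rng.integers(len(X_min))]
        # Discrete choice: mix with a minority or a majority partner.
        pool = X_min if rng.random() < 0.5 else X_maj
        partner = pool[rng.integers(len(pool))]
        # Continuous choice: mixing coefficient lam, biased toward the
        # minority seed so the synthetic sample keeps the minority label.
        lam = rng.beta(alpha, alpha) * 0.5 + 0.5  # lam in [0.5, 1.0)
        synthetic.append(lam * seed + (1.0 - lam) * partner)
    return np.asarray(synthetic)

# Toy usage: 200 majority points, 20 dispersed minority points in 2-D.
X_maj = np.random.default_rng(0).normal(0.0, 1.0, size=(200, 2))
X_min = np.random.default_rng(1).normal(3.0, 2.0, size=(20, 2))
X_syn = mixup_synthetic_minority(X_min, X_maj, n_samples=100, rng=2)
print(X_syn.shape)  # (100, 2)
```

Biasing lam toward the minority seed keeps synthetic points minority-labeled while still pulling them toward majority regions, which is one way dispersed minorities can be covered.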
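The abstract's "discrete-continuous decision space" suggests an actor with two heads: a categorical head for discrete choices (e.g., which mix-up strategy or partner pool) and a continuous head for the mixing coefficient. The PyTorch sketch below shows one plausible shape for such an actor, plus one possible uncertainty-based reward; the architecture, head names, state features, and reward definition are assumptions on our part, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixupPolicy(nn.Module):
    """Actor for a hybrid action space: a categorical head picks a
    discrete mix-up strategy, and a Beta head emits a continuous mixing
    coefficient in (0, 1). Sizes and heads are illustrative guesses."""

    def __init__(self, state_dim: int, n_strategies: int):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
        )
        self.strategy_head = nn.Linear(64, n_strategies)  # discrete action
        self.alpha_head = nn.Linear(64, 1)                # continuous action
        self.beta_head = nn.Linear(64, 1)

    def forward(self, state: torch.Tensor):
        h = self.encoder(state)
        strategy_dist = torch.distributions.Categorical(
            logits=self.strategy_head(h))
        lam_dist = torch.distributions.Beta(
            F.softplus(self.alpha_head(h)) + 1e-3,
            F.softplus(self.beta_head(h)) + 1e-3,
        )
        return strategy_dist, lam_dist

def uncertainty_reward(probs_before, probs_after) -> float:
    """One plausible reading of the reward in the abstract: reward a
    drop in the classifier's mean predictive entropy after training on
    the augmented data. The paper's exact signal may differ."""
    entropy = lambda p: -(p * p.clamp_min(1e-12).log()).sum(dim=-1).mean()
    return (entropy(probs_before) - entropy(probs_after)).item()

# Toy usage: sample one action per state for two 8-dim attribute vectors.
policy = MixupPolicy(state_dim=8, n_strategies=3)
strat_dist, lam_dist = policy(torch.randn(2, 8))
print(strat_dist.sample(), lam_dist.sample().squeeze(-1))
```

A Beta distribution keeps the sampled coefficient inside (0, 1) without clipping, which is a common design choice for bounded continuous actions in actor-critic methods.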
Related papers
- FedMAC: Tackling Partial-Modality Missing in Federated Learning with Cross-Modal Aggregation and Contrastive Regularization [11.954904313477176]
Federated Learning (FL) is a method for training machine learning models using distributed data sources.
This study proposes a novel framework named FedMAC, designed to address missing multi-modality data under conditions of partial-modality missing in FL.
arXiv Detail & Related papers (2024-10-04T01:24:02Z) - Confronting Discrimination in Classification: Smote Based on
Marginalized Minorities in the Kernel Space for Imbalanced Data [0.0]
We propose a novel classification oversampling approach based on the decision boundary and sample proximity relationships.
We test the proposed method on a classic financial fraud dataset.
arXiv Detail & Related papers (2024-02-13T04:03:09Z) - Generative Oversampling for Imbalanced Data via Majority-Guided VAE [15.93867386081279]
We propose a novel over-sampling model, called Majority-Guided VAE (MGVAE), which generates new minority samples under the guidance of a majority-based prior.
In this way, the newly generated minority samples can inherit the diversity and richness of the majority ones, thus mitigating overfitting in downstream tasks.
arXiv Detail & Related papers (2023-02-14T06:35:23Z) - Effective Class-Imbalance learning based on SMOTE and Convolutional
Neural Networks [0.1074267520911262]
Imbalanced Data (ID) is a problem that prevents Machine Learning (ML) models from achieving satisfactory results.
In this paper, we investigate the effectiveness of methods based on Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs).
In order to achieve reliable results, we conducted our experiments 100 times with randomly shuffled data distributions.
arXiv Detail & Related papers (2022-09-01T07:42:16Z) - DRFLM: Distributionally Robust Federated Learning with Inter-client
Noise via Local Mixup [58.894901088797376]
Federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data.
We propose a general framework to solve the above two challenges simultaneously.
We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z) - Equivariance Allows Handling Multiple Nuisance Variables When Analyzing
Pooled Neuroimaging Datasets [53.34152466646884]
In this paper, we show how bringing recent results on equivariant representation learning instantiated on structured spaces together with simple use of classical results on causal inference provides an effective practical solution.
We demonstrate how our model allows dealing with more than one nuisance variable under some assumptions and can enable analysis of pooled scientific datasets in scenarios that would otherwise entail removing a large portion of the samples.
arXiv Detail & Related papers (2022-03-29T04:54:06Z) - Selecting the suitable resampling strategy for imbalanced data
classification regarding dataset properties [62.997667081978825]
In many application domains such as medicine, information retrieval, cybersecurity, social media, etc., datasets used for inducing classification models often have an unequal distribution of the instances of each class.
This situation, known as imbalanced data classification, causes low predictive performance for the minority class examples.
Oversampling and undersampling techniques are well-known strategies to deal with this problem by balancing the number of examples of each class (see the brief resampling sketch after this list).
arXiv Detail & Related papers (2021-12-15T18:56:39Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
The proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z) - Imbalanced Data Learning by Minority Class Augmentation using Capsule
Adversarial Networks [31.073558420480964]
We propose a method to restore balance in imbalanced image datasets by coalescing two concurrent methods.
In our model, generative and discriminative networks play a novel competitive game.
The coalescing of capsule-GAN is effective at recognizing highly overlapping classes with far fewer parameters than the convolutional-GAN.
arXiv Detail & Related papers (2020-04-05T12:36:06Z) - When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss that yields better generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss can provide significant improvement on various vision tasks.
arXiv Detail & Related papers (2020-02-24T11:35:28Z)
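For readers unfamiliar with the resampling strategies referenced in the "Selecting the suitable resampling strategy" entry above, here is a toy sketch of random oversampling and undersampling. It is illustrative only; production work typically relies on library implementations such as imbalanced-learn rather than this minimal version.

```python
import numpy as np

def random_resample(X, y, strategy="over", rng=None):
    """Balance a binary dataset by random resampling.

    strategy="over":  duplicate minority samples until class counts match.
    strategy="under": drop majority samples until class counts match.
    Toy illustration only; libraries such as imbalanced-learn add
    SMOTE-style interpolation, cleaning steps, and multi-class support.
    """
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    majority = classes[np.argmax(counts)]
    idx_min = np.flatnonzero(y == minority)
    idx_maj = np.flatnonzero(y == majority)
    if strategy == "over":
        # Sample extra minority indices with replacement.
        extra = rng.choice(idx_min, size=len(idx_maj) - len(idx_min),
                           replace=True)
        keep = np.concatenate([np.arange(len(y)), extra])
    else:
        # Keep all minority samples and a random subset of the majority.
        kept_maj = rng.choice(idx_maj, size=len(idx_min), replace=False)
        keep = np.concatenate([idx_min, kept_maj])
    return X[keep], y[keep]
```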