A Hybrid Approach for Binary Classification of Imbalanced Data
- URL: http://arxiv.org/abs/2207.02738v2
- Date: Thu, 7 Jul 2022 13:09:06 GMT
- Title: A Hybrid Approach for Binary Classification of Imbalanced Data
- Authors: Hsin-Han Tsai, Ta-Wei Yang, Wai-Man Wong, and Cheng-Fu Chou
- Abstract summary: We propose HADR, a hybrid approach with dimension reduction that consists of data block construction, dimensionality reduction, and ensemble learning.
We evaluate the performance on eight imbalanced public datasets in terms of recall, G-mean, and AUC.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Binary classification with an imbalanced dataset is challenging. Models tend
to consider all samples as belonging to the majority class. Although existing
solutions such as sampling methods, cost-sensitive methods, and ensemble
learning methods improve the poor accuracy of the minority class, these methods
are limited by overfitting problems or cost parameters that are difficult to
decide. We propose HADR, a hybrid approach with dimension reduction that
consists of data block construction, dimensionality reduction, and ensemble
learning with deep neural network classifiers. We evaluate the performance on
eight imbalanced public datasets in terms of recall, G-mean, and AUC. The
results show that our model outperforms state-of-the-art methods.
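The three evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions (recall as the true positive rate on the minority class, G-mean as the geometric mean of sensitivity and specificity, AUC as the probability that a random positive is scored above a random negative) and is not taken from the paper's code. The function names and the toy labels are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating label 1 as the minority (positive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def recall(y_true, y_pred):
    """True positive rate: fraction of minority samples that were detected."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


def auc(y_true, scores):
    """Pairwise (Mann-Whitney) AUC; ties between a positive and a negative count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 8 majority samples (0) and 2 minority samples (1).
y_true = [0] * 8 + [1] * 2

# A classifier that labels everything as the majority class is 80% accurate,
# yet it detects no minority sample, so recall and G-mean are both zero.
all_majority = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, all_majority)) / len(y_true)
print(accuracy, recall(y_true, all_majority), g_mean(y_true, all_majority))
```

The all-majority case shows why plain accuracy is not used here: it rewards the failure mode the abstract describes, while recall and G-mean drop to zero.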
Related papers
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Generalized Oversampling for Learning from Imbalanced datasets and Associated Theory [0.0]
In supervised learning, it is quite frequent to be confronted with real imbalanced datasets.
We propose a data augmentation procedure, the GOLIATH algorithm, based on kernel density estimates.
We evaluate the performance of the GOLIATH algorithm in imbalanced regression situations.
arXiv Detail & Related papers (2023-08-05T23:08:08Z)
- A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation [0.196629787330046]
Class imbalance (CI) in classification problems arises when the number of observations belonging to one class is lower than the other.
In this paper, we evaluate data augmentation and ensemble learning methods used to address prominent benchmark CI problems.
arXiv Detail & Related papers (2023-04-06T04:37:10Z)
- GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
- Class-Imbalanced Complementary-Label Learning via Weighted Loss [8.934943507699131]
Complementary-label learning (CLL) is widely used in weakly supervised classification.
It faces a significant challenge in real-world datasets when confronted with class-imbalanced training samples.
We propose a novel problem setting that enables learning from class-imbalanced complementary labels for multi-class classification.
arXiv Detail & Related papers (2022-09-28T16:02:42Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- A One-shot Framework for Distributed Clustered Learning in Heterogeneous Environments [54.172993875654015]
The paper proposes a family of communication efficient methods for distributed learning in heterogeneous environments.
A one-shot approach, based on local computations at the users and a clustering-based aggregation step at the server, is shown to provide strong learning guarantees.
For strongly convex problems it is shown that, as long as the number of data points per user is above a threshold, the proposed approach achieves order-optimal mean-squared error rates in terms of the sample size.
arXiv Detail & Related papers (2022-09-22T09:04:10Z)
- Effective Class-Imbalance learning based on SMOTE and Convolutional Neural Networks [0.1074267520911262]
Imbalanced Data (ID) is a problem that deters Machine Learning (ML) models from achieving satisfactory results.
In this paper, we investigate the effectiveness of methods based on Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs).
In order to achieve reliable results, we conducted our experiments 100 times with randomly shuffled data distributions.
arXiv Detail & Related papers (2022-09-01T07:42:16Z)
- Hybrid Ensemble optimized algorithm based on Genetic Programming for imbalanced data classification [0.0]
We propose a hybrid ensemble algorithm based on Genetic Programming (GP) for two-class imbalanced data classification.
Experimental results on the specified data sets show that, with training-set sizes of 40% and 50%, the proposed method predicts the minority class more accurately than with other training-set sizes.
arXiv Detail & Related papers (2021-06-02T14:14:38Z)
- Scalable Personalised Item Ranking through Parametric Density Estimation [53.44830012414444]
Learning from implicit feedback is challenging because of the difficult nature of the one-class problem.
Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem.
We propose a learning-to-rank approach, which achieves convergence speed comparable to the pointwise counterpart.
arXiv Detail & Related papers (2021-05-11T03:38:16Z)
- Attentional-Biased Stochastic Gradient Descent [74.49926199036481]
We present a provable method (named ABSGD) for addressing the data imbalance or label noise problem in deep learning.
Our method is a simple modification to momentum SGD where we assign an individual importance weight to each sample in the mini-batch.
ABSGD is flexible enough to combine with other robust losses without any additional cost.
arXiv Detail & Related papers (2020-12-13T03:41:52Z)