Balanced Split: A new train-test data splitting strategy for imbalanced datasets
- URL: http://arxiv.org/abs/2212.11116v1
- Date: Sat, 17 Dec 2022 10:36:39 GMT
- Title: Balanced Split: A new train-test data splitting strategy for imbalanced datasets
- Authors: Azal Ahmad Khan
- Abstract summary: Class imbalance is a problem since most machine learning algorithms are built with an assumption of equal representation of all classes in the training dataset.
This paper shows a new way to counter the class imbalance problem through a new data-splitting strategy called balanced split.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Classification data sets with skewed class proportions are called imbalanced.
Class imbalance is a problem since most machine learning classification
algorithms are built with an assumption of equal representation of all classes
in the training dataset. Therefore, to counter the class imbalance problem, many
algorithm-level and data-level approaches have been developed. These mainly
include ensemble learning and data augmentation techniques. This paper shows a
new way to counter the class imbalance problem through a new data-splitting
strategy called balanced split. Data splitting can play an important role in
correctly classifying imbalanced datasets. We show that commonly used data-splitting strategies have notable disadvantages, and that the proposed balanced split resolves them.
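The full procedure is specified in the paper; a natural reading of the idea is to draw an equal number of samples per class into the training set and leave the remaining, still-skewed data for testing. The function name and the train_per_class parameter below are illustrative, not taken from the paper:

```python
import numpy as np

def balanced_split(X, y, train_per_class=100, seed=0):
    """Sketch of a class-balanced train-test split (illustrative).

    Draws the same number of samples from every class for the training
    set; all remaining samples form the test set, which keeps the
    original skew.
    """
    rng = np.random.default_rng(seed)
    train_idx = []
    for cls in np.unique(y):
        cls_idx = np.flatnonzero(y == cls)
        if len(cls_idx) < train_per_class:
            raise ValueError(f"class {cls} has only {len(cls_idx)} samples")
        train_idx.extend(rng.choice(cls_idx, size=train_per_class, replace=False))
    train_idx = np.asarray(train_idx)
    test_mask = np.ones(len(y), dtype=bool)
    test_mask[train_idx] = False
    return X[train_idx], X[test_mask], y[train_idx], y[test_mask]

# Example with a 900:100 class skew:
X = np.arange(1000).reshape(-1, 1)
y = np.array([0] * 900 + [1] * 100)
X_tr, X_te, y_tr, y_te = balanced_split(X, y, train_per_class=80)
print(np.bincount(y_tr))  # [80 80]  -> training set is balanced
print(np.bincount(y_te))  # [820 20] -> test set keeps the natural skew
```

By contrast, a conventional stratified split (e.g., scikit-learn's train_test_split with stratify=y) reproduces the skew in both partitions, which is the kind of disadvantage the abstract alludes to.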
Related papers
- Simplifying Neural Network Training Under Class Imbalance [77.39968702907817]
Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models.
The majority of research on training neural networks under class imbalance has focused on specialized loss functions, sampling techniques, or two-stage training procedures.
We demonstrate that simply tuning existing components of standard deep learning pipelines, such as the batch size, data augmentation, and label smoothing, can achieve state-of-the-art performance without any such specialized class imbalance methods.
arXiv Detail & Related papers (2023-12-05T05:52:44Z)
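Of the standard pipeline components this paper tunes, label smoothing is the easiest to illustrate: in a typical PyTorch pipeline it is a one-line change. The 0.1 value below is a common default, not a recommendation from the paper:

```python
import torch
import torch.nn as nn

# Label smoothing softens one-hot targets so the model is less
# over-confident; it is built into PyTorch's cross-entropy loss.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(8, 3)           # batch of 8 samples, 3 classes
targets = torch.randint(0, 3, (8,))  # integer class labels
loss = criterion(logits, targets)
print(loss.item())
```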
- A Survey of Methods for Handling Disk Data Imbalance [10.261915886145214]
This paper provides a comprehensive overview of research in the field of imbalanced data classification.
The Backblaze dataset, a widely used dataset related to hard disks, has a small amount of failure data and a large amount of healthy-drive data, exhibiting a serious class imbalance.
arXiv Detail & Related papers (2023-10-13T05:35:13Z)
- Addressing Class Variable Imbalance in Federated Semi-supervised Learning [10.542178602467885]
We propose Federated Semi-supervised Learning for Class Variable Imbalance (FCVI) to solve class variable imbalance.
FCVI mitigates the data imbalance caused by changes in the number of classes.
Our scheme is shown to be significantly better than baseline methods while maintaining client privacy.
arXiv Detail & Related papers (2023-03-21T12:50:17Z)
- Revisiting Long-tailed Image Classification: Survey and Benchmarks with New Evaluation Metrics [88.39382177059747]
A corpus of metrics is designed for measuring the accuracy, robustness, and bounds of algorithms for learning with long-tailed distributions.
Based on our benchmarks, we re-evaluate the performance of existing methods on CIFAR10 and CIFAR100 datasets.
arXiv Detail & Related papers (2023-02-03T02:40:54Z)
- An Embarrassingly Simple Baseline for Imbalanced Semi-Supervised Learning [103.65758569417702]
Semi-supervised learning (SSL) has shown great promise in leveraging unlabeled data to improve model performance.
We consider a more realistic and challenging setting called imbalanced SSL, where imbalanced class distributions occur in both labeled and unlabeled data.
We study a simple yet overlooked baseline -- SimiS -- which tackles data imbalance by simply supplementing labeled data with pseudo-labels.
arXiv Detail & Related papers (2022-11-20T21:18:41Z)
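The mechanism summarized above is to top up under-labeled classes with confident pseudo-labels. A minimal sketch of that idea, with a hypothetical helper and an illustrative confidence threshold (neither taken from the paper), might look like:

```python
import numpy as np

def supplement_with_pseudo_labels(y_labeled, pseudo_labels, confidences,
                                  threshold=0.95):
    """Sketch of pseudo-label supplementation for imbalanced SSL.

    Adds confident pseudo-labeled samples to the classes that have
    fewer real labels, shrinking the class imbalance. Thresholds and
    the selection rule are illustrative, not the paper's procedure.
    """
    counts = np.bincount(y_labeled)
    target = counts.max()  # level every class up to the largest one
    selected = []
    for cls, count in enumerate(counts):
        need = target - count
        # confident unlabeled samples pseudo-labeled as this class
        candidates = np.flatnonzero(
            (pseudo_labels == cls) & (confidences >= threshold))
        selected.extend(candidates[:need])  # take at most `need`
    return np.asarray(selected, dtype=int)

# Example: class 1 is under-labeled, so its confident pseudo-labels fill in.
y_labeled = np.array([0] * 50 + [1] * 5)
pseudo = np.random.randint(0, 2, size=200)
conf = np.random.uniform(0.8, 1.0, size=200)
idx = supplement_with_pseudo_labels(y_labeled, pseudo, conf)
print(len(idx), "pseudo-labeled samples added")
```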
- Phased Progressive Learning with Coupling-Regulation-Imbalance Loss for Imbalanced Classification [11.673344551762822]
Deep neural networks generally perform poorly with datasets that suffer from quantity imbalance and classification difficulty imbalance between different classes.
A phased progressive learning schedule is proposed to smoothly transfer the training emphasis from representation learning to upper-classifier training.
Our code will be open-sourced soon.
arXiv Detail & Related papers (2022-05-24T14:46:39Z)
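The paper's schedule and loss are not reproduced here; a generic sketch of "smoothly transferring training emphasis" is a coefficient that decays over training and blends a representation loss with a classifier loss (the quadratic decay is one assumed choice among many):

```python
def phase_coefficient(epoch, total_epochs):
    """Weight on representation learning; decays smoothly from 1 to 0.

    A quadratic schedule is one common choice in phased/progressive
    training; the paper's exact schedule may differ.
    """
    t = epoch / max(total_epochs - 1, 1)
    return 1.0 - t ** 2

total_epochs = 100
for epoch in (0, 50, 99):
    alpha = phase_coefficient(epoch, total_epochs)
    # loss = alpha * representation_loss + (1 - alpha) * classifier_loss
    print(f"epoch {epoch:3d}: representation weight = {alpha:.2f}")
```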
- Deep Reinforcement Learning for Multi-class Imbalanced Training [64.9100301614621]
We introduce an imbalanced classification framework, based on reinforcement learning, for training extremely imbalanced data sets.
We formulate a custom reward function and episode-training procedure, specifically with the added capability of handling multi-class imbalanced training.
Using real-world clinical case studies, we demonstrate that our proposed framework outperforms current state-of-the-art imbalanced learning methods.
arXiv Detail & Related papers (2022-05-24T13:39:59Z)
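The summary mentions a custom reward function for multi-class imbalanced training. A common construction in this line of work, and the assumption behind the sketch below (not necessarily the paper's exact design), scales reward magnitudes inversely with class frequency:

```python
import numpy as np

def make_reward_fn(y_train):
    """Sketch of a class-weighted reward for RL-style classification.

    Correct/incorrect predictions on rarer classes earn rewards of
    larger magnitude, so the agent cannot ignore minority classes.
    The paper's exact reward and episode design may differ.
    """
    counts = np.bincount(y_train).astype(float)
    weights = counts.min() / counts  # minority -> 1.0, majority -> < 1.0

    def reward(true_label, predicted_label):
        w = weights[true_label]
        return w if predicted_label == true_label else -w

    return reward

# Example with a 900:100 class split:
y = np.array([0] * 900 + [1] * 100)
r = make_reward_fn(y)
print(r(1, 1), r(1, 0))  # +1.0 / -1.0 on the minority class
print(r(0, 0), r(0, 1))  # ~ +0.11 / -0.11 on the majority class
```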
- ABC: Auxiliary Balanced Classifier for Class-imbalanced Semi-supervised Learning [6.866717993664787]
Existing semi-supervised learning (SSL) algorithms assume class-balanced datasets.
We propose a scalable class-imbalanced SSL algorithm that can effectively use unlabeled data.
The proposed algorithm achieves state-of-the-art performance in various class-imbalanced SSL experiments using four benchmark datasets.
arXiv Detail & Related papers (2021-10-20T04:07:48Z)
- PLM: Partial Label Masking for Imbalanced Multi-label Classification [59.68444804243782]
Neural networks trained on real-world datasets with long-tailed label distributions are biased towards frequent classes and perform poorly on infrequent classes.
We propose a method, Partial Label Masking (PLM), which utilizes the ratio of positive to negative labels for each class during training.
Our method achieves strong performance when compared to existing methods on both multi-label (MultiMNIST and MSCOCO) and single-label (imbalanced CIFAR-10 and CIFAR-100) image classification datasets.
arXiv Detail & Related papers (2021-05-22T18:07:56Z)
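Reading the PLM summary as "mask surplus labels so each class's positive-to-negative ratio moves toward a target", a rough illustration (the target ratio and masking rule are assumptions, not the paper's procedure) could be:

```python
import numpy as np

def partial_label_mask(Y, target_ratio=0.1, seed=0):
    """Sketch of partial label masking for multi-label training.

    For each class, randomly masks surplus positive (or negative)
    labels so the positive-to-negative ratio used in the loss moves
    toward a target; masked entries contribute no gradient.
    """
    rng = np.random.default_rng(seed)
    mask = np.ones_like(Y, dtype=bool)  # True = label participates in loss
    for c in range(Y.shape[1]):
        pos = np.flatnonzero(Y[:, c] == 1)
        neg = np.flatnonzero(Y[:, c] == 0)
        if len(pos) > target_ratio * len(neg):
            # over-represented class: mask surplus positive labels
            keep = int(target_ratio * len(neg))
            mask[rng.choice(pos, size=len(pos) - keep, replace=False), c] = False
        elif len(pos) > 0 and len(pos) < target_ratio * len(neg):
            # under-represented class: mask surplus negative labels
            keep = int(len(pos) / target_ratio)
            if keep < len(neg):
                mask[rng.choice(neg, size=len(neg) - keep, replace=False), c] = False
    return mask

# Example: five classes with positive rates from 50% down to 1%.
rng2 = np.random.default_rng(1)
Y = (rng2.random((1000, 5)) < np.array([0.5, 0.3, 0.05, 0.02, 0.01])).astype(int)
mask = partial_label_mask(Y, target_ratio=0.1)
print(mask.mean(axis=0))  # fraction of labels per class still in the loss
```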
- SetConv: A New Approach for Learning from Imbalanced Data [29.366843553056594]
We propose a set convolution operation and an episodic training strategy to extract a single representative for each class.
We prove that our proposed algorithm is permutation-invariant, i.e., insensitive to the order of its inputs.
arXiv Detail & Related papers (2021-04-03T22:33:30Z)
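SetConv's learned aggregation is not reproduced here; plain mean pooling stands in for it below, simply to show what "a single permutation-invariant representative per class" means:

```python
import numpy as np

def class_representative(X_class):
    """Permutation-invariant representative for one class's sample set.

    SetConv learns the aggregation; a plain mean over the set stands
    in for it here and is trivially invariant to input order.
    """
    return X_class.mean(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))  # 50 samples of one class, 8 features
rep1 = class_representative(X)
rep2 = class_representative(X[rng.permutation(50)])
print(np.allclose(rep1, rep2))  # True: order of inputs does not matter
```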
- Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to the state of the art, and an extended ensemble establishes a new state of the art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)
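As a toy version of the class-balanced-experts idea (the paper's experts are deep networks specialized on class subsets; logistic regressions fit on class-balanced resamples stand in for them here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

def train_balanced_experts(X, y, n_experts=5, seed=0):
    """Train simple experts, each on a class-balanced bootstrap sample."""
    experts = []
    classes = np.unique(y)
    n_per_class = min(np.bincount(y))
    for i in range(n_experts):
        idx = np.concatenate([
            resample(np.flatnonzero(y == c), n_samples=n_per_class,
                     random_state=seed + i)
            for c in classes])
        experts.append(LogisticRegression(max_iter=1000).fit(X[idx], y[idx]))
    return experts

def ensemble_predict(experts, X):
    # average the per-expert class probabilities, then take the argmax
    probs = np.mean([e.predict_proba(X) for e in experts], axis=0)
    return probs.argmax(axis=1)

# Example on a synthetic 95:5 imbalanced problem:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (950, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 950 + [1] * 50)
experts = train_balanced_experts(X, y)
print((ensemble_predict(experts, X) == y).mean())  # training accuracy
```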
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.