A Study imbalance handling by various data sampling methods in binary
classification
- URL: http://arxiv.org/abs/2105.10959v1
- Date: Sun, 23 May 2021 15:27:47 GMT
- Title: A Study imbalance handling by various data sampling methods in binary
classification
- Authors: Mohamed Hamama
- Abstract summary: This research report presents our learning curve and exposure to the Machine Learning life cycle.
We explore various techniques, from pre-processing to final optimization and model evaluation.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The purpose of this research report is to present our learning curve and exposure to the Machine Learning life cycle, using a Kaggle binary classification data set and exploring various techniques from pre-processing to final optimization and model evaluation. We also highlight the data imbalance issue and discuss methods of handling that imbalance at the data level through over-sampling and under-sampling, not only to reach a balanced class representation but also to improve overall performance. This work also identifies gaps for future work.
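The data-level handling described in the abstract can be illustrated with a short, self-contained sketch. This is not the authors' exact pipeline; it only assumes the widely used scikit-learn and imbalanced-learn libraries, a synthetic imbalanced data set, and standard over-/under-sampling (SMOTE and random under-sampling) compared via balanced accuracy.

```python
# Minimal sketch of data-level imbalance handling (not the paper's exact pipeline).
# Assumes scikit-learn and imbalanced-learn; the data set here is synthetic.
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Synthetic binary data set with a roughly 95:5 class ratio.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

samplers = {
    "none": None,
    "SMOTE (over-sampling)": SMOTE(random_state=42),
    "random under-sampling": RandomUnderSampler(random_state=42),
}

for name, sampler in samplers.items():
    if sampler is None:
        X_res, y_res = X_train, y_train
    else:
        # Resample only the training split; the test split keeps its natural imbalance.
        X_res, y_res = sampler.fit_resample(X_train, y_train)
    clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
    score = balanced_accuracy_score(y_test, clf.predict(X_test))
    print(f"{name:25s} train distribution={dict(Counter(y_res))} "
          f"balanced accuracy={score:.3f}")
```

Evaluating with balanced accuracy rather than plain accuracy keeps the comparison meaningful on a test split that remains imbalanced.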
Related papers
- Deep Learning Meets Oversampling: A Learning Framework to Handle Imbalanced Classification [0.0]
We propose a novel learning framework that can generate synthetic data instances in a data-driven manner.
The proposed framework formulates the oversampling process as a composition of discrete decision criteria.
Experiments on the imbalanced classification task demonstrate the superiority of our framework over state-of-the-art algorithms.
arXiv Detail & Related papers (2025-02-08T13:35:00Z)
- Statistical Undersampling with Mutual Information and Support Points [4.118796935183671]
Class imbalance and distributional differences in large datasets present significant challenges for classification tasks in machine learning.
This work introduces two novel undersampling approaches: mutual information-based stratified simple random sampling and support points optimization.
arXiv Detail & Related papers (2024-12-19T04:48:29Z)
- Combining Denoising Autoencoders with Contrastive Learning to fine-tune Transformer Models [0.0]
This work proposes a 3-phase technique to adjust a base model for a classification task.
We adapt the model's signal to the data distribution by performing further training with a Denoising Autoencoder (DAE).
In addition, we introduce a new data augmentation approach for Supervised Contrastive Learning to correct the unbalanced datasets.
arXiv Detail & Related papers (2024-05-23T11:08:35Z)
- Distilled Datamodel with Reverse Gradient Matching [74.75248610868685]
We introduce an efficient framework for assessing data impact, comprising offline training and online evaluation stages.
Our proposed method achieves comparable model behavior evaluation while significantly speeding up the process compared to the direct retraining method.
arXiv Detail & Related papers (2024-04-22T09:16:14Z)
- On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training [72.8087629914444]
We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes with a balance on the intra-/inter-class diversity.
arXiv Detail & Related papers (2023-05-20T16:23:50Z)
- Revisiting Long-tailed Image Classification: Survey and Benchmarks with New Evaluation Metrics [88.39382177059747]
A corpus of metrics is designed for measuring the accuracy, robustness, and bounds of algorithms for learning with long-tailed distribution.
Based on our benchmarks, we re-evaluate the performance of existing methods on CIFAR10 and CIFAR100 datasets.
arXiv Detail & Related papers (2023-02-03T02:40:54Z)
- Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification [74.62203971625173]
Imbalanced data pose challenges for deep learning based classification models.
One of the most widely-used approaches for tackling imbalanced data is re-weighting (a baseline version is sketched after this list).
We propose a novel re-weighting method based on optimal transport (OT) from a distributional point of view.
arXiv Detail & Related papers (2022-08-05T01:23:54Z)
- Study of sampling methods in sentiment analysis of imbalanced data [0.0]
This work investigates the application of sampling methods for sentiment analysis on two different datasets.
One dataset contains online user reviews from the cooking platform Epicurious and the other contains comments given to the Planned Parenthood organization.
arXiv Detail & Related papers (2021-06-12T03:16:18Z)
- Semi-supervised Long-tailed Recognition using Alternate Sampling [95.93760490301395]
The main challenges in long-tailed recognition come from the imbalanced data distribution and sample scarcity in the tail classes.
We propose a new recognition setting, namely semi-supervised long-tailed recognition.
We demonstrate significant accuracy improvements over other competitive methods on two datasets.
arXiv Detail & Related papers (2021-05-01T00:43:38Z)
- Handling Imbalanced Data: A Case Study for Binary Class Problems [0.0]
A major issue in solving classification problems is imbalanced data.
This paper focuses on synthetic oversampling techniques and manually computes synthetic data points to aid comprehension of the algorithms.
We analyze the application of these synthetic oversampling techniques on binary classification problems with different imbalance ratios and sample sizes.
arXiv Detail & Related papers (2020-10-09T02:04:14Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
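The re-weighting baseline mentioned in the "Learning to Re-weight Examples with Optimal Transport" entry above can be sketched as follows. This is not that paper's OT-based method; it only shows the standard inverse-class-frequency weighting exposed by scikit-learn's class_weight="balanced" option, on a hypothetical synthetic data set.

```python
# Standard inverse-class-frequency re-weighting (baseline, not the OT method above).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced binary data set (roughly 90:10).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# Each class gets weight n_samples / (n_classes * n_samples_in_class), so errors
# on the minority class cost proportionally more during training.
weights = compute_class_weight(class_weight="balanced", classes=np.unique(y), y=y)
print(dict(zip(np.unique(y), weights)))

# Equivalent shortcut: let the estimator compute the same weights internally.
clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X, y)
```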