Minority Class Oriented Active Learning for Imbalanced Datasets
- URL: http://arxiv.org/abs/2202.00390v1
- Date: Tue, 1 Feb 2022 13:13:41 GMT
- Title: Minority Class Oriented Active Learning for Imbalanced Datasets
- Authors: Umang Aggarwal, Adrian Popescu, and Céline Hudelot
- Abstract summary: We introduce a new active learning method which is designed for imbalanced datasets.
It favors samples likely to be in minority classes so as to reduce the imbalance of the labeled subset.
We also compare two training schemes for active learning.
- Score: 6.009262446889319
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Active learning aims to optimize the dataset annotation process when
resources are constrained. Most existing methods are designed for balanced
datasets. Their practical applicability is limited by the fact that a majority
of real-life datasets are actually imbalanced. Here, we introduce a new active
learning method which is designed for imbalanced datasets. It favors samples
likely to be in minority classes so as to reduce the imbalance of the labeled
subset and create a better representation for these classes. We also compare
two training schemes for active learning: (1) the scheme commonly deployed in
deep active learning, which fine-tunes the model at each iteration, and (2) a
scheme inspired by transfer learning, which exploits generic pre-trained models
and trains a shallow classifier at each iteration. Evaluation is run on three
imbalanced datasets. Results show that the proposed active learning method
outperforms competitive baselines. Equally interesting, they also indicate that
the transfer-learning training scheme outperforms model fine-tuning if features
are transferable from the generic dataset to the unlabeled one. This last
result is surprising and should encourage the community to explore the design
of deep active learning methods.
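To make the two ideas above concrete, the sketch below shows one hypothetical iteration of the transfer-learning training scheme: a generic pre-trained backbone stays frozen, a shallow classifier is retrained on the labeled features, and acquisition favors unlabeled samples whose predicted classes are under-represented in the labeled subset. The rarity-weighted score, the function names, and the use of scikit-learn are assumptions made for illustration, not the authors' exact method.

# Minimal sketch, assuming precomputed features from a frozen pre-trained backbone.
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_batch(clf, unlabeled_feats, labeled_y, n_classes, budget):
    """Pick `budget` unlabeled samples, biased toward minority classes."""
    probs = clf.predict_proba(unlabeled_feats)                 # (N, |seen classes|)
    counts = np.bincount(labeled_y, minlength=n_classes).astype(float)
    rarity = 1.0 / (counts + 1.0)                              # larger for rarer classes
    scores = probs @ rarity[clf.classes_]                      # mass placed on rare classes
    return np.argsort(-scores)[:budget]                        # indices to send for labeling

def al_iteration(labeled_feats, labeled_y, unlabeled_feats, n_classes, budget):
    # Transfer-learning scheme: only a shallow classifier is retrained each iteration,
    # the pre-trained feature extractor is never updated.
    clf = LogisticRegression(max_iter=1000).fit(labeled_feats, labeled_y)
    return clf, select_batch(clf, unlabeled_feats, labeled_y, n_classes, budget)

In practice the selected indices would be annotated, appended to the labeled pool, and the loop repeated until the budget is exhausted; under the fine-tuning scheme, the shallow-classifier step would instead be replaced by retraining the full network at each iteration.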
Related papers
- Simplifying Neural Network Training Under Class Imbalance [77.39968702907817]
Real-world datasets are often highly class-imbalanced, which can adversely impact the performance of deep learning models.
The majority of research on training neural networks under class imbalance has focused on specialized loss functions, sampling techniques, or two-stage training procedures.
We demonstrate that simply tuning existing components of standard deep learning pipelines, such as the batch size, data augmentation, and label smoothing, can achieve state-of-the-art performance without any such specialized class imbalance methods.
arXiv Detail & Related papers (2023-12-05T05:52:44Z) - One-bit Supervision for Image Classification: Problem, Solution, and
Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
On multiple benchmarks, the learning efficiency of the proposed approach surpasses that of full-bit, semi-supervised supervision.
arXiv Detail & Related papers (2023-11-26T07:39:00Z) - Active Learning with Combinatorial Coverage [0.0]
Active learning is a practical field of machine learning that automates the process of selecting which data to label.
Current methods are effective in reducing the burden of data labeling but are heavily model-reliant.
This has made sampled data difficult to transfer to new models and has led to issues with sampling bias.
We propose active learning methods utilizing coverage to overcome these issues.
arXiv Detail & Related papers (2023-02-28T13:43:23Z) - Iterative Loop Learning Combining Self-Training and Active Learning for
Domain Adaptive Semantic Segmentation [1.827510863075184]
Self-training and active learning have been proposed to alleviate this problem.
This paper proposes an iterative loop learning method combining Self-Training and Active Learning.
arXiv Detail & Related papers (2023-01-31T01:31:43Z) - Imbalanced Classification via Explicit Gradient Learning From Augmented
Data [0.0]
We propose a novel deep meta-learning technique to augment a given imbalanced dataset with new minority instances.
The advantage of the proposed method is demonstrated on synthetic and real-world datasets with various imbalance ratios.
arXiv Detail & Related papers (2022-02-21T22:16:50Z) - CMW-Net: Learning a Class-Aware Sample Weighting Mapping for Robust Deep
Learning [55.733193075728096]
Modern deep neural networks can easily overfit to biased training data containing corrupted labels or class imbalance.
Sample re-weighting methods are popularly used to alleviate this data bias issue.
We propose a meta-model capable of adaptively learning an explicit weighting scheme directly from data.
arXiv Detail & Related papers (2022-02-11T13:49:51Z) - Prototypical Classifier for Robust Class-Imbalanced Learning [64.96088324684683]
We propose Prototypical, which does not require fitting additional parameters given the embedding network.
Prototypical produces balanced and comparable predictions for all classes even though the training set is class-imbalanced.
We test our method on the CIFAR-10LT, CIFAR-100LT and Webvision datasets, observing that Prototypical obtains substantial improvements compared with the state of the art.
arXiv Detail & Related papers (2021-10-22T01:55:01Z) - Class-Balanced Active Learning for Image Classification [29.5211685759702]
We propose a general optimization framework that explicitly takes class-balancing into account.
Results on three datasets showed that the method is general (it can be combined with most existing active learning algorithms) and can be effectively applied to boost the performance of both informative and representative-based active learning methods.
arXiv Detail & Related papers (2021-10-09T11:30:26Z) - Class Balancing GAN with a Classifier in the Loop [58.29090045399214]
We introduce a novel theoretically motivated Class Balancing regularizer for training GANs.
Our regularizer makes use of the knowledge from a pre-trained classifier to ensure balanced learning of all the classes in the dataset.
We demonstrate the utility of our regularizer in learning representations for long-tailed distributions, achieving better performance than existing approaches on multiple datasets.
arXiv Detail & Related papers (2021-06-17T11:41:30Z) - Self-Damaging Contrastive Learning [92.34124578823977]
Unlabeled data in reality is commonly imbalanced and shows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning to automatically balance the representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z) - Sequential Targeting: an incremental learning approach for data
imbalance in text classification [7.455546102930911]
Methods to handle imbalanced datasets are crucial for alleviating distributional skews.
We propose a novel training method, Sequential Targeting (ST), which is independent of the effectiveness of the representation method.
We demonstrate the effectiveness of our method through experiments on simulated benchmark datasets (IMDB) and data collected from NAVER.
arXiv Detail & Related papers (2020-11-20T04:54:00Z)