Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation
- URL: http://arxiv.org/abs/2312.09196v4
- Date: Thu, 12 Jun 2025 07:17:19 GMT
- Title: Improved Algorithm for Deep Active Learning under Imbalance via Optimal Separation
- Authors: Shyam Nuggehalli, Jifan Zhang, Lalit Jain, Robert Nowak
- Abstract summary: Class imbalance severely impacts machine learning performance on minority classes in real-world applications. We introduce DIRECT, an algorithm that identifies class separation boundaries and selects the most uncertain nearby examples for annotation. Our work presents the first comprehensive study of active learning under both class imbalance and label noise.
- Score: 15.571923343398657
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Class imbalance severely impacts machine learning performance on minority classes in real-world applications. While various solutions exist, active learning offers a fundamental fix by strategically collecting balanced, informative labeled examples from abundant unlabeled data. We introduce DIRECT, an algorithm that identifies class separation boundaries and selects the most uncertain nearby examples for annotation. By reducing the problem to one-dimensional active learning, DIRECT leverages established theory to handle batch labeling and label noise -- another common challenge in data annotation that particularly affects active learning methods. Our work presents the first comprehensive study of active learning under both class imbalance and label noise. Extensive experiments on imbalanced datasets show DIRECT reduces annotation costs by over 60% compared to state-of-the-art active learning methods and over 80% versus random sampling, while maintaining robustness to label noise.
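To make the one-dimensional reduction concrete, here is a minimal sketch of a DIRECT-style selection step for a single target class: rank the unlabeled pool by a one-vs-rest model score, coarsely locate the rank at which oracle labels flip (an estimate of the class separation boundary), then spend the remaining budget on the examples nearest that boundary. The function names, the probe-based boundary search, and the parameter defaults are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def direct_style_query(scores, oracle_label, budget, n_probe=10):
    """Simplified, illustrative DIRECT-style query step for one target class.

    scores       : 1-D array of model scores (e.g. one-vs-rest margins) for the unlabeled pool
    oracle_label : callable returning a (possibly noisy) 0/1 label for a pool index
    budget       : total number of annotations to spend in this round
    n_probe      : labels spent on coarsely locating the separation boundary
    """
    order = np.argsort(scores)  # reduce to one dimension: rank examples by score
    # Probe a few evenly spaced ranks to roughly locate where oracle labels
    # flip from 0 to 1, i.e. an estimate of the class separation boundary.
    probes = np.linspace(0, len(order) - 1, n_probe, dtype=int)
    probe_labels = [oracle_label(int(order[p])) for p in probes]
    flips = [int(p) for p, prev, cur in zip(probes[1:], probe_labels, probe_labels[1:]) if prev != cur]
    boundary = flips[0] if flips else len(order) // 2
    # Spend the remaining budget on the most uncertain examples around the boundary.
    half = max(0, (budget - n_probe) // 2)
    lo, hi = max(0, boundary - half), min(len(order), boundary + half)
    queried = order[lo:hi]
    return queried, {int(i): oracle_label(int(i)) for i in queried}
```

Because the query region is an interval of ranks rather than a single cutoff, a few incorrectly annotated examples shift the boundary estimate only slightly, which loosely mirrors the noise-robustness argument in the abstract.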
Related papers
- Correcting Noisy Multilabel Predictions: Modeling Label Noise through Latent Space Shifts [4.795811957412855]
Noise in data appears to be inevitable in most real-world machine learning applications.
We investigate the less explored area of learning with noisy labels for multilabel classification.
Our model posits that label noise arises from a shift in the latent variable, providing a more robust and beneficial approach to learning under noise.
arXiv Detail & Related papers (2025-02-20T05:41:52Z) - Active Label Refinement for Robust Training of Imbalanced Medical Image Classification Tasks in the Presence of High Label Noise [10.232537737211098]
We propose a two-phase approach that combines Learning with Noisy Labels (LNL) and active learning.
We demonstrate that our proposed technique handles class imbalance better than its predecessors by not misidentifying clean minority-class samples as noisy.
arXiv Detail & Related papers (2024-07-08T14:16:05Z) - Multiclass Learning from Noisy Labels for Non-decomposable Performance Measures [15.358504449550013]
We design algorithms to learn from noisy labels for two broad classes of non-decomposable performance measures.
In both cases, we develop noise-corrected versions of the algorithms under the widely studied class-conditional noise models.
Our experiments demonstrate the effectiveness of our algorithms in handling label noise.
arXiv Detail & Related papers (2024-02-01T23:03:53Z) - BAL: Balancing Diversity and Novelty for Active Learning [53.289700543331925]
We introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data.
Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%.
arXiv Detail & Related papers (2023-12-26T08:14:46Z) - Combating Label Noise With A General Surrogate Model For Sample Selection [77.45468386115306]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically. We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z) - Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations [0.17188280334580192]
Supervised classification algorithms are used to solve a growing number of real-life problems around the globe.
Unfortunately, acquiring good-quality annotations for many tasks is infeasible or too expensive to be done in practice.
We propose two novel annotation unification algorithms that utilize unlabeled parts of the sample space.
arXiv Detail & Related papers (2023-07-25T19:40:41Z) - Unleashing the Potential of Regularization Strategies in Learning with Noisy Labels [65.92994348757743]
We demonstrate that a simple baseline using cross-entropy loss, combined with widely used regularization strategies, can outperform state-of-the-art methods.
Our findings suggest that employing a combination of regularization strategies can be more effective than intricate algorithms in tackling the challenges of learning with noisy labels.
arXiv Detail & Related papers (2023-07-11T05:58:20Z) - Co-Learning Meets Stitch-Up for Noisy Multi-label Visual Recognition [70.00984078351927]
This paper focuses on reducing noise by exploiting inherent properties of multi-label classification and long-tailed learning in noisy settings.
We propose a Stitch-Up augmentation to synthesize a cleaner sample, which directly reduces multi-label noise.
A Heterogeneous Co-Learning framework is further designed to leverage the inconsistency between long-tailed and balanced distributions.
arXiv Detail & Related papers (2023-07-03T09:20:28Z) - Learning with Noisy Labels through Learnable Weighting and Centroid Similarity [5.187216033152917]
Noisy labels are prevalent in domains such as medical diagnosis and autonomous driving.
We introduce a novel method for training machine learning models in the presence of noisy labels.
Our results show that our method consistently outperforms the existing state-of-the-art techniques.
arXiv Detail & Related papers (2023-03-16T16:43:24Z) - PercentMatch: Percentile-based Dynamic Thresholding for Multi-Label Semi-Supervised Classification [64.39761523935613]
We propose a percentile-based threshold adjusting scheme that dynamically alters the score thresholds of positive and negative pseudo-labels for each class during training (see the sketch after this list).
We achieve strong performance on Pascal VOC2007 and MS-COCO datasets when compared to recent SSL methods.
arXiv Detail & Related papers (2022-08-30T01:27:48Z) - L2B: Learning to Bootstrap Robust Models for Combating Label Noise [52.02335367411447]
This paper introduces a simple and effective method, named Learning to Bootstrap (L2B).
It enables models to bootstrap themselves using their own predictions without being adversely affected by erroneous pseudo-labels.
It achieves this by dynamically adjusting the importance weight between real observed and generated labels, as well as between different samples through meta-learning.
arXiv Detail & Related papers (2022-02-09T05:57:08Z) - Prototypical Classifier for Robust Class-Imbalanced Learning [64.96088324684683]
We propose Prototypical, which does not require fitting additional parameters given the embedding network.
Prototypical produces balanced and comparable predictions for all classes even though the training set is class-imbalanced.
We test our method on CIFAR-10LT, CIFAR-100LT and Webvision datasets, observing that Prototypical obtains substantial improvements compared with the state of the art.
arXiv Detail & Related papers (2021-10-22T01:55:01Z) - Robust Long-Tailed Learning under Label Noise [50.00837134041317]
This work investigates the label noise problem under long-tailed label distribution.
We propose a robust framework that performs noise detection for long-tailed learning.
Our framework can naturally leverage semi-supervised learning algorithms to further improve the generalisation.
arXiv Detail & Related papers (2021-08-26T03:45:00Z) - Learning From Long-Tailed Data With Noisy Labels [0.0]
Class imbalance and noisy labels are the norm in many large-scale classification datasets.
We present a simple two-stage approach based on recent advances in self-supervised learning.
We find that self-supervised learning approaches are effectively able to cope with severe class imbalance.
arXiv Detail & Related papers (2021-08-25T07:45:40Z) - Open-set Label Noise Can Improve Robustness Against Inherent Label Noise [27.885927200376386]
We show that open-set noisy labels can be non-toxic and even benefit the robustness against inherent noisy labels.
We propose a simple yet effective regularization by introducing Open-set samples with Dynamic Noisy Labels (ODNL) into training.
arXiv Detail & Related papers (2021-06-21T07:15:50Z) - A Second-Order Approach to Learning with Instance-Dependent Label Noise [58.555527517928596]
The presence of label noise often misleads the training of deep neural networks.
We show that the errors in human-annotated labels are more likely to be dependent on the difficulty levels of tasks.
arXiv Detail & Related papers (2020-12-22T06:36:58Z) - Efficient PAC Learning from the Crowd with Pairwise Comparison [7.594050968868919]
We study the problem of PAC learning threshold functions from the crowd, where the annotators can provide (noisy) labels or pairwise comparison tags.
We design a label-efficient algorithm that interleaves learning and annotation, incurring only a constant overhead.
arXiv Detail & Related papers (2020-11-02T16:37:55Z) - Active Learning under Label Shift [80.65643075952639]
We introduce a "medial distribution" to incorporate a tradeoff between importance and class-balanced sampling.
We prove sample complexity and generalization guarantees for Mediated Active Learning under Label Shift (MALLS).
We empirically demonstrate MALLS scales to high-dimensional datasets and can reduce the sample complexity of active learning by 60% in deep active learning tasks.
arXiv Detail & Related papers (2020-07-16T17:30:02Z) - A Graph-Based Approach for Active Learning in Regression [37.42533189350655]
Active learning aims to reduce labeling efforts by selectively asking humans to annotate the most important data points from an unlabeled pool.
Most existing active learning methods for regression use the regression function learned at each active learning iteration to select the next informative point to query.
We propose a feature-focused approach that formulates both sequential and batch-mode active regression as a novel bipartite graph optimization problem.
arXiv Detail & Related papers (2020-01-30T00:59:43Z)
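For the percentile-based dynamic thresholding idea referenced above (PercentMatch), a minimal sketch follows: per-class positive and negative pseudo-label cutoffs are re-computed from fixed percentiles of the current score distribution, so the thresholds move as the model trains. The percentile values, array shapes, and function name are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np

def percentile_thresholds(class_scores, pos_pct=95.0, neg_pct=5.0):
    """Illustrative percentile-based dynamic thresholding for multi-label SSL.

    class_scores    : (n_unlabeled, n_classes) array of sigmoid scores from the current model
    pos_pct/neg_pct : percentiles used as per-class cutoffs (assumed defaults)
    Re-computed each call (e.g. once per epoch), the cutoffs track the evolving scores.
    """
    pos_thr = np.percentile(class_scores, pos_pct, axis=0)  # per-class positive cutoff
    neg_thr = np.percentile(class_scores, neg_pct, axis=0)  # per-class negative cutoff
    pos_mask = class_scores >= pos_thr   # pseudo-positive assignments
    neg_mask = class_scores <= neg_thr   # pseudo-negative assignments
    return pos_thr, neg_thr, pos_mask, neg_mask
```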