Reliable Active Learning from Unreliable Labels via Neural Collapse Geometry
- URL: http://arxiv.org/abs/2510.09740v1
- Date: Fri, 10 Oct 2025 17:50:31 GMT
- Title: Reliable Active Learning from Unreliable Labels via Neural Collapse Geometry
- Authors: Atharv Goel, Sharat Agarwal, Saket Anand, Chetan Arora
- Abstract summary: Active Learning (AL) promises to reduce annotation cost by prioritizing informative samples, yet its reliability is undermined when labels are noisy or when the data distribution shifts. We propose Reliable Active Learning via Neural Collapse Geometry (NCAL-R), a framework that leverages the emergent geometric regularities of deep networks to counteract unreliable supervision.
- Score: 5.1511135538176
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Active Learning (AL) promises to reduce annotation cost by prioritizing informative samples, yet its reliability is undermined when labels are noisy or when the data distribution shifts. In practice, annotators make mistakes, rare categories are ambiguous, and conventional AL heuristics (uncertainty, diversity) often amplify such errors by repeatedly selecting mislabeled or redundant samples. We propose Reliable Active Learning via Neural Collapse Geometry (NCAL-R), a framework that leverages the emergent geometric regularities of deep networks to counteract unreliable supervision. Our method introduces two complementary signals: (i) a Class-Mean Alignment Perturbation score, which quantifies how candidate samples structurally stabilize or distort inter-class geometry, and (ii) a Feature Fluctuation score, which captures temporal instability of representations across training checkpoints. By combining these signals, NCAL-R prioritizes samples that both preserve class separation and highlight ambiguous regions, mitigating the effect of noisy or redundant labels. Experiments on ImageNet-100 and CIFAR100 show that NCAL-R consistently outperforms standard AL baselines, achieving higher accuracy with fewer labels, improved robustness under synthetic label noise, and stronger generalization to out-of-distribution data. These results suggest that incorporating geometric reliability criteria into acquisition decisions can make Active Learning less brittle to annotation errors and distribution shifts, a key step toward trustworthy deployment in real-world labeling pipelines. Our code is available at https://github.com/Vision-IIITD/NCAL.
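The two acquisition signals named in the abstract can be sketched roughly as follows. This is a minimal illustration assuming plausible definitions, not the paper's actual implementation (see the linked repository for that): the perturbation score is approximated here as the total change in pairwise cosine similarity between class means when a candidate is added, and the fluctuation score as the mean embedding drift between consecutive checkpoints. All function names are hypothetical.

```python
import numpy as np

def class_mean_alignment_perturbation(features, labels, candidate, cand_label):
    """Illustrative sketch: how much does adding `candidate` to its class
    perturb the pairwise inter-class mean geometry?"""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Recompute the candidate class's mean with the candidate included.
    idx = int(np.where(classes == cand_label)[0][0])
    cls_feats = features[labels == cand_label]
    new_means = means.copy()
    new_means[idx] = (cls_feats.sum(axis=0) + candidate) / (len(cls_feats) + 1)

    def pairwise_cos(m):
        m = m / np.linalg.norm(m, axis=1, keepdims=True)
        return m @ m.T

    # Total shift in pairwise cosine similarities between class means.
    return float(np.abs(pairwise_cos(new_means) - pairwise_cos(means)).sum())

def feature_fluctuation(checkpoint_feats):
    """Illustrative sketch: temporal instability of one sample's embedding
    across training checkpoints, as mean consecutive-checkpoint distance."""
    diffs = np.diff(np.stack(checkpoint_feats), axis=0)
    return float(np.linalg.norm(diffs, axis=1).mean())
```

Under these assumed definitions, a candidate that both distorts inter-class geometry little and shows high checkpoint-to-checkpoint drift would be the kind of sample the abstract describes as "preserving class separation while highlighting ambiguous regions".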
Related papers
- Sharpness-aware Dynamic Anchor Selection for Generalized Category Discovery [61.694524826522205]
Given some labeled data of known classes, GCD aims to cluster unlabeled data that contain both known and unknown classes. Large pre-trained models have a preference for some specific visual patterns, resulting in spurious correlations being encoded for unlabeled data. We propose a novel method with two modules: Loss Sharpness Penalty (LSP) and Dynamic Anchor Selection (DAS).
arXiv Detail & Related papers (2025-12-15T02:24:06Z) - Volatility in Certainty (VC): A Metric for Detecting Adversarial Perturbations During Inference in Neural Network Classifiers [0.5793804025420254]
Adversarial robustness remains a critical challenge in deploying neural network classifiers. This paper investigates Volatility in Certainty (VC), a label-free metric that quantifies irregularities in model confidence.
arXiv Detail & Related papers (2025-11-14T19:51:04Z) - Semi-Supervised Regression with Heteroscedastic Pseudo-Labels [50.54050677867914]
We propose an uncertainty-aware pseudo-labeling framework that dynamically adjusts pseudo-label influence from a bi-level optimization perspective. We provide theoretical insights and extensive experiments to validate our approach across various benchmark SSR datasets.
arXiv Detail & Related papers (2025-10-17T03:06:23Z) - Building Blocks for Robust and Effective Semi-Supervised Real-World Object Detection [1.188383832081829]
Semi-supervised object detection (SSOD) based on pseudo-labeling significantly reduces dependence on large labeled datasets. However, real-world applications of SSOD often face critical challenges, including class imbalance, label noise, and labeling errors. We present an in-depth analysis of SSOD under real-world conditions, uncovering causes of suboptimal pseudo-labeling and key trade-offs between label quality and quantity.
arXiv Detail & Related papers (2025-03-24T17:15:24Z) - Mitigating Instance-Dependent Label Noise: Integrating Self-Supervised Pretraining with Pseudo-Label Refinement [3.272177633069322]
Real-world datasets often contain noisy labels due to human error, ambiguity, or resource constraints during the annotation process. We propose a novel framework that combines self-supervised learning using SimCLR with iterative pseudo-label refinement. Our approach significantly outperforms several state-of-the-art methods, particularly under high noise conditions.
arXiv Detail & Related papers (2024-12-06T09:56:49Z) - Typicalness-Aware Learning for Failure Detection [26.23185979968123]
Deep neural networks (DNNs) often suffer from the overconfidence issue, where incorrect predictions are made with high confidence scores.
We propose a novel approach called Typicalness-Aware Learning (TAL) to address this issue and improve failure detection performance.
arXiv Detail & Related papers (2024-11-04T11:09:47Z) - Trusted Multi-view Learning under Noisy Supervision [20.668620759102115]
We propose TMNR, a method that develops a reliable multi-view learning model under the guidance of noisy labels. TMNR employs evidential deep neural networks to construct view-specific opinions that capture both beliefs and uncertainty. TMNR2 identifies potentially mislabeled samples through evidence-label consistency and generates pseudo-labels from neighboring information.
arXiv Detail & Related papers (2024-04-18T06:47:30Z) - Decoupled Prototype Learning for Reliable Test-Time Adaptation [50.779896759106784]
Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference.
One popular approach involves fine-tuning model with cross-entropy loss according to estimated pseudo-labels.
This study reveals that minimizing the classification error of each sample causes the cross-entropy loss's vulnerability to label noise.
We propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation.
arXiv Detail & Related papers (2024-01-15T03:33:39Z) - ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance [53.73316938815873]
We propose a method called ERASE (Error-Resilient representation learning on graphs for lAbel noiSe tolerancE) to learn representations with error tolerance.
ERASE combines prototype pseudo-labels with propagated denoised labels and updates representations with error resilience.
Our method can outperform multiple baselines with clear margins in broad noise levels and enjoy great scalability.
arXiv Detail & Related papers (2023-12-13T17:59:07Z) - All Points Matter: Entropy-Regularized Distribution Alignment for
Weakly-supervised 3D Segmentation [67.30502812804271]
Pseudo-labels are widely employed in weakly supervised 3D segmentation tasks where only sparse ground-truth labels are available for learning.
We propose a novel learning strategy to regularize the generated pseudo-labels and effectively narrow the gaps between pseudo-labels and model predictions.
arXiv Detail & Related papers (2023-05-25T08:19:31Z) - S3: Supervised Self-supervised Learning under Label Noise [53.02249460567745]
In this paper we address the problem of classification in the presence of label noise.
In the heart of our method is a sample selection mechanism that relies on the consistency between the annotated label of a sample and the distribution of the labels in its neighborhood in the feature space.
Our method significantly surpasses previous methods on both CIFAR10 and CIFAR100 with artificial noise and real-world noisy datasets such as WebVision and ANIMAL-10N.
arXiv Detail & Related papers (2021-11-22T15:49:20Z)
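The S3 entry above describes selecting samples by checking whether a sample's annotated label agrees with the labels of its feature-space neighborhood. A minimal sketch of that idea, assuming a plain k-nearest-neighbour construction (the function name and the choice of Euclidean distance are illustrative assumptions, not S3's exact procedure):

```python
import numpy as np

def label_consistency(features, labels, k=5):
    """Sketch: fraction of each sample's k nearest neighbours (in feature
    space) that share its annotated label; low values flag likely noise."""
    # Pairwise Euclidean distances between all samples.
    d = np.linalg.norm(features[:, None] - features[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude each sample itself
    knn = np.argsort(d, axis=1)[:, :k]   # indices of k nearest neighbours
    # Per-sample agreement between annotated label and neighbour labels.
    return (labels[knn] == labels[:, None]).mean(axis=1)
```

Samples whose consistency falls below a threshold would be treated as potentially mislabeled; the threshold and any downstream handling are left to the specific method.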
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.