Human-Corrected Labels Learning: Enhancing Labels Quality via Human Correction of VLMs Discrepancies
- URL: http://arxiv.org/abs/2511.09063v2
- Date: Fri, 14 Nov 2025 07:52:29 GMT
- Title: Human-Corrected Labels Learning: Enhancing Labels Quality via Human Correction of VLMs Discrepancies
- Authors: Zhongnian Li, Lan Chen, Yixin Xu, Shi Xu, Xinzheng Xu
- Abstract summary: We propose Human-Corrected Labels (HCLs), a novel setting that enables efficient human correction of VLM-generated noisy labels. HCL deploys human correction only for instances with VLM discrepancies, achieving both higher-quality annotations and reduced labor costs. Our approach achieves superior classification performance and is robust to label noise, validating the effectiveness of HCL in practical weak supervision scenarios.
- Score: 6.58446551781724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-Language Models (VLMs), with their powerful content generation capabilities, have been successfully applied to data annotation processes. However, VLM-generated labels exhibit dual limitations: low quality (i.e., label noise) and the absence of error correction mechanisms. To enhance label quality, we propose Human-Corrected Labels (HCLs), a novel setting that enables efficient human correction of VLM-generated noisy labels. As shown in Figure 1(b), HCL strategically deploys human correction only for instances with VLM discrepancies, achieving both higher-quality annotations and reduced labor costs. Specifically, we theoretically derive a risk-consistent estimator that incorporates both human-corrected labels and VLM predictions to train classifiers. In addition, we propose a conditional probability method to estimate the label distribution using a combination of VLM outputs and model predictions. Extensive experiments demonstrate that our approach achieves superior classification performance and is robust to label noise, validating the effectiveness of HCL in practical weak supervision scenarios. Code: https://github.com/Lilianach24/HCL.git
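A minimal sketch of the discrepancy-triggered correction loop the abstract describes, assuming repeated VLM queries per instance; `ask_human` is a hypothetical annotation oracle, not the paper's API:

```python
from collections import Counter

def route_for_correction(vlm_labels, ask_human):
    """Keep the VLM label when repeated VLM queries agree; otherwise
    route the instance to a human annotator, as in the HCL setting.

    vlm_labels: list of label lists, one list of VLM answers per instance.
    ask_human:  callable(index) -> corrected label (hypothetical oracle).
    """
    final_labels, human_queries = [], 0
    for i, answers in enumerate(vlm_labels):
        if len(Counter(answers)) == 1:       # all VLM answers agree
            final_labels.append(answers[0])
        else:                                # discrepancy -> human correction
            final_labels.append(ask_human(i))
            human_queries += 1
    return final_labels, human_queries

# Toy usage: three instances, two VLM queries each; only the third disagrees.
labels, n = route_for_correction(
    [["cat", "cat"], ["dog", "dog"], ["cat", "dog"]],
    ask_human=lambda i: "dog",
)
print(labels, n)   # ['cat', 'dog', 'dog'] 1
```

Only the instance with disagreeing VLM answers costs human labor; the risk-consistent estimator then trains on the mixed label sources.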
Related papers
- Calibratable Disambiguation Loss for Multi-Instance Partial-Label Learning [53.9713678229744]
Multi-instance partial-label learning (MIPL) is a weakly supervised framework that addresses the challenges of inexact supervision in both instance and label spaces. Existing MIPL approaches often suffer from poor calibration, undermining reliability. We propose a plug-and-play calibratable disambiguation loss (CDL) that simultaneously improves classification accuracy and calibration performance.
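The summary does not give CDL's form; as a hedged illustration, here is a generic disambiguation loss over candidate label sets, a standard partial-label-learning ingredient (the function and its normalization are a sketch, not the paper's):

```python
import torch
import torch.nn.functional as F

def disambiguation_loss(logits, candidate_mask):
    """Generic partial-label disambiguation loss (not CDL itself):
    softmax mass is renormalized over each instance's candidate set,
    so training sharpens the model's own preference inside that set.

    logits:         (batch, num_classes)
    candidate_mask: (batch, num_classes) binary, 1 = candidate label
    """
    probs = F.softmax(logits, dim=1) * candidate_mask
    probs = probs / probs.sum(dim=1, keepdim=True).clamp_min(1e-12)
    # Cross-entropy against the renormalized (detached) candidate posterior.
    return -(probs.detach() * F.log_softmax(logits, dim=1)).sum(1).mean()

logits = torch.randn(4, 5, requires_grad=True)
mask = torch.tensor([[1, 1, 0, 0, 0], [0, 1, 1, 0, 0],
                     [1, 0, 0, 1, 1], [0, 0, 0, 1, 1]], dtype=torch.float)
disambiguation_loss(logits, mask).backward()
```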
arXiv Detail & Related papers (2025-12-19T16:58:31Z)
- Multi-Agent VLMs Guided Self-Training with PNU Loss for Low-Resource Offensive Content Detection [32.68131638705225]
We propose a self-training framework that leverages abundant unlabeled data through collaborative pseudo-labeling. Our method iteratively assigns pseudo-labels to unlabeled instances with the support of Multi-Agent Vision-Language Models. Experiments on benchmark datasets demonstrate that our framework substantially outperforms baselines under limited supervision.
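A hedged sketch of one collaborative pseudo-labeling round, with each agent modeled as a hypothetical callable (the agreement rule is illustrative; the paper's PNU loss and agent design are not reproduced here):

```python
from collections import Counter

def pseudo_label_round(unlabeled, agents, min_agreement=2):
    """One self-training round: each (hypothetical) VLM agent votes a
    label; instances where at least `min_agreement` agents concur receive
    a pseudo-label, the rest remain unlabeled for a later round."""
    labeled, remaining = [], []
    for x in unlabeled:
        votes = Counter(agent(x) for agent in agents)
        label, count = votes.most_common(1)[0]
        if count >= min_agreement:
            labeled.append((x, label))
        else:
            remaining.append(x)
    return labeled, remaining

# Toy agents that "classify" a string by a keyword.
agents = [lambda x: "offensive" if "slur" in x else "benign"] * 3
batch, left = pseudo_label_round(["a slur here", "hello"], agents)
```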
arXiv Detail & Related papers (2025-11-14T08:03:35Z)
- EVADE: LLM-Based Explanation Generation and Validation for Error Detection in NLI [36.91800117379075]
EVADE is a framework for generating and validating explanations to detect errors using large language models. Human label variation (HLV) arises when multiple labels are valid for the same instance, making it difficult to separate annotation errors from plausible variation.
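A rough sketch of an explain-then-validate pass, with `generate` and `validate` standing in for LLM calls (both hypothetical; the real framework's prompts and scoring are in the paper):

```python
def detect_errors(examples, generate, validate, threshold=0.5):
    """Explain-then-validate sketch: `generate` produces an explanation
    for a label, `validate` scores whether the explanation actually
    supports it. Low scores flag likely annotation errors rather than
    plausible human label variation."""
    flagged = []
    for premise, hypothesis, label in examples:
        explanation = generate(premise, hypothesis, label)
        if validate(premise, hypothesis, label, explanation) < threshold:
            flagged.append((premise, hypothesis, label, explanation))
    return flagged
```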
arXiv Detail & Related papers (2025-11-12T03:49:05Z)
- Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin [56.37346003683629]
Adapting vision-language models (VLMs) to downstream tasks with pseudolabels has gained increasing attention. A major obstacle is that the pseudolabels generated by VLMs tend to be imbalanced, leading to inferior performance. We propose a novel framework incorporating concept alignment and confusion-aware margin mechanisms.
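The exact margin mechanism is not in the summary; one common way to realize a calibrated margin against imbalanced pseudolabels is class-prior logit adjustment, sketched here as an assumption rather than the paper's formula:

```python
import torch

def margin_adjusted_logits(logits, class_counts, tau=1.0):
    """Class-prior logit adjustment: frequent pseudo-label classes get
    their logits pulled down, shifting the decision margin toward rare
    classes.

    logits:       (batch, num_classes)
    class_counts: (num_classes,) pseudo-label counts so far
    """
    prior = class_counts.float().clamp_min(1)
    prior = prior / prior.sum()
    return logits - tau * torch.log(prior)

logits = torch.randn(2, 4)
adjusted = margin_adjusted_logits(logits, torch.tensor([900, 50, 30, 20]))
```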
arXiv Detail & Related papers (2025-05-04T10:24:34Z)
- Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance [28.524573212179124]
Large language models (LLMs) offer new opportunities to enhance the annotation process. We compare expert, crowd-sourced, and LLM-based annotations in terms of agreement, label quality, and efficiency. Our findings reveal a substantial number of label errors which, when corrected, lead to a significant upward shift in reported model performance.
arXiv Detail & Related papers (2024-10-24T16:27:03Z)
- Learning Label Refinement and Threshold Adjustment for Imbalanced Semi-Supervised Learning [6.904448748214652]
Semi-supervised learning algorithms struggle to perform well when exposed to imbalanced training data.
We introduce SEmi-supervised learning with pseudo-label optimization based on VALidation data (SEVAL).
SEVAL adapts to specific tasks with improved pseudo-label accuracy and ensures pseudo-label correctness on a per-class basis.
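A minimal sketch of learning per-class thresholds from held-out validation data, in the spirit of SEVAL; the 95% precision target and the search grid are illustrative choices, not the paper's:

```python
import numpy as np

def per_class_thresholds(val_probs, val_labels, grid=None):
    """For each class, keep the lowest confidence threshold at which
    accepted pseudo-labels for that class are correct at least 95% of
    the time on validation data.

    val_probs:  (n, num_classes) predicted probabilities
    val_labels: (n,) ground-truth labels
    """
    grid = np.linspace(0.5, 0.99, 50) if grid is None else grid
    preds, conf = val_probs.argmax(1), val_probs.max(1)
    thresholds = np.ones(val_probs.shape[1])   # default: accept (almost) nothing
    for c in range(val_probs.shape[1]):
        mask = preds == c
        for t in grid:                          # lowest t meeting the target
            sel = mask & (conf >= t)
            if sel.sum() and (val_labels[sel] == c).mean() >= 0.95:
                thresholds[c] = t
                break
    return thresholds
```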
arXiv Detail & Related papers (2024-07-07T13:46:22Z)
- Co-training for Low Resource Scientific Natural Language Inference [65.37685198688538]
We propose a novel co-training method that assigns importance weights to distantly supervised labels based on the training dynamics of the classifiers.
By assigning importance weights instead of filtering out examples based on an arbitrary threshold on the predicted confidence, we maximize the usage of automatically labeled data.
The proposed method obtains an improvement of 1.5% in Macro F1 over the distant supervision baseline, and substantial improvements over several other strong SSL baselines.
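A hedged sketch of weighting distantly supervised examples by training dynamics, here mean confidence in the assigned label across epochs (a dataset-cartography-style proxy; the paper's exact weighting may differ):

```python
import numpy as np

def dynamics_weights(prob_history, labels):
    """Weight each distantly supervised example by the classifier's mean
    confidence in its assigned label across training epochs.

    prob_history: (epochs, n, num_classes) softmax outputs per epoch
    labels:       (n,) distantly supervised labels
    """
    conf = prob_history[:, np.arange(labels.size), labels]  # (epochs, n)
    return conf.mean(axis=0)                                # higher = cleaner

history = np.random.dirichlet(np.ones(3), size=(5, 10))  # 5 epochs, 10 examples
w = dynamics_weights(history, np.random.randint(0, 3, 10))
```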
arXiv Detail & Related papers (2024-06-20T18:35:47Z)
- VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Annotation-Free Pathological Image Classification [16.05109192966549]
We present a novel human annotation-free method by leveraging pre-trained Vision-Language Models (VLMs). We introduce VLM-CPL, a novel approach that contains two noisy label filtering techniques with a semi-supervised learning strategy. Experimental results on five public pathological image datasets for patch-level and slide-level classification showed that our method substantially outperformed zero-shot classification by VLMs.
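One plausible form of feature-based noisy-label filtering in such pipelines, sketched under the assumption that samples whose neighbours disagree with their VLM label are noisy (the paper's two filtering techniques differ in detail):

```python
import numpy as np

def knn_consistency_filter(features, pseudo_labels, k=5, min_agree=4):
    """Keep a sample only if most of its k nearest neighbours in feature
    space share its VLM-assigned pseudo-label.

    features:      (n, d) image embeddings
    pseudo_labels: (n,) VLM-assigned labels
    """
    pseudo_labels = np.asarray(pseudo_labels)
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = f @ f.T                      # cosine similarities
    np.fill_diagonal(sims, -np.inf)     # exclude self-matches
    keep = []
    for i in range(len(f)):
        nn = np.argsort(sims[i])[-k:]
        if (pseudo_labels[nn] == pseudo_labels[i]).sum() >= min_agree:
            keep.append(i)
    return np.array(keep)
```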
arXiv Detail & Related papers (2024-03-23T13:24:30Z)
- CSOT: Curriculum and Structure-Aware Optimal Transport for Learning with Noisy Labels [13.807759089431855]
Learning with noisy labels (LNL) poses a significant challenge in training a well-generalized model.
Recent advances have achieved impressive performance by identifying clean labels and corrupted labels for training.
We propose a novel optimal transport (OT) formulation, called Curriculum and Structure-aware Optimal Transport (CSOT).
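A plain Sinkhorn assignment of pseudo-labels under a class-marginal constraint, the OT core on which CSOT builds its curriculum and structure terms; hyperparameters here are illustrative:

```python
import numpy as np

def sinkhorn_pseudo_labels(probs, class_marginal, n_iter=50, eps=0.1):
    """Sinkhorn iteration for a transport plan whose rows are samples and
    columns are classes, with a prescribed class marginal.

    probs:          (n, c) model probabilities (negative cost)
    class_marginal: (c,) desired fraction of samples per class
    """
    K = np.exp(np.log(probs + 1e-12) / eps)   # kernel exp(-cost/eps)
    r = np.ones(len(probs)) / len(probs)      # uniform row marginal
    c = np.asarray(class_marginal)
    u = r.copy()
    for _ in range(n_iter):                   # alternate marginal scaling
        u = r / (K @ (c / (K.T @ u)))
    v = c / (K.T @ u)
    plan = u[:, None] * K * v[None, :]
    return plan.argmax(1)                     # hard pseudo-labels

probs = np.random.dirichlet(np.ones(3), size=8)
labels = sinkhorn_pseudo_labels(probs, np.array([1 / 3, 1 / 3, 1 / 3]))
```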
arXiv Detail & Related papers (2023-12-11T09:12:50Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
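A sketch of truncated-Gaussian sample weighting in the spirit of SoftMatch: full weight above the running confidence mean, smooth decay below it rather than a hard cutoff (the running statistics are assumed given; the paper maintains them online):

```python
import torch

def softmatch_weights(probs, mean, var):
    """Per-sample weights from pseudo-label confidence.

    probs: (batch, num_classes) softmax outputs
    mean/var: running statistics of the max-confidence distribution
    """
    conf = probs.max(dim=1).values
    w = torch.exp(-(conf - mean) ** 2 / (2 * var))
    return torch.where(conf >= mean, torch.ones_like(w), w)

w = softmatch_weights(torch.softmax(torch.randn(4, 5), dim=1),
                      mean=0.7, var=0.01)
```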
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- Improved Adaptive Algorithm for Scalable Active Learning with Weak Labeler [89.27610526884496]
Weak Labeler Active Cover (WL-AC) robustly leverages lower-quality weak labelers to reduce query complexity while retaining the desired level of accuracy.
We show its effectiveness on the corrupted-MNIST dataset by significantly reducing the number of labels while keeping the same accuracy as in passive learning.
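A loose illustration of disagreement-driven querying: trust the free weak labeler where it matches the current model, and spend the strong-labeler budget only on disagreements (a simplification; WL-AC's actual algorithm and guarantees are in the paper):

```python
def active_query(pool, model_predict, weak_label, strong_label, budget):
    """Collect labels for a pool of unlabeled points, querying the
    expensive strong labeler only where model and weak labeler disagree.
    All four callables are hypothetical stand-ins."""
    labeled = []
    for x in pool:
        guess, weak = model_predict(x), weak_label(x)
        if guess == weak:
            labeled.append((x, weak))            # free label, no query
        elif budget > 0:
            labeled.append((x, strong_label(x)))  # spend one query
            budget -= 1
    return labeled
```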
arXiv Detail & Related papers (2022-11-04T02:52:54Z)
- Label Noise-Resistant Mean Teaching for Weakly Supervised Fake News Detection [93.6222609806278]
We propose a novel label noise-resistant mean teaching approach (LNMT) for weakly supervised fake news detection.
LNMT leverages unlabeled news and feedback comments of users to enlarge the amount of training data.
LNMT establishes a mean teacher framework equipped with label propagation and label reliability estimation.
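The mean-teacher core is standard: the teacher's weights are an exponential moving average of the student's, giving stable targets for label propagation on unlabeled news. A minimal sketch (decay value illustrative):

```python
import copy
import torch

@torch.no_grad()
def ema_update(student, teacher, decay=0.99):
    """Update teacher parameters as an EMA of the student's."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1 - decay)

student = torch.nn.Linear(8, 2)
teacher = copy.deepcopy(student)
ema_update(student, teacher)
```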
arXiv Detail & Related papers (2022-06-10T16:01:58Z)
- In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning [53.1047775185362]
Pseudo-labeling (PL) is a general SSL approach that does not depend on domain-specific constraints but performs relatively poorly in its original formulation.
We argue that PL underperforms due to the erroneous high confidence predictions from poorly calibrated models.
We propose an uncertainty-aware pseudo-label selection (UPS) framework which improves pseudo labeling accuracy by drastically reducing the amount of noise encountered in the training process.
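A sketch of uncertainty-aware selection: accept a pseudo-label only when mean confidence is high and the spread across stochastic forward passes (e.g., MC dropout) is low; thresholds are illustrative, not the paper's:

```python
import torch

def select_pseudo_labels(mc_probs, conf_min=0.9, unc_max=0.05):
    """Filter pseudo-labels by confidence and predictive uncertainty.

    mc_probs: (passes, batch, num_classes) softmax outputs from several
              stochastic forward passes.
    """
    mean = mc_probs.mean(0)                               # (batch, classes)
    conf, labels = mean.max(1)
    std = mc_probs.std(0).gather(1, labels[:, None]).squeeze(1)
    keep = (conf >= conf_min) & (std <= unc_max)
    return labels[keep], keep

probs = torch.softmax(torch.randn(10, 4, 3), dim=-1)  # 10 passes, 4 samples
labels, mask = select_pseudo_labels(probs)
```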
arXiv Detail & Related papers (2021-01-15T23:29:57Z)