Related papers: Bi-CoG: Bi-Consistency-Guided Self-Training for Vision-Language Models

Bi-CoG: Bi-Consistency-Guided Self-Training for Vision-Language Models

URL: http://arxiv.org/abs/2510.20477v1
Date: Thu, 23 Oct 2025 12:16:41 GMT
Title: Bi-CoG: Bi-Consistency-Guided Self-Training for Vision-Language Models
Authors: Rui Zhu, Song-Lin Lv, Zi-Kang Wang, Lan-Zhe Guo,
Abstract summary: We propose a plug-and-play methodology named $underlinetextbfBi-Co$nsistency-$underlinetextbfG$uided Self-Training.<n>Bi-CoG assigns high-quality and low-bias pseudo-labels, by simultaneously exploiting inter-model and intra-model consistency, along with an error-aware dynamic pseudo-label assignment strategy.
Score: 16.116493934368012
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Exploiting unlabeled data through semi-supervised learning (SSL) or leveraging pre-trained models via fine-tuning are two prevailing paradigms for addressing label-scarce scenarios. Recently, growing attention has been given to combining fine-tuning of pre-trained vision-language models (VLMs) with SSL, forming the emerging paradigm of semi-supervised fine-tuning. However, existing methods often suffer from model bias and hyperparameter sensitivity, due to reliance on prediction consistency or pre-defined confidence thresholds. To address these limitations, we propose a simple yet effective plug-and-play methodology named $\underline{\textbf{Bi-Co}}$nsistency-$\underline{\textbf{G}}$uided Self-Training (Bi-CoG), which assigns high-quality and low-bias pseudo-labels, by simultaneously exploiting inter-model and intra-model consistency, along with an error-aware dynamic pseudo-label assignment strategy. Both theoretical analysis and extensive experiments over 14 datasets demonstrate the effectiveness of Bi-CoG, which consistently and significantly improves the performance of existing methods.

Related papers

Fine-tuning Pre-trained Vision-Language Models in a Human-Annotation-Free Manner [46.140724013144194]
Large-scale vision-language models (VLMs) such as CLIP exhibit strong zero-shot generalization, but adapting them to downstream tasks typically requires costly labeled data.<n>Existing unsupervised self-training methods rely on pseudo-labeling, yet often suffer from unreliable confidence filtering, confirmation bias, and underutilization of low-confidence samples.<n>We propose Collaborative Fine-Tuning (CoFT), an unsupervised adaptation framework that leverages unlabeled data through a dual-model, cross-modal collaboration mechanism.
arXiv Detail & Related papers (2026-02-04T09:00:12Z)
Dual-level Modality Debiasing Learning for Unsupervised Visible-Infrared Person Re-Identification [59.59359638389348]
We propose a Dual-level Modality Debiasing Learning framework that implements debiasing at both the model and optimization levels.<n>Experiments on benchmark datasets demonstrate that DMDL could enable modality-invariant feature learning and a more generalized model.
arXiv Detail & Related papers (2025-12-03T12:43:16Z)
Prototype-Guided Pseudo-Labeling with Neighborhood-Aware Consistency for Unsupervised Adaptation [12.829638461740759]
In unsupervised adaptation for vision-language models such as CLIP, pseudo-labels from zero-shot predictions often exhibit significant noise.<n>We propose a novel adaptive pseudo-labeling framework that enhances CLIP's adaptation performance by integrating prototype consistency and neighborhood-based consistency.<n>Our method achieves state-of-the-art performance in unsupervised adaptation scenarios, delivering more accurate pseudo-labels while maintaining computational efficiency.
arXiv Detail & Related papers (2025-07-22T19:08:24Z)
Feedback-Driven Pseudo-Label Reliability Assessment: Redefining Thresholding for Semi-Supervised Semantic Segmentation [5.7977777220041204]
A common practice in pseudo-supervision is filtering pseudo-labels based on pre-defined confidence thresholds or entropy.<n>We propose Ensemble-of-Confidence Reinforcement (ENCORE), a dynamic feedback-driven thresholding strategy for pseudo-label selection.<n>Our method seamlessly integrates into existing pseudo-supervision frameworks and significantly improves segmentation performance.
arXiv Detail & Related papers (2025-05-12T15:58:08Z)
Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning [81.83013974171364]
Semi-supervised multi-label learning (SSMLL) is a powerful framework for leveraging unlabeled data to reduce the expensive cost of collecting precise multi-label annotations.<n>Unlike semi-supervised learning, one cannot select the most probable label as the pseudo-label in SSMLL due to multiple semantics contained in an instance.<n>We propose a dual-perspective method to generate high-quality pseudo-labels.
arXiv Detail & Related papers (2024-07-26T09:33:53Z)
Robust Training of Federated Models with Extremely Label Deficiency [84.00832527512148]
Federated semi-supervised learning (FSSL) has emerged as a powerful paradigm for collaboratively training machine learning models using distributed data with label deficiency. We propose a novel twin-model paradigm, called Twin-sight, designed to enhance mutual guidance by providing insights from different perspectives of labeled and unlabeled data. Our comprehensive experiments on four benchmark datasets provide substantial evidence that Twin-sight can significantly outperform state-of-the-art methods across various experimental settings.
arXiv Detail & Related papers (2024-02-22T10:19:34Z)
Semi-Supervised Class-Agnostic Motion Prediction with Pseudo Label Regeneration and BEVMix [59.55173022987071]
We study the potential of semi-supervised learning for class-agnostic motion prediction. Our framework adopts a consistency-based self-training paradigm, enabling the model to learn from unlabeled data. Our method exhibits comparable performance to weakly and some fully supervised methods.
arXiv Detail & Related papers (2023-12-13T09:32:50Z)
Adaptive Weighted Co-Learning for Cross-Domain Few-Shot Learning [23.615250207134004]
Cross-domain few-shot learning (CDFSL) induces a very challenging adaptation problem. We propose a simple Adaptive Weighted Co-Learning (AWCoL) method to address the CDFSL challenge. Comprehensive experiments are conducted on multiple benchmark datasets and the empirical results demonstrate that the proposed method produces state-of-the-art CDFSL performance.
arXiv Detail & Related papers (2023-12-06T22:09:52Z)
Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization [58.90989478049686]
Bi-Drop is a fine-tuning strategy that selectively updates model parameters using gradients from various sub-nets. Experiments on the GLUE benchmark demonstrate that Bi-Drop consistently outperforms previous fine-tuning methods.
arXiv Detail & Related papers (2023-05-24T06:09:26Z)
TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks. We propose a novel statistics-based approach, Two-WIng NormliSation (TWINS) fine-tuning framework. TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
Adaptive Consistency Regularization for Semi-Supervised Transfer Learning [31.66745229673066]
We consider semi-supervised learning and transfer learning jointly, leading to a more practical and competitive paradigm. To better exploit the value of both pre-trained weights and unlabeled target examples, we introduce adaptive consistency regularization. Our proposed adaptive consistency regularization outperforms state-of-the-art semi-supervised learning techniques such as Pseudo Label, Mean Teacher, and MixMatch.
arXiv Detail & Related papers (2021-03-03T05:46:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.