Robust Prompt Tuning for Vision-Language Models with Mild Semantic Noise
- URL: http://arxiv.org/abs/2508.04677v3
- Date: Thu, 02 Oct 2025 14:03:47 GMT
- Title: Robust Prompt Tuning for Vision-Language Models with Mild Semantic Noise
- Authors: Yansheng Gao, Yufei Zheng, Shengsheng Wang
- Abstract summary: We propose ANPrompt, a robust prompt tuning framework that actively incorporates weak semantic noise. We show that ANPrompt consistently outperforms existing prompt tuning methods. It offers superior robustness to semantic noise and improved generalization across tasks.
- Score: 9.536089523962486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prompt tuning has shown promising results, but its robustness and generalization to unseen categories remain limited. Through our experiments, we demonstrate that the complete removal of semantic noise is a key factor restricting robustness. Existing methods typically suppress or filter out semantic noise in the prompt space, inadvertently hindering the model's robustness and its ability to generalize to unseen categories. To address this, we propose ANPrompt, a robust prompt tuning framework that actively incorporates weak semantic noise. By clustering weakly perturbed features into noise prompts and integrating them with learnable tokens in both the text and vision encoders, ANPrompt ensures controlled exposure to semantic variations. To enhance the visual pathway, we introduce the Noise-Resistant Visual Prompt Prototype (NRVPP), which stabilizes visual semantics under weak perturbations. Additionally, we propose a Weak Alignment Loss (WALoss) at the logits level to enforce consistency between clean and perturbed predictions, providing stable supervision. By combining weak semantic noise exposure with logits-based consistency, ANPrompt prevents overfitting to specific phrasings while preserving semantic integrity. Extensive experiments across 11 benchmarks, including base-to-new splits, show that ANPrompt consistently outperforms existing prompt tuning methods, offering superior robustness to semantic noise and improved generalization across tasks.
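To illustrate the logits-level consistency idea behind the Weak Alignment Loss (WALoss), here is a minimal NumPy sketch. It is a hypothetical illustration, not the paper's implementation: the function names, the choice of KL divergence as the consistency measure, and all values are assumptions.

```python
import numpy as np

def softmax(logits, axis=-1):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def weak_alignment_loss(clean_logits, noisy_logits, eps=1e-12):
    """Logits-level consistency term: KL(clean || perturbed), batch-averaged.
    Zero when weak noise leaves the predictions unchanged; grows as the
    perturbed predictions drift from the clean ones."""
    p = softmax(clean_logits)
    q = softmax(noisy_logits)
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)
    return float(kl.mean())

clean = np.array([[2.0, 0.5, -1.0]])
perturbed = clean + np.array([[0.0, 0.8, 0.0]])  # weak semantic perturbation
print(weak_alignment_loss(clean, clean))      # 0.0
print(weak_alignment_loss(clean, perturbed))  # positive: predictions drifted
```

Such a term would be added to the usual task loss, so the model is supervised to keep its predictions stable under weak semantic perturbations rather than overfitting to one specific phrasing.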
Related papers
- Noise-Adaptive Regularization for Robust Multi-Label Remote Sensing Image Classification [5.658568324275769]
We propose NAR, a noise-adaptive regularization method that distinguishes between additive and subtractive noise. NAR consistently improves robustness compared with existing methods. Performance improvements are most pronounced under subtractive and mixed noise.
arXiv Detail & Related papers (2026-01-13T11:16:45Z) - Lost in the Noise: How Reasoning Models Fail with Contextual Distractors [57.31788955167306]
Recent advances in reasoning models and agentic AI systems have led to an increased reliance on diverse external information. We introduce NoisyBench, a comprehensive benchmark that systematically evaluates model robustness across 11 datasets in RAG, reasoning, alignment, and tool-use tasks. Our evaluation reveals a catastrophic performance drop of up to 80% in state-of-the-art models when faced with contextual distractors.
arXiv Detail & Related papers (2026-01-12T05:43:51Z) - Explainable Disentanglement on Discrete Speech Representations for Noise-Robust ASR [37.09163295946173]
We propose disentangling semantic speech content from background noise in the latent space. Our end-to-end model separates clean speech in the form of codebook tokens while extracting interpretable noise vectors. We show that our approach improves alignment between clean/noisy speech and text, producing speech tokens that display a high degree of noise invariance.
arXiv Detail & Related papers (2025-10-29T04:08:19Z) - Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance [54.88271057438763]
Noise Awareness Guidance (NAG) is a correction method that explicitly steers sampling trajectories to remain consistent with the pre-defined noise schedule. NAG consistently mitigates noise shift and substantially improves the generation quality of mainstream diffusion models.
arXiv Detail & Related papers (2025-10-14T13:31:34Z) - StableToken: A Noise-Robust Semantic Speech Tokenizer for Resilient SpeechLLMs [54.229363096087866]
Speech tokenizers are not robust to meaning-irrelevant acoustic perturbations. This instability stems from two flaws: a brittle single-path quantization architecture and a distant training signal. We introduce StableToken, a tokenizer that achieves stability through a consensus-driven mechanism.
arXiv Detail & Related papers (2025-09-26T11:32:51Z) - OptiPrune: Boosting Prompt-Image Consistency with Attention-Guided Noise and Dynamic Token Selection [0.0]
We propose a unified framework that combines distribution-aware initial noise optimization with similarity-based token pruning. Experiments on benchmark datasets, including Animal-Animal, demonstrate that OptiPrune achieves state-of-the-art prompt-image consistency with significantly reduced computational cost.
arXiv Detail & Related papers (2025-07-01T14:24:40Z) - Noise Augmented Fine Tuning for Mitigating Hallucinations in Large Language Models [1.0579965347526206]
Large language models (LLMs) often produce inaccurate or misleading content, known as hallucinations. Noise-Augmented Fine-Tuning (NoiseFiT) is a novel framework that leverages adaptive noise injection to enhance model robustness. NoiseFiT selectively perturbs layers identified as either high-SNR (more robust) or low-SNR (potentially under-regularized) using dynamically scaled Gaussian noise.
arXiv Detail & Related papers (2025-04-04T09:27:19Z) - The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation [31.599902235859687]
We propose to leverage an aligned Gaussian noise as implicit guidance to complement explicit user-defined inputs, such as text prompts. NoiseQuery enables fine-grained control and yields significant performance boosts in both high-level semantics and low-level visual attributes.
arXiv Detail & Related papers (2024-12-06T14:59:00Z) - NLPrompt: Noise-Label Prompt Learning for Vision-Language Models [45.35555264802591]
Real-world datasets often contain noisy labels that can degrade prompt learning performance. In this paper, we demonstrate that using mean absolute error (MAE) loss in prompt learning, named PromptMAE, significantly enhances robustness against noisy labels. We also introduce PromptOT, a prompt-based optimal transport data purification method to enhance the robustness further.
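The robustness claim for MAE over cross-entropy under label noise can be illustrated with a toy sketch (a hypothetical example, not the paper's code): when the model is confident and the annotation is wrong, the cross-entropy loss is effectively unbounded, while the MAE term against a one-hot target is capped at 2, so a single mislabeled example cannot dominate the objective.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ce_loss(logits, label):
    # Cross-entropy: grows without bound as the predicted probability
    # of the (possibly noisy) label approaches zero.
    return float(-np.log(softmax(logits)[label] + 1e-12))

def mae_loss(logits, label):
    # MAE against the one-hot target: bounded by 2 regardless of how
    # confidently the model disagrees with the label.
    p = softmax(logits)
    onehot = np.zeros_like(p)
    onehot[label] = 1.0
    return float(np.abs(p - onehot).sum())

confident = np.array([8.0, 0.0, 0.0])  # model is sure of class 0
wrong_label = 2                        # but the annotation is noisy
print(ce_loss(confident, wrong_label))   # large (roughly 8 here)
print(mae_loss(confident, wrong_label))  # just under the bound of 2
```

This bounded-gradient property is the standard intuition for why MAE-style losses tolerate label noise better than cross-entropy.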
arXiv Detail & Related papers (2024-12-02T08:25:09Z) - Vision-Language Models are Strong Noisy Label Detectors [76.07846780815794]
This paper presents a Denoising Fine-Tuning framework, called DeFT, for adapting vision-language models.
DeFT utilizes the robust alignment of textual and visual features pre-trained on millions of auxiliary image-text pairs to sieve out noisy labels.
Experimental results on seven synthetic and real-world noisy datasets validate the effectiveness of DeFT in both noisy label detection and image classification.
arXiv Detail & Related papers (2024-09-29T12:55:17Z) - InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization [27.508861002013358]
InitNO is a paradigm that refines the initial noise to generate semantically faithful images.
A strategically crafted noise optimization pipeline is developed to guide the initial noise towards valid regions.
Our method, validated through rigorous experimentation, shows a commendable proficiency in generating images in strict accordance with text prompts.
arXiv Detail & Related papers (2024-04-06T14:56:59Z) - Noise-BERT: A Unified Perturbation-Robust Framework with Noise Alignment Pre-training for Noisy Slot Filling Task [14.707646721729228]
In a realistic dialogue system, the input information from users is often subject to various types of input perturbations.
We propose Noise-BERT, a unified Perturbation-Robust Framework with Noise Alignment Pre-training.
Our framework incorporates two Noise Alignment Pre-training tasks: Slot Masked Prediction and Sentence Noisiness Discrimination.
arXiv Detail & Related papers (2024-02-22T12:39:50Z) - High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models [56.00939852727501]
Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations.
Non-autoregressive framework enhances controllability, and duration diffusion model enables diversified prosodic expression.
arXiv Detail & Related papers (2023-09-27T09:27:03Z) - Advancing Unsupervised Low-light Image Enhancement: Noise Estimation, Illumination Interpolation, and Self-Regulation [55.07472635587852]
Low-Light Image Enhancement (LLIE) techniques have made notable advancements in preserving image details and enhancing contrast.
These approaches encounter persistent challenges in efficiently mitigating dynamic noise and accommodating diverse low-light scenarios.
We first propose a method for estimating the noise level in low light images in a quick and accurate way.
We then devise a Learnable Illumination Interpolator (LII) to satisfy general constraints between illumination and input.
arXiv Detail & Related papers (2023-05-17T13:56:48Z) - PTP: Boosting Stability and Performance of Prompt Tuning with Perturbation-Based Regularizer [94.23904400441957]
We introduce perturbation-based regularizers, which can smooth the loss landscape, into prompt tuning.
We design two kinds of perturbation-based regularizers, including random-noise-based and adversarial-based.
Our new algorithms improve the state-of-the-art prompt tuning methods by 1.94% and 2.34% on SuperGLUE and FewGLUE benchmarks, respectively.
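The random-noise-based regularizer described above can be sketched in a few lines of NumPy (a toy illustration under assumed shapes and a linear "model", not the authors' implementation): the objective adds the average task loss under Gaussian perturbations of the prompt parameters to the clean loss, which smooths the loss landscape around the current prompt.

```python
import numpy as np

rng = np.random.default_rng(0)

def task_loss(prompt, X, y):
    """Toy task: logits = X @ prompt, softmax cross-entropy against labels y."""
    logits = X @ prompt                                   # (batch, classes)
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

def ptp_style_objective(prompt, X, y, sigma=0.05, n_samples=4):
    """Clean loss plus the mean loss under random Gaussian perturbations
    of the prompt parameters (the random-noise-based regularizer)."""
    noisy = [task_loss(prompt + sigma * rng.standard_normal(prompt.shape), X, y)
             for _ in range(n_samples)]
    return task_loss(prompt, X, y) + float(np.mean(noisy))

X = rng.standard_normal((8, 4))        # toy features
y = rng.integers(0, 3, size=8)         # toy labels over 3 classes
prompt = rng.standard_normal((4, 3))   # "soft prompt" parameters to tune
print(ptp_style_objective(prompt, X, y))
```

The adversarial-based variant mentioned in the summary would instead pick the perturbation that maximizes the loss within a small ball, rather than sampling it at random.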
arXiv Detail & Related papers (2023-05-03T20:30:51Z) - NLIP: Noise-robust Language-Image Pre-training [95.13287735264937]
We propose a principled Noise-robust Language-Image Pre-training framework (NLIP) to stabilize pre-training via two schemes: noise-harmonization and noise-completion.
Our NLIP can alleviate the common noise effects during image-text pre-training in a more efficient way.
arXiv Detail & Related papers (2022-12-14T08:19:30Z) - Robust Semantic Communications with Masked VQ-VAE Enabled Codebook [56.63571713657059]
We propose a framework for robust end-to-end semantic communication systems to combat semantic noise.
To this end, adversarial training with weight perturbation is developed to incorporate samples with semantic noise into the training dataset.
We develop a feature importance module (FIM) to suppress noise-related and task-unrelated features.
arXiv Detail & Related papers (2022-06-08T16:58:47Z) - Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z) - Open-set Label Noise Can Improve Robustness Against Inherent Label Noise [27.885927200376386]
We show that open-set noisy labels can be non-toxic and even benefit the robustness against inherent noisy labels.
We propose a simple yet effective regularization by introducing Open-set samples with Dynamic Noisy Labels (ODNL) into training.
arXiv Detail & Related papers (2021-06-21T07:15:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.