Exploring Visual Prompting: Robustness Inheritance and Beyond
- URL: http://arxiv.org/abs/2506.06823v1
- Date: Sat, 07 Jun 2025 14:56:32 GMT
- Title: Exploring Visual Prompting: Robustness Inheritance and Beyond
- Authors: Qi Li, Liangzhi Li, Zhouqiang Jiang, Bowen Wang, Keke Tang,
- Abstract summary: We propose a strategy called Prompt Boundary Loosening (PBL) to mitigate the trade-off faced by Visual Prompting (VP). As a lightweight, plug-and-play strategy naturally compatible with VP, PBL effectively ensures the successful inheritance of robustness when the source model is a robust model. Our findings are universal and demonstrate the significant benefits of the proposed strategy.
- Score: 10.911786739957599
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Prompting (VP), an efficient method for transfer learning, has shown its potential in vision tasks. However, previous works focus exclusively on VP from standard source models; how it performs with a robust source model is still unknown: Can the robustness of the source model be successfully inherited? Does VP also encounter the same trade-off between robustness and generalization ability as the source model during this process? If such a trade-off exists, is there a strategy specifically tailored to VP to mitigate this limitation? In this paper, we thoroughly explore these three questions for the first time and provide affirmative answers to them. To mitigate the trade-off faced by VP, we propose a strategy called Prompt Boundary Loosening (PBL). As a lightweight, plug-and-play strategy naturally compatible with VP, PBL effectively ensures the successful inheritance of robustness when the source model is a robust model, while significantly enhancing VP's generalization ability across various downstream datasets. Extensive experiments across various datasets show that our findings are universal and demonstrate the significant benefits of the proposed strategy.
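For readers unfamiliar with VP, the core idea is to reuse a frozen source model for a downstream task by adding a learnable perturbation (the "prompt") to the input, often restricted to a border region. The following is a minimal dependency-free sketch of how a prompted input is constructed; the function name, the list-of-lists image representation, and the border restriction are illustrative assumptions, not details taken from the paper, and PBL itself is not shown since the abstract does not specify its mechanism.

```python
# Minimal sketch of visual prompting (VP): a frozen source model is reused
# for a downstream task by adding a learnable perturbation ("prompt") to
# the input image. Here the prompt is restricted to a border of width
# `pad`, a common VP design choice. All names are illustrative.

def apply_visual_prompt(image, prompt, pad):
    """Add the prompt to the border pixels of an H x W grayscale image.

    image  : list of lists (H x W) of floats, the downstream input
    prompt : list of lists (H x W) of floats, the learnable parameters
    pad    : border width (in pixels) the prompt is allowed to modify
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # copy; the original image is untouched
    for i in range(h):
        for j in range(w):
            on_border = i < pad or i >= h - pad or j < pad or j >= w - pad
            if on_border:
                out[i][j] = image[i][j] + prompt[i][j]
    return out

# A 4x4 zero image with a unit border prompt of width 1: only the 12
# border pixels change; the 2x2 interior stays at its original values.
img = [[0.0] * 4 for _ in range(4)]
prm = [[1.0] * 4 for _ in range(4)]
res = apply_visual_prompt(img, prm, 1)
```

In a full VP pipeline, `prompt` would be optimized by backpropagating the downstream loss through the frozen source model, with a label-mapping step connecting source classes to downstream classes.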
Related papers
- Leveraging LLM Inconsistency to Boost Pass@k Performance [3.797421474324735]
Large language models (LLMs) achieve impressive abilities in numerous domains, but exhibit inconsistent performance in response to minor input changes. We introduce a novel method for leveraging models' inconsistency to boost Pass@k performance. Specifically, we present a "Variator" agent that generates k variants of a given task and submits one candidate solution for each one.
arXiv Detail & Related papers (2025-05-19T10:22:04Z)
- R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning [97.49610356913874]
We propose robust test-time prompt tuning (R-TPT) for vision-language models (VLMs). R-TPT mitigates the impact of adversarial attacks during the inference stage. We introduce a plug-and-play reliability-based weighted ensembling strategy to strengthen the defense.
arXiv Detail & Related papers (2025-04-15T13:49:31Z)
- A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities [51.08999772842298]
Tabular Prior-data Fitted Network v2 (TabPFN v2) achieves unprecedented in-context learning performance across diverse downstream datasets. We show that TabPFN v2 can infer attribute relationships even when provided with randomized attribute token inputs. We demonstrate that TabPFN v2's limitations can be addressed through a test-time divide-and-context strategy.
arXiv Detail & Related papers (2025-02-24T17:38:42Z)
- ACTRESS: Active Retraining for Semi-supervised Visual Grounding [52.08834188447851]
A previous study, RefTeacher, makes the first attempt to tackle this task by adopting the teacher-student framework to provide pseudo confidence supervision and attention-based supervision.
This approach is incompatible with current state-of-the-art visual grounding models, which follow the Transformer-based pipeline.
Our paper proposes the ACTive REtraining approach for Semi-Supervised Visual Grounding, abbreviated as ACTRESS.
arXiv Detail & Related papers (2024-07-03T16:33:31Z)
- One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models [46.64419395105025]
We present a Contrastive-training Perturbation Generator with Cross-modal conditions (C-PGC) to achieve the attack. C-PGC incorporates both unimodal and cross-modal information as effective guidance. Experiments show that C-PGC successfully forces adversarial samples to move away from their original area.
arXiv Detail & Related papers (2024-06-08T15:01:54Z)
- EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition [6.996304653818122]
We present an effective approach to harness the potential of a foundation model for Visual Place Recognition. We show that features extracted from self-attention layers can act as a powerful re-ranker for VPR, even in a zero-shot setting. Our method also demonstrates exceptional robustness and generalization, setting new state-of-the-art performance.
arXiv Detail & Related papers (2024-05-28T11:24:41Z)
- SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios.
We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
- Combining inherent knowledge of vision-language models with unsupervised domain adaptation through strong-weak guidance [44.1830188215271]
Unsupervised domain adaptation (UDA) tries to overcome the tedious work of labeling data by leveraging a labeled source dataset. Current vision-language models exhibit remarkable zero-shot prediction capabilities. We introduce a strong-weak guidance learning scheme that employs zero-shot predictions to help align the source and target datasets.
arXiv Detail & Related papers (2023-12-07T06:16:39Z)
- Towards Robust and Accurate Visual Prompting [11.918195429308035]
We study whether a visual prompt derived from a robust model can inherit that robustness, and whether it suffers a decline in generalization performance in the process.
We introduce a novel technique named Prompt Boundary Loose (PBL) that effectively mitigates the suboptimal standard accuracy of visual prompts.
Our findings are universal and demonstrate the significant benefits of our proposed method.
arXiv Detail & Related papers (2023-11-18T07:00:56Z)
- Uncovering the Hidden Cost of Model Compression [43.62624133952414]
Visual Prompting has emerged as a pivotal method for transfer learning in computer vision.
Model compression detrimentally impacts the performance of visual prompting-based transfer.
However, negative effects on calibration are not present when models are compressed via quantization.
arXiv Detail & Related papers (2023-08-29T01:47:49Z)
- Visual Prompting for Adversarial Robustness [63.89295305670113]
We use visual prompting to improve the adversarial robustness of a fixed, pre-trained model at test time.
We propose a new VP method, termed Class-wise Adversarial Visual Prompting (C-AVP), to generate class-wise visual prompts.
C-AVP outperforms the conventional VP method, with 2.1X standard accuracy gain and 2X robust accuracy gain.
arXiv Detail & Related papers (2022-10-12T15:06:07Z)
- Principled Knowledge Extrapolation with GANs [92.62635018136476]
We study counterfactual synthesis from a new perspective of knowledge extrapolation.
We show that an adversarial game with a closed-form discriminator can be used to address the knowledge extrapolation problem.
Our method enjoys both elegant theoretical guarantees and superior performance in many scenarios.
arXiv Detail & Related papers (2022-05-21T08:39:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.