Towards Robust and Accurate Visual Prompting
- URL: http://arxiv.org/abs/2311.10992v1
- Date: Sat, 18 Nov 2023 07:00:56 GMT
- Title: Towards Robust and Accurate Visual Prompting
- Authors: Qi Li, Liangzhi Li, Zhouqiang Jiang, Bowen Wang
- Abstract summary: We study whether a visual prompt derived from a robust model can inherit the robustness while suffering a decline in generalization performance.
We introduce a novel technique named Prompt Boundary Loose (PBL) to effectively mitigate the suboptimal standard accuracy of visual prompts.
Our findings are universal and demonstrate the significant benefits of our proposed method.
- Score: 11.918195429308035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual prompting (VP), an efficient method for transfer learning, has shown its
potential in vision tasks. However, previous works focus exclusively on VP from
standard source models; it is still unknown how VP performs with a robust
source model: can a visual prompt derived from a robust model inherit that
robustness while suffering a decline in generalization performance, even on a
downstream dataset that differs from the source dataset? In this work, we
answer the above question affirmatively and give an explanation at the visual
representation level. Moreover, we introduce a novel technique named Prompt
Boundary Loose (PBL) that effectively mitigates the suboptimal standard
accuracy of visual prompts without losing (and often significantly improving)
their adversarial robustness when a robust model is used as the source model.
Extensive experiments across various datasets show that our findings are
universal and demonstrate the significant benefits of our proposed method.
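The visual-prompting setup studied above can be sketched minimally: a single shared, learnable perturbation (here a border "pad" prompt, one common design) is added to every downstream input while the source model stays frozen. All names, shapes, and the padding layout below are illustrative assumptions; the paper's PBL technique itself is not reproduced.

```python
import numpy as np

def make_pad_prompt(image_size=32, pad=4, seed=0):
    """Create a learnable border prompt: nonzero only on a frame of
    width `pad` around the image; interior pixels stay untouched."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((image_size, image_size), dtype=np.float32)
    mask[:pad, :] = 1.0   # top border
    mask[-pad:, :] = 1.0  # bottom border
    mask[:, :pad] = 1.0   # left border
    mask[:, -pad:] = 1.0  # right border
    # Small random initialization; in practice this tensor would be
    # optimized by gradient descent on the downstream task.
    prompt = rng.uniform(-0.01, 0.01,
                         (image_size, image_size)).astype(np.float32)
    return prompt * mask, mask

def apply_prompt(images, prompt):
    """Add the shared prompt to a batch of images. In visual prompting
    only the prompt is trained; the source model's weights are frozen."""
    return np.clip(images + prompt, 0.0, 1.0)
```

During training, gradients flow through the frozen source model into `prompt` alone, which is why VP is parameter-efficient: the trainable state is one image-sized tensor rather than the whole network.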
Related papers
- Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition [13.593511876719367]
We propose a novel skeleton-based idempotent generative model (IGM) for unsupervised representation learning.
Our experiments on benchmark datasets, NTU RGB+D and PKUMMD, demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2024-10-27T06:29:04Z)
- Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection [16.21235742118949]
We propose a novel approach that repurposes a well-trained Vision-Language Model (VLM) for general deepfake detection.
Motivated by the model reprogramming paradigm, which manipulates model predictions via data perturbations, our method can reprogram a pretrained VLM.
Our superior performance comes at a lower cost in trainable parameters, making the approach promising for real-world applications.
arXiv Detail & Related papers (2024-09-04T12:46:30Z)
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been believed to be a challenging property to encode in neural networks.
We develop a scalable and model-agnostic solution that achieves adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition [6.996304653818122]
We propose a simple yet powerful approach to better exploit the potential of a foundation model for Visual Place Recognition.
We first demonstrate that features extracted from self-attention layers can serve as a powerful re-ranker for VPR.
We then demonstrate that a single-stage method leveraging internal ViT layers for pooling can generate global features that achieve state-of-the-art results.
arXiv Detail & Related papers (2024-05-28T11:24:41Z)
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z)
- Debiasing Multimodal Large Language Models [61.6896704217147]
Large Vision-Language Models (LVLMs) have become indispensable tools in computer vision and natural language processing.
Our investigation reveals a noteworthy bias in the generated content, where the output is influenced primarily by the prior of the underlying Large Language Model (LLM) rather than by the input image.
To rectify these biases and redirect the model's focus toward vision information, we introduce two simple, training-free strategies.
arXiv Detail & Related papers (2024-03-08T12:35:07Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection (VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction [50.62245481416744]
We present the first benchmark that simulates the evaluation of open information extraction models in the real world.
We design and annotate a large-scale testbed in which each example is a knowledge-invariant clique.
Elaborating further on the robustness metric, a model is judged robust only if its performance is consistently accurate across an entire clique.
arXiv Detail & Related papers (2023-05-23T12:05:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.