Boosting Cross-task Transferability of Adversarial Patches with Visual
Relations
- URL: http://arxiv.org/abs/2304.05402v1
- Date: Tue, 11 Apr 2023 11:43:57 GMT
- Title: Boosting Cross-task Transferability of Adversarial Patches with Visual
Relations
- Authors: Tony Ma, Songze Li, Yisong Xiao, Shunchang Liu
- Abstract summary: We propose a novel Visual Relation-based cross-task Adversarial Patch generation method called VRAP.
VRAP employs scene graphs to combine object recognition-based deception with predicate-based relation elimination.
Our experiments demonstrate that VRAP significantly surpasses previous methods in terms of black-box transferability across diverse visual reasoning tasks.
- Score: 4.694536172504848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The transferability of adversarial examples is a crucial aspect of evaluating
the robustness of deep learning systems, particularly in black-box scenarios.
Although several methods have been proposed to enhance cross-model
transferability, little attention has been paid to the transferability of
adversarial examples across different tasks. This issue has become increasingly
relevant with the emergence of foundational multi-task AI systems such as
Visual ChatGPT, rendering the utility of adversarial samples generated for a
single task relatively limited. Furthermore, these systems often entail
inferential functions beyond mere recognition-like tasks. To address this gap,
we propose a novel Visual Relation-based cross-task Adversarial Patch
generation method called VRAP, which aims to evaluate the robustness of various
visual tasks, especially those involving visual reasoning, such as Visual
Question Answering and Image Captioning. VRAP employs scene graphs to combine
object recognition-based deception with predicate-based relation elimination,
thereby disrupting the visual reasoning information shared among inferential
tasks. Our extensive experiments demonstrate that VRAP significantly surpasses
previous methods in terms of black-box transferability across diverse visual
reasoning tasks.
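As a rough illustration of the objective described in the abstract (not the authors' released implementation), the sketch below combines an object-deception term with a relation-elimination term computed on the outputs of a differentiable scene-graph surrogate. The names apply_patch and sgg_model, the background-predicate index, and the step sizes are assumptions introduced purely for illustration.

```python
import torch
import torch.nn.functional as F

def vrap_style_patch_step(patch, images, apply_patch, sgg_model,
                          lam=1.0, step_size=2 / 255):
    """One gradient step for a cross-task adversarial patch (illustrative only).

    Assumed interfaces (not from the paper):
      apply_patch(images, patch) -> patched images, differentiable in `patch`
      sgg_model(x) -> (obj_logits [N_obj, C_obj], rel_logits [N_rel, C_rel]),
                      a differentiable scene-graph surrogate
    """
    patch = patch.clone().detach().requires_grad_(True)
    obj_logits, rel_logits = sgg_model(apply_patch(images, patch))

    # Object-deception term: push object predictions away from their current
    # labels by maximizing the cross-entropy (hence the minus sign).
    obj_labels = obj_logits.argmax(dim=-1)
    loss_obj = -F.cross_entropy(obj_logits, obj_labels)

    # Relation-elimination term: drive every predicate toward an assumed
    # "no relation" background class at index 0.
    background = torch.zeros(rel_logits.size(0), dtype=torch.long,
                             device=rel_logits.device)
    loss_rel = F.cross_entropy(rel_logits, background)

    loss = loss_obj + lam * loss_rel
    loss.backward()

    # Signed-gradient descent on the combined loss, keeping valid pixel values.
    with torch.no_grad():
        patch = (patch - step_size * patch.grad.sign()).clamp(0, 1)
    return patch.detach()
```

In this reading, the patch is optimized to push object labels away from their clean predictions while pushing predicates toward a "no relation" class, which is one plausible interpretation of disrupting the shared visual reasoning information.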
Related papers
- Cross-Task Attack: A Self-Supervision Generative Framework Based on Attention Shift [3.6015992701968793]
We propose a self-supervised Cross-Task Attack framework (CTA).
CTA generates cross-task perturbations by shifting the attention area of samples away from the co-attention map and closer to the anti-attention map.
We conduct extensive experiments on multiple vision tasks and the experimental results confirm the effectiveness of the proposed design for adversarial attacks.
arXiv Detail & Related papers (2024-07-18T17:01:10Z)
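For the Cross-Task Attack entry above, the attention-shift idea (move a surrogate model's attention away from the co-attention map and toward the anti-attention map) can be written as a small perturbation update. This is a hypothetical sketch: attn_fn, the pre-computed co_attn and anti_attn maps, and the cosine-similarity proxy are assumptions, not the paper's actual objective.

```python
import torch
import torch.nn.functional as F

def attention_shift_step(delta, images, attn_fn, co_attn, anti_attn,
                         eps=8 / 255, step_size=2 / 255):
    """One perturbation update that shifts surrogate attention (illustrative).

    Assumed interfaces (not from the paper):
      attn_fn(x)  -> spatial attention maps [B, H, W], differentiable in x
      co_attn     -> pre-computed co-attention map [B, H, W] (constant)
      anti_attn   -> pre-computed anti-attention map [B, H, W] (constant)
    """
    delta = delta.clone().detach().requires_grad_(True)
    attn = attn_fn(images + delta).flatten(1)

    # Cosine similarity as a simple proxy for attention overlap: penalize
    # overlap with the co-attention map, reward overlap with the
    # anti-attention map.
    sim_co = F.cosine_similarity(attn, co_attn.flatten(1), dim=1)
    sim_anti = F.cosine_similarity(attn, anti_attn.flatten(1), dim=1)
    loss = (sim_co - sim_anti).mean()
    loss.backward()

    # Signed-gradient descent with an L_inf budget on the perturbation.
    with torch.no_grad():
        delta = (delta - step_size * delta.grad.sign()).clamp(-eps, eps)
    return delta.detach()
```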
- RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception [64.80760846124858]
This paper proposes a novel unified representation, RepVF, which harmonizes the representation of various perception tasks.
RepVF characterizes the structure of different targets in the scene through a vector field, enabling a single-head, multi-task learning model.
Building upon RepVF, we introduce RFTR, a network designed to exploit the inherent connections between different tasks.
arXiv Detail & Related papers (2024-07-15T16:25:07Z)
- An Image Is Worth 1000 Lies: Adversarial Transferability across Prompts on Vision-Language Models [19.3364863157474]
A well-known concern about traditional task-specific vision models is that they can be misled by imperceptible adversarial perturbations.
In this work, we propose the Cross-Prompt Attack (CroPA).
CroPA updates the visual adversarial perturbation with learnable prompts, which are designed to counteract the misleading effects of the adversarial image.
arXiv Detail & Related papers (2024-03-14T17:59:35Z)
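For the CroPA entry above, the interplay between the visual perturbation and the learnable prompts reads as a min-max game. The sketch below alternates the two updates under an assumed scalar loss_fn that is low when the vision-language model is misled; the interface and step sizes are illustrative, not the paper's implementation.

```python
import torch

def cross_prompt_minmax_step(delta, prompt_emb, images, loss_fn,
                             eps=8 / 255, alpha_img=1 / 255, alpha_prompt=1e-2):
    """One alternating min-max step in the spirit of a cross-prompt attack.

    Assumed interface (not from the paper):
      loss_fn(images, prompt_emb) -> scalar that is LOW when the
      vision-language model is successfully misled; differentiable in both
      the image perturbation and the prompt embeddings.
    """
    # (1) Image step: the perturbation descends on the loss to strengthen
    #     the attack across prompts.
    delta = delta.clone().detach().requires_grad_(True)
    loss_fn(images + delta, prompt_emb.detach()).backward()
    with torch.no_grad():
        delta = (delta - alpha_img * delta.grad.sign()).clamp(-eps, eps)

    # (2) Prompt step: the learnable prompts ascend on the same loss so they
    #     counteract the perturbation, making it harder to fool the model.
    prompt_emb = prompt_emb.clone().detach().requires_grad_(True)
    loss_fn(images + delta, prompt_emb).backward()
    with torch.no_grad():
        prompt_emb = prompt_emb + alpha_prompt * prompt_emb.grad

    return delta.detach(), prompt_emb.detach()
```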
- Visual In-Context Learning for Large Vision-Language Models [62.5507897575317]
In Large Vision-Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities.
We introduce a novel Visual In-Context Learning (VICL) method comprising Visual Demonstration Retrieval, Intent-Oriented Image Summarization, and Intent-Oriented Demonstration Composition.
Our approach retrieves images via a "Retrieval & Rerank" paradigm, summarizes images with task intent and task-specific visual parsing, and composes language-based demonstrations.
arXiv Detail & Related papers (2024-02-18T12:43:38Z)
- A Survey on Transferability of Adversarial Examples across Deep Neural Networks [53.04734042366312]
Adversarial examples can manipulate machine learning models into making erroneous predictions.
The transferability of adversarial examples enables black-box attacks which circumvent the need for detailed knowledge of the target model.
This survey explores the landscape of the transferability of adversarial examples across deep neural networks.
arXiv Detail & Related papers (2023-10-26T17:45:26Z)
- Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z)
- Visual Perturbation-aware Collaborative Learning for Overcoming the Language Prior Problem [60.0878532426877]
We propose a novel collaborative learning scheme from the viewpoint of visual perturbation calibration.
Specifically, we devise a visual controller to construct two sorts of curated images with different perturbation extents.
The experimental results on two diagnostic VQA-CP benchmark datasets evidently demonstrate its effectiveness.
arXiv Detail & Related papers (2022-07-24T23:50:52Z)
- Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
- Learning Task Informed Abstractions [10.920599910769276]
We propose learning Task Informed Abstractions (TIA) that explicitly separates reward-correlated visual features from distractors.
TIA leads to significant performance gains over state-of-the-art methods on many visual control tasks.
arXiv Detail & Related papers (2021-06-29T17:56:11Z)
- A Broad Study on the Transferability of Visual Representations with Contrastive Learning [15.667240680328922]
We study the transferability of learned representations of contrastive approaches for linear evaluation, full-network transfer, and few-shot recognition.
The results show that the contrastive approaches learn representations that are easily transferable to a different downstream task.
Our analysis reveals that the representations learned from the contrastive approaches contain more low/mid-level semantics than cross-entropy models.
arXiv Detail & Related papers (2021-03-24T22:55:04Z)
- Analyzing Visual Representations in Embodied Navigation Tasks [45.35107294831313]
We use the recently proposed projection weighted Canonical Correlation Analysis (PWCCA) to measure the similarity of visual representations learned in the same environment by performing different tasks.
We then empirically demonstrate that visual representations learned on one task can be effectively transferred to a different task.
arXiv Detail & Related papers (2020-03-12T19:43:59Z)