Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack

URL: http://arxiv.org/abs/2411.02669v1
Date: Mon, 04 Nov 2024 23:07:51 GMT
Title: Semantic-Aligned Adversarial Evolution Triangle for High-Transferability Vision-Language Attack
Authors: Xiaojun Jia, Sensen Gao, Qing Guo, Ke Ma, Yihao Huang, Simeng Qin, Yang Liu, Ivor Tsang, Xiaochun Cao
Abstract summary: Vision-language pre-training models are vulnerable to multimodal adversarial examples (AEs). Previous approaches augment image-text pairs to enhance diversity within the adversarial example generation process. We propose sampling from adversarial evolution triangles composed of clean, historical, and current adversarial examples to enhance adversarial diversity.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Vision-language pre-training (VLP) models excel at interpreting both images and text but remain vulnerable to multimodal adversarial examples (AEs). Advancing the generation of transferable AEs, which succeed across unseen models, is key to developing more robust and practical VLP models. Previous approaches augment image-text pairs to enhance diversity within the adversarial example generation process, aiming to improve transferability by expanding the contrast space of image-text features. However, these methods focus solely on diversity around the current AEs, yielding limited gains in transferability. To address this issue, we propose to increase the diversity of AEs by leveraging the intersection regions along the adversarial trajectory during optimization. Specifically, we propose sampling from adversarial evolution triangles composed of clean, historical, and current adversarial examples to enhance adversarial diversity. We provide a theoretical analysis to demonstrate the effectiveness of the proposed adversarial evolution triangle. Moreover, we find that redundant inactive dimensions can dominate similarity calculations, distorting feature matching and making AEs model-dependent with reduced transferability. Hence, we propose to generate AEs in the semantic image-text feature contrast space, which can project the original feature space into a semantic corpus subspace. The proposed semantic-aligned subspace can reduce the image feature redundancy, thereby improving adversarial transferability. Extensive experiments across different datasets and models demonstrate that the proposed method can effectively improve adversarial transferability and outperform state-of-the-art adversarial attack methods. The code is released at https://github.com/jiaxiaojunQAQ/SA-AET.
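The abstract names two mechanisms without spelling them out: sampling inside the triangle spanned by the clean, historical and current AEs, and measuring image-text similarity inside a semantic corpus subspace. The minimal pure-Python sketch below illustrates both under stated assumptions. It assumes that "sampling from the triangle" means drawing uniform barycentric weights over the three vertices (the "historical" vertex taken to be an earlier iterate), and that the semantic subspace is the span of a set of text-corpus embeddings onto which features are orthogonally projected. The function names (`sample_evolution_triangle`, `orthonormal_basis`, `subspace_coords`) and the toy vectors are hypothetical; this is not the released SA-AET code from the repository linked above.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def sample_evolution_triangle(
    x_clean: Sequence[float],
    x_hist: Sequence[float],
    x_curr: Sequence[float],
    rng: random.Random,
) -> Tuple[Vector, Tuple[float, float, float]]:
    """Sample a point uniformly from the triangle (x_clean, x_hist, x_curr).

    Reflection trick: draw r1, r2 ~ U(0, 1); if r1 + r2 > 1, reflect both so
    the barycentric weights stay non-negative. If x_hist and x_curr lie in an
    L_inf ball around x_clean, every sample does too (the ball is convex).
    """
    r1, r2 = rng.random(), rng.random()
    if r1 + r2 > 1.0:
        r1, r2 = 1.0 - r1, 1.0 - r2
    weights = (1.0 - r1 - r2, r1, r2)
    point = [
        weights[0] * a + weights[1] * b + weights[2] * c
        for a, b, c in zip(x_clean, x_hist, x_curr)
    ]
    return point, weights


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu > 0 and nv > 0 else 0.0


def orthonormal_basis(corpus: Sequence[Sequence[float]], tol: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt over (hypothetical) corpus text embeddings; drops dependent vectors."""
    basis: List[Vector] = []
    for v in corpus:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def subspace_coords(x: Sequence[float], basis: Sequence[Sequence[float]]) -> Vector:
    """Coordinates of the orthogonal projection of x onto span(basis)."""
    return [dot(x, q) for q in basis]


if __name__ == "__main__":
    rng = random.Random(0)
    clean, hist, curr = [0.50, 0.50], [0.52, 0.49], [0.55, 0.47]
    point, w = sample_evolution_triangle(clean, hist, curr, rng)
    print("triangle sample:", point, "weights:", w)

    # Toy features: dimension 2 is an "inactive" direction the corpus never spans.
    corpus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    img, txt = [1.0, 1.0, 10.0], [1.0, 1.0, 0.0]
    basis = orthonormal_basis(corpus)
    print("full-space cosine:", cosine(img, txt))
    print("subspace cosine:", cosine(subspace_coords(img, basis), subspace_coords(txt, basis)))
```

The toy features mirror the abstract's claim about redundant inactive dimensions: the large third coordinate pulls the full-space cosine down to about 0.14, while the projected features are perfectly aligned. Because the basis is orthonormal, the cosine of the subspace coordinates equals the cosine of the projected vectors.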

Related papers

Towards Highly Transferable Vision-Language Attack via Semantic-Augmented Dynamic Contrastive Interaction
We propose a Semantic-Augmented Dynamic Contrastive Attack (SADCA) that enhances adversarial transferability through progressive and semantically guided perturbations. SADCA establishes a contrastive learning mechanism involving adversarial, positive and negative samples, to reinforce the semantic inconsistency of the obtained perturbations. Experiments on multiple datasets and models demonstrate that SADCA significantly improves adversarial transferability and consistently surpasses state-of-the-art methods.
arXiv (2026-03-05T05:46:16Z)
Boosting Adversarial Transferability for Hyperspectral Image Classification Using 3D Structure-invariant Transformation and Intermediate Feature Distance
Hyperspectral image (HSI) classification technologies based on Deep Neural Networks (DNNs) are vulnerable to adversarial attacks. This paper proposes a novel method to enhance the transferability of the adversarial examples for HSI classification models. The proposed method maintains robust attack performance even under defense strategies.
arXiv (2025-06-12T08:08:52Z)
Evolution-based Region Adversarial Prompt Learning for Robustness Enhancement in Vision-Language Models
We propose an evolution-based region adversarial prompt tuning method called ER-APT. In each training iteration, we first generate AEs using traditional gradient-based methods. Subsequently, a genetic evolution mechanism incorporating selection, mutation, and crossover is applied to optimize the AEs. The final evolved AEs are used for prompt tuning, achieving region-based adversarial optimization instead of conventional single-point adversarial prompt tuning.
arXiv (2025-03-17T07:08:47Z)
Boosting Adversarial Transferability with Spatial Adversarial Alignment
Deep neural networks are vulnerable to adversarial examples that exhibit transferability across various models. We propose a technique that employs an alignment loss and leverages a witness model to fine-tune the surrogate model. Experiments with various architectures on ImageNet show that surrogate models aligned with SAA produce more transferable adversarial examples.
arXiv (2025-01-02T02:35:47Z)
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Vision-language pre-training models are susceptible to multimodal adversarial examples (AEs). We propose using diversification along the intersection region of adversarial trajectory to expand the diversity of AEs. To further mitigate potential overfitting, we direct the adversarial text to deviate from the last intersection region along the optimization path.
arXiv (2024-03-19T05:10:10Z)
SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios. We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv (2023-12-08T09:08:50Z)
OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization
Vision-language pre-training models are vulnerable to multi-modal adversarial examples. Recent works have indicated that leveraging data augmentation and image-text modal interactions can enhance the transferability of adversarial examples. We propose an Optimal Transport-based Adversarial Attack, dubbed OT-Attack.
arXiv (2023-12-07T16:16:50Z)
TranSegPGD: Improving Transferability of Adversarial Examples on Semantic Segmentation
We propose an effective two-stage adversarial attack strategy to improve the transferability of adversarial examples on semantic segmentation. The proposed adversarial attack method can achieve state-of-the-art performance.
arXiv (2023-12-03T00:48:33Z)
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models
We present the first study to investigate the adversarial transferability of vision-language pre-training models. The transferability degradation is partly caused by the under-utilization of cross-modal interactions. We propose a highly transferable Set-level Guidance Attack (SGA) that thoroughly leverages modality interactions and incorporates alignment-preserving augmentation with cross-modal guidance.
arXiv (2023-07-26T09:19:21Z)
Dense Contrastive Visual-Linguistic Pretraining
Several multimodal representation learning approaches have been proposed that jointly represent image and text. These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining. We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv (2021-09-24T07:20:13Z)
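The ER-APT entry in the list above describes refining gradient-generated AEs with a genetic evolution step built from selection, mutation and crossover. The sketch below shows one generic way to write such a loop over L_inf-bounded perturbation vectors, using only the Python standard library. Truncation selection with elitism, uniform crossover and uniform-noise mutation are assumptions chosen for illustration; `evolve`, `toy_fitness` and the target vector are hypothetical, with `toy_fitness` standing in for a model's attack loss. This is not the ER-APT authors' implementation.

```python
import random
from typing import Callable, List, Sequence

Perturbation = List[float]


def clip(delta: Sequence[float], eps: float) -> Perturbation:
    """Project a perturbation back into the L_inf ball of radius eps."""
    return [max(-eps, min(eps, d)) for d in delta]


def crossover(a: Sequence[float], b: Sequence[float], rng: random.Random) -> Perturbation:
    """Uniform crossover: each coordinate is copied from one parent at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(delta: Sequence[float], eps: float, rate: float, scale: float,
           rng: random.Random) -> Perturbation:
    """Add uniform noise to a random subset of coordinates, then re-clip."""
    noisy = [d + rng.uniform(-scale, scale) if rng.random() < rate else d for d in delta]
    return clip(noisy, eps)


def evolve(population: Sequence[Sequence[float]],
           fitness: Callable[[Sequence[float]], float],
           eps: float, generations: int, n_elite: int, rng: random.Random,
           rate: float = 0.3, scale: float = 0.01) -> Perturbation:
    """Truncation selection with elitism, then crossover and mutation.

    `fitness` stands in for the attack loss (higher = stronger attack). Because
    the elite survive unchanged, the best fitness never decreases.
    """
    if n_elite < 2 or n_elite > len(population):
        raise ValueError("need 2 <= n_elite <= population size")
    pop = [clip(p, eps) for p in population]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)  # selection: rank by fitness
        elite = pop[:n_elite]
        children: List[Perturbation] = []
        while n_elite + len(children) < len(population):
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, rng), eps, rate, scale, rng))
        pop = elite + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    rng = random.Random(0)
    eps = 0.03
    target = [0.02, -0.01, 0.03]  # hypothetical optimum of the toy fitness

    def toy_fitness(d: Sequence[float]) -> float:
        return -sum((x - t) ** 2 for x, t in zip(d, target))

    init = [[rng.uniform(-eps, eps) for _ in target] for _ in range(8)]
    best = evolve(init, toy_fitness, eps, generations=50, n_elite=3, rng=rng)
    print("best perturbation:", best, "fitness:", toy_fitness(best))
```

Elitism makes the loop safe to run for any number of generations: the current best perturbation is always carried forward, so fitness can only hold or improve, while clipping after every mutation keeps each candidate inside the perturbation budget.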
