Related papers: Typography Leads Semantic Diversifying: Amplifying Adversarial Transferability across Multimodal Large Language Models

URL: http://arxiv.org/abs/2405.20090v1
Date: Thu, 30 May 2024 14:27:20 GMT
Title: Typography Leads Semantic Diversifying: Amplifying Adversarial Transferability across Multimodal Large Language Models
Authors: Hao Cheng, Erjia Xiao, Jiahang Cao, Le Yang, Kaidi Xu, Jindong Gu, Renjing Xu
Abstract summary: Adversarial examples with human-imperceptible perturbations possess a characteristic known as transferability. In this paper, we propose the Typographic-based Semantic Transfer Attack (TSTA). In the scenarios of Harmful Word Insertion and Important Information Protection, our TSTA demonstrates superior performance.
Score: 24.275446796100653
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Following the advent of the Artificial Intelligence (AI) era of large models, Multimodal Large Language Models (MLLMs) with the ability to understand cross-modal interactions between vision and text have attracted wide attention. Adversarial examples with human-imperceptible perturbation are shown to possess a characteristic known as transferability, which means that a perturbation generated by one model could also mislead another different model. Augmenting the diversity in input data is one of the most significant methods for enhancing adversarial transferability. This method has been certified as a way to significantly enlarge the threat impact under black-box conditions. Research works also demonstrate that MLLMs can be exploited to generate adversarial examples in the white-box scenario. However, the adversarial transferability of such perturbations is quite limited, failing to achieve effective black-box attacks across different models. In this paper, we propose the Typographic-based Semantic Transfer Attack (TSTA), which is inspired by: (1) MLLMs tend to process semantic-level information; (2) Typographic Attack could effectively distract the visual information captured by MLLMs. In the scenarios of Harmful Word Insertion and Important Information Protection, our TSTA demonstrates superior performance.

Related papers

Enhancing Cross-task Transfer of Large Language Models via Activation Steering [75.41750053623298]
Cross-task in-context learning offers a direct solution for transferring knowledge across tasks. We investigate whether cross-task transfer can be achieved via latent space steering without parameter updates or input expansion. We propose a novel Cross-task Activation Steering Transfer framework that enables effective transfer by manipulating the model's internal activation states.
arXiv Detail & Related papers (2025-07-17T15:47:22Z)
MLLMs are Deeply Affected by Modality Bias [158.64371871084478]
Recent advances in Multimodal Large Language Models (MLLMs) have shown promising results in integrating diverse modalities such as texts and images. MLLMs are heavily influenced by modality bias, often relying on language while under-utilizing other modalities like visual inputs. This paper argues that MLLMs are deeply affected by modality bias, highlighting its manifestations across various tasks.
arXiv Detail & Related papers (2025-05-24T11:49:31Z)
X-Transfer Attacks: Towards Super Transferable Adversarial Attacks on CLIP [32.85582585781569]
We introduce X-Transfer, a novel attack method that exposes a universal adversarial vulnerability in CLIP. X-Transfer generates a Universal Adversarial Perturbation capable of deceiving various CLIP encoders and downstream VLMs across different samples, tasks, and domains.
arXiv Detail & Related papers (2025-05-08T11:59:13Z)
MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks [85.3303135160762]
MIRAGE is a novel framework that exploits narrative-driven context and role immersion to circumvent safety mechanisms in Multimodal Large Language Models. It achieves state-of-the-art performance, improving attack success rates by up to 17.5% over the best baselines. We demonstrate that role immersion and structured semantic reconstruction can activate inherent model biases, facilitating the model's spontaneous violation of ethical safeguards.
arXiv Detail & Related papers (2025-03-24T20:38:42Z)
Survey of Adversarial Robustness in Multimodal Large Language Models [17.926240920647892]
Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance in artificial intelligence. Their deployment in real-world applications raises significant concerns about adversarial vulnerabilities. This paper reviews the adversarial robustness of MLLMs, covering different modalities.
arXiv Detail & Related papers (2025-03-18T06:54:59Z)
Improving Adversarial Transferability in MLLMs via Dynamic Vision-Language Alignment Attack [16.70399451598529]
We introduce the Dynamic Vision-Language Alignment (DynVLA) Attack, a novel approach that injects dynamic perturbations into the vision-language connector to enhance generalization across diverse vision-language alignment of different models. Our experimental results show that DynVLA significantly improves the transferability of adversarial examples across various MLLMs, including BLIP2, InstructBLIP, MiniGPT4, LLaVA, and closed-source models such as Gemini.
arXiv Detail & Related papers (2025-02-27T01:33:19Z)
On Adversarial Robustness of Language Models in Transfer Learning [13.363850350446869]
We show that transfer learning, while improving standard performance metrics, often leads to increased vulnerability to adversarial attacks. Our findings demonstrate that larger models exhibit greater resilience to this phenomenon, suggesting a complex interplay between model size, architecture, and adaptation methods.
arXiv Detail & Related papers (2024-12-29T15:55:35Z)
Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks. Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval. This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
A Persuasion-Based Prompt Learning Approach to Improve Smishing Detection through Data Augmentation [1.4388765025696655]
A number of challenges remain in machine learning-based smishing detection. Given the sensitive nature of smishing-related data, there is a lack of publicly accessible data that can be used for training and evaluating ML models. We introduce a novel data augmentation method utilizing a few-shot prompt learning approach.
arXiv Detail & Related papers (2024-10-18T04:20:02Z)
Probing the Robustness of Vision-Language Pretrained Models: A Multimodal Adversarial Attack Approach [30.9778838504609]
Vision-language pretraining with transformers has demonstrated exceptional performance across numerous multimodal tasks. Existing multimodal attack methods have largely overlooked cross-modal interactions between visual and textual modalities. We propose a novel Joint Multimodal Transformer Feature Attack (JMTFA) that concurrently introduces adversarial perturbations in both visual and textual modalities.
arXiv Detail & Related papers (2024-08-24T04:31:37Z)
Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models [17.663550432103534]
Multimodal Large Language Models (MLLMs) extend the capacity of LLMs to understand multimodal information comprehensively. These models are susceptible to jailbreak attacks, where malicious users can break the safety alignment of the target model and generate misleading and harmful answers. We propose Cross-modality Information DEtectoR (CIDER), a plug-and-play jailbreaking detector designed to identify maliciously perturbed image inputs.
arXiv Detail & Related papers (2024-07-31T15:02:46Z)
MARS: Benchmarking the Metaphysical Reasoning Abilities of Language Models with a Multi-task Evaluation Dataset [50.36095192314595]
Large Language Models (LLMs) function as conscious agents with generalizable reasoning capabilities. This ability remains underexplored due to the complexity of modeling infinite possible changes in an event. We introduce the first-ever benchmark, MARS, comprising three tasks corresponding to each step.
arXiv Detail & Related papers (2024-06-04T08:35:04Z)
The Wolf Within: Covert Injection of Malice into MLLM Societies via an MLLM Operative [55.08395463562242]
Multimodal Large Language Models (MLLMs) are constantly defining the new boundary of Artificial General Intelligence (AGI). Our paper explores a novel vulnerability in MLLM societies: the indirect propagation of malicious content.
arXiv Detail & Related papers (2024-02-20T23:08:21Z)
SA-Attack: Improving Adversarial Transferability of Vision-Language Pre-training Models via Self-Augmentation [56.622250514119294]
In contrast to white-box adversarial attacks, transfer attacks are more reflective of real-world scenarios. We propose a self-augment-based transfer attack method, termed SA-Attack.
arXiv Detail & Related papers (2023-12-08T09:08:50Z)
Retrieval-augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models [56.256069117502385]
Chain of Thought (CoT) approaches can be used to enhance the capability of Large Language Models (LLMs) on complex reasoning tasks. However, the selection of optimal CoT demonstration examples in multi-modal reasoning remains less explored. We introduce a novel approach that addresses this challenge by using retrieval mechanisms to automatically select demonstration examples.
arXiv Detail & Related papers (2023-12-04T08:07:21Z)
Set-level Guidance Attack: Boosting Adversarial Transferability of Vision-Language Pre-training Models [52.530286579915284]
We present the first study to investigate the adversarial transferability of vision-language pre-training models. The transferability degradation is partly caused by the under-utilization of cross-modal interactions. We propose a highly transferable Set-level Guidance Attack (SGA) that thoroughly leverages modality interactions and incorporates alignment-preserving augmentation with cross-modal guidance.
arXiv Detail & Related papers (2023-07-26T09:19:21Z)
Why Does Little Robustness Help? A Further Step Towards Understanding Adversarial Transferability [23.369773251447636]
Adversarial examples (AEs) for DNNs have been shown to be transferable. In this paper, we take a further step towards understanding adversarial transferability.
arXiv Detail & Related papers (2023-07-15T19:20:49Z)
Exploring Transferable and Robust Adversarial Perturbation Generation from the Perspective of Network Hierarchy [52.153866313879924]
The transferability and robustness of adversarial examples are two practical yet important properties for black-box adversarial attacks. We propose a transferable and robust adversarial generation (TRAP) method. Our TRAP achieves impressive transferability and high robustness against certain interferences.
arXiv Detail & Related papers (2021-08-16T11:52:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.