Related papers: MTAttack: Multi-Target Backdoor Attacks against Large Vision-Language Models

MTAttack: Multi-Target Backdoor Attacks against Large Vision-Language Models

URL: http://arxiv.org/abs/2511.10098v1
Date: Fri, 14 Nov 2025 01:32:27 GMT
Title: MTAttack: Multi-Target Backdoor Attacks against Large Vision-Language Models
Authors: Zihan Wang, Guansong Pang, Wenjun Miao, Jin Zheng, Xiao Bai,
Abstract summary: We propose MTAttack, the first multi-target backdoor attack framework for enforcing accurate multiple trigger-target mappings in LVLMs.<n> Experiments on popular benchmarks demonstrate a high success rate of MTAttack for multi-target attacks.<n>Our attack exhibits strong generalizability across datasets and robustness against backdoor defense strategies.
Score: 52.37749859972453
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in Large Visual Language Models (LVLMs) have demonstrated impressive performance across various vision-language tasks by leveraging large-scale image-text pretraining and instruction tuning. However, the security vulnerabilities of LVLMs have become increasingly concerning, particularly their susceptibility to backdoor attacks. Existing backdoor attacks focus on single-target attacks, i.e., targeting a single malicious output associated with a specific trigger. In this work, we uncover multi-target backdoor attacks, where multiple independent triggers corresponding to different attack targets are added in a single pass of training, posing a greater threat to LVLMs in real-world applications. Executing such attacks in LVLMs is challenging since there can be many incorrect trigger-target mappings due to severe feature interference among different triggers. To address this challenge, we propose MTAttack, the first multi-target backdoor attack framework for enforcing accurate multiple trigger-target mappings in LVLMs. The core of MTAttack is a novel optimization method with two constraints, namely Proxy Space Partitioning constraint and Trigger Prototype Anchoring constraint. It jointly optimizes multiple triggers in the latent space, with each trigger independently mapping clean images to a unique proxy class while at the same time guaranteeing their separability. Experiments on popular benchmarks demonstrate a high success rate of MTAttack for multi-target attacks, substantially outperforming existing attack methods. Furthermore, our attack exhibits strong generalizability across datasets and robustness against backdoor defense strategies. These findings highlight the vulnerability of LVLMs to multi-target backdoor attacks and underscore the urgent need for mitigating such threats. Code is available at https://github.com/mala-lab/MTAttack.

Related papers

Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning [89.1856483797116]
We introduce BEAT, the first framework to inject visual backdoors into MLLM-based embodied agents.<n>Unlike textual triggers, object triggers exhibit wide variation across viewpoints and lighting, making them difficult to implant reliably.<n>BEAT achieves attack success rates up to 80%, while maintaining strong benign task performance.
arXiv Detail & Related papers (2025-10-31T16:50:49Z)
Active Attacks: Red-teaming LLMs via Adaptive Environments [71.55110023234376]
We address the challenge of generating diverse attack prompts for large language models (LLMs)<n>We introduce textitActive Attacks, a novel RL-based red-teaming algorithm that adapts its attacks as the victim evolves.
arXiv Detail & Related papers (2025-09-26T06:27:00Z)
FLAT: Latent-Driven Arbitrary-Target Backdoor Attacks in Federated Learning [7.655329509535266]
Federated learning (FL) is vulnerable to backdoor attacks.<n>Most existing methods are limited by fixed-pattern or single-target triggers.<n>We propose FLAT (FL Arbitrary-Target Attack), a novel backdoor attack that leverages a latent-driven conditional autoencoder.
arXiv Detail & Related papers (2025-08-06T03:54:29Z)
LADDER: Multi-objective Backdoor Attack via Evolutionary Algorithm [11.95174457001938]
This work proposes a multiobjective blackbox backdoor attack in dual domains via evolutionary algorithm (LADDER)<n>In particular, we formulate LADDER as a multi-objective optimization problem (MOP) and solve it via multi-objective evolutionary algorithm (MOEA)<n>Experiments comprehensively showcase that LADDER attack effectiveness of at least 99%, attack robustness with 90.23%, superior natural stealthiness (1.12x to 196.74x improvement) and excellent spectral stealthiness (8.45x enhancement) as compared to current stealthy attacks by the average $l$-norm across 5 public datasets.
arXiv Detail & Related papers (2024-11-28T11:50:23Z)
Securing Multi-turn Conversational Language Models From Distributed Backdoor Triggers [29.554818890832887]
Large language models (LLMs) have acquired the ability to handle longer context lengths and understand nuances in text. This paper exposes a vulnerability that leverages the multi-turn feature and strong learning ability of LLMs to harm the end-user. We propose a decoding time defense that scales linearly with assistant response sequence length and reduces the backdoor to as low as 0.35%.
arXiv Detail & Related papers (2024-07-04T20:57:06Z)
Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift [104.76588209308666]
This paper explores backdoor attacks in LVLM instruction tuning across mismatched training and testing domains.<n>We introduce a new evaluation dimension, backdoor domain generalization, to assess attack robustness.<n>We propose a multimodal attribution backdoor attack (MABA) that injects domain-agnostic triggers into critical areas.
arXiv Detail & Related papers (2024-06-27T02:31:03Z)
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models [65.23688155159398]
Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context. Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities. Adversaries can implant a backdoor by injecting poisoned samples with triggers embedded in instructions or images. We propose a multimodal instruction backdoor attack, namely VL-Trojan.
arXiv Detail & Related papers (2024-02-21T14:54:30Z)
Shortcuts Everywhere and Nowhere: Exploring Multi-Trigger Backdoor Attacks [63.89012304595422]
Backdoor attacks have become a significant threat to the pre-training and deployment of deep neural networks (DNNs)<n>In this study, we explore the concept of Multi-Trigger Backdoor Attacks (MTBAs), where multiple adversaries leverage different types of triggers to poison the same dataset.
arXiv Detail & Related papers (2024-01-27T04:49:37Z)
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses. We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
M-to-N Backdoor Paradigm: A Multi-Trigger and Multi-Target Attack to Deep Learning Models [17.699749361475774]
We propose a new $M$-to-$N$ attack paradigm that allows an attacker to manipulate any input to attack $N$ target classes. Our attack selects $M$ clean images from each target class as triggers and leverages our proposed poisoned image generation framework. Our new backdoor attack is highly effective in attacking multiple target classes and robust against pre-processing operations and existing defenses.
arXiv Detail & Related papers (2022-11-03T15:06:50Z)
Marksman Backdoor: Backdoor Attacks with Arbitrary Target Class [17.391987602738606]
In recent years, machine learning models have been shown to be vulnerable to backdoor attacks. This paper exploits a novel backdoor attack with a much more powerful payload, denoted as Marksman. We show empirically that the proposed framework achieves high attack performance while preserving the clean-data performance in several benchmark datasets.
arXiv Detail & Related papers (2022-10-17T15:46:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.