Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models
- URL: http://arxiv.org/abs/2508.18805v1
- Date: Tue, 26 Aug 2025 08:40:22 GMT
- Title: Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models
- Authors: Rui Zhang, Zihan Wang, Tianli Yang, Hongwei Li, Wenbo Jiang, Qingchuan Zhao, Yang Liu, Guowen Xu
- Abstract summary: Vision-Language Models (VLMs) are increasingly deployed in real-world applications, but their high inference cost makes them vulnerable to resource consumption attacks. We propose *Hidden Tail*, a stealthy resource consumption attack that crafts prompt-agnostic adversarial images. Our method employs a composite loss function that balances semantic preservation, repetitive special token induction, and suppression of the end-of-sequence token.
- Score: 30.671621529825654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision-Language Models (VLMs) are increasingly deployed in real-world applications, but their high inference cost makes them vulnerable to resource consumption attacks. Prior attacks attempt to extend VLM output sequences by optimizing adversarial images, thereby increasing inference costs. However, these extended outputs often introduce irrelevant abnormal content, compromising attack stealthiness. This trade-off between effectiveness and stealthiness poses a major limitation for existing attacks. To address this challenge, we propose *Hidden Tail*, a stealthy resource consumption attack that crafts prompt-agnostic adversarial images, inducing VLMs to generate maximum-length outputs by appending special tokens invisible to users. Our method employs a composite loss function that balances semantic preservation, repetitive special token induction, and suppression of the end-of-sequence (EOS) token, optimized via a dynamic weighting strategy. Extensive experiments show that *Hidden Tail* outperforms existing attacks, increasing output length by up to 19.2× and reaching the maximum token limit, while preserving attack stealthiness. These results highlight the urgent need to improve the robustness of VLMs against efficiency-oriented adversarial threats. Our code is available at https://github.com/zhangrui4041/Hidden_Tail.
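To make the composite objective concrete, here is a minimal PyTorch sketch of the three terms the abstract names, with a simple linear schedule standing in for the paper's dynamic weighting strategy. All names, shapes, and weights are illustrative assumptions; the authors' actual loss is defined in the linked repository.

```python
import torch
import torch.nn.functional as F

def hidden_tail_loss(logits, target_ids, special_id, eos_id, step, total_steps):
    """Hypothetical sketch of the composite objective (not the authors' code).

    logits:     (T, V) next-token logits over the planned output sequence
    target_ids: (T_sem,) token ids of the benign answer whose semantics
                the user-visible prefix should preserve
    """
    log_probs = F.log_softmax(logits, dim=-1)
    t_sem = target_ids.shape[0]

    # 1) Semantic preservation: tie the first t_sem positions to the
    #    benign reference answer via cross-entropy.
    l_sem = F.nll_loss(log_probs[:t_sem], target_ids)

    # 2) Repetitive special-token induction: push every later position
    #    toward the user-invisible special token.
    l_rep = -log_probs[t_sem:, special_id].mean()

    # 3) EOS suppression: penalize probability mass on EOS so generation
    #    runs to the maximum token limit.
    l_eos = log_probs[:, eos_id].exp().mean()

    # Illustrative linear schedule in place of the paper's dynamic
    # weighting: shift emphasis from semantics to the hidden tail.
    alpha = step / max(total_steps, 1)
    return (1 - alpha) * l_sem + alpha * (l_rep + l_eos)

# Toy call with random logits (T=32 positions, V=1000 vocabulary).
logits = torch.randn(32, 1000, requires_grad=True)
loss = hidden_tail_loss(logits, torch.randint(0, 1000, (8,)),
                        special_id=5, eos_id=2, step=10, total_steps=100)
loss.backward()  # in the attack, gradients would reach the adversarial image
```

Minimizing a loss of this shape over the image pixels would keep the visible answer intact while the appended special-token tail consumes the remaining generation budget.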
Related papers
- An Image Is Worth Ten Thousand Words: Verbose-Text Induction Attacks on VLMs [48.05423013052023]
This paper proposes a novel verbose-text induction attack (VTIA) to inject imperceptible adversarial perturbations into benign images. We first perform adversarial prompt search, employing reinforcement learning strategies to automatically identify adversarial prompts. We then conduct vision-aligned perturbation optimization to craft adversarial examples on input images, maximizing the similarity between the perturbed image's visual embeddings and those of the adversarial prompt.
arXiv Detail & Related papers (2025-11-20T09:03:43Z)
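Read literally, the second VTIA stage summarized above is an embedding-alignment loop. Below is a hedged PGD-style sketch of that reading; `vision_encoder`, `prompt_embed`, the step size, and the perturbation budget are all assumptions, not the paper's API.

```python
import torch
import torch.nn.functional as F

def vtia_perturb_step(image, delta, vision_encoder, prompt_embed,
                      eps=8 / 255, lr=1 / 255):
    """One illustrative PGD step: move the perturbed image's visual
    embedding toward the embedding of the RL-discovered adversarial
    prompt (all names here are placeholders)."""
    delta = delta.detach().requires_grad_(True)
    img_embed = vision_encoder(image + delta)
    # Ascend on cross-modal similarity between image and prompt.
    sim = F.cosine_similarity(img_embed, prompt_embed, dim=-1).mean()
    sim.backward()
    with torch.no_grad():
        delta = delta + lr * delta.grad.sign()
        delta = delta.clamp(-eps, eps)  # keep the perturbation imperceptible
    return delta

# Toy usage with a random linear "encoder" standing in for a vision tower.
enc = torch.nn.Linear(3 * 32 * 32, 512)
encoder = lambda x: enc(x.flatten(start_dim=1))
image = torch.rand(1, 3, 32, 32)
delta = vtia_perturb_step(image, torch.zeros_like(image),
                          encoder, torch.randn(1, 512))
```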
- MTAttack: Multi-Target Backdoor Attacks against Large Vision-Language Models [52.37749859972453]
We propose MTAttack, the first multi-target backdoor attack framework for enforcing accurate multiple trigger-target mappings in LVLMs. Experiments on popular benchmarks demonstrate a high success rate of MTAttack for multi-target attacks. Our attack exhibits strong generalizability across datasets and robustness against backdoor defense strategies.
arXiv Detail & Related papers (2025-11-13T09:00:21Z)
- One Token Embedding Is Enough to Deadlock Your Large Reasoning Model [91.48868589442837]
We present the Deadlock Attack, a resource exhaustion method that hijacks an LRM's generative control flow. Our method achieves a 100% attack success rate across four advanced LRMs.
arXiv Detail & Related papers (2025-10-12T07:42:57Z)
- SilentStriker: Toward Stealthy Bit-Flip Attacks on Large Language Models [13.200372347541142]
Bit-Flip Attacks (BFAs) exploit hardware vulnerabilities to corrupt model parameters and cause severe performance degradation. Existing BFA methods fail to balance performance degradation and output naturalness, making them prone to discovery. SilentStriker is the first stealthy bit-flip attack against LLMs that effectively degrades task performance while maintaining output naturalness.
arXiv Detail & Related papers (2025-09-22T05:36:18Z)
- RECALLED: An Unbounded Resource Consumption Attack on Large Vision-Language Models [16.62034667623657]
Resource Consumption Attacks (RCAs) have emerged as a significant threat to the deployment of Large Language Models (LLMs). We present RECALLED, the first approach to exploit visual modalities to trigger RCAs for red-teaming. Our study exposes security vulnerabilities in LVLMs and establishes a red-teaming framework that can facilitate future defense development against RCAs.
arXiv Detail & Related papers (2025-07-24T02:58:16Z)
- Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models [17.663550432103534]
Multimodal Large Language Models (MLLMs) extend the capacity of LLMs to understand multimodal information comprehensively.
These models are susceptible to jailbreak attacks, where malicious users can break the safety alignment of the target model and generate misleading and harmful answers.
We propose Cross-modality Information DEtectoR (CIDER), a plug-and-play jailbreaking detector designed to identify maliciously perturbed image inputs.
arXiv Detail & Related papers (2024-07-31T15:02:46Z)
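One plausible shape for such a cross-modality check, sketched here as a toy under assumed components rather than CIDER's exact algorithm: adversarial perturbations tend to be fragile, so if denoising the image sharply shifts its similarity to the paired text, the input is flagged.

```python
import torch
import torch.nn.functional as F

def cross_modal_check(image, text_embed, embed_image, denoise, tau=0.15):
    """Toy detector in the spirit of the summary above. `embed_image`,
    `denoise`, and the threshold `tau` are illustrative assumptions."""
    sim_raw = F.cosine_similarity(embed_image(image), text_embed, dim=-1)
    sim_clean = F.cosine_similarity(embed_image(denoise(image)), text_embed, dim=-1)
    # A large similarity shift after denoising suggests a crafted image.
    return (sim_raw - sim_clean).abs().item() > tau

# Toy usage with stand-ins for a real embedder and denoiser.
embed = torch.nn.Linear(3 * 32 * 32, 256)
embed_image = lambda x: embed(x.flatten())
denoise = lambda x: (x + 0.05 * torch.randn_like(x)).clamp(0, 1)
flagged = cross_modal_check(torch.rand(3, 32, 32), torch.randn(256),
                            embed_image, denoise)
```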
- White-box Multimodal Jailbreaks Against Large Vision-Language Models [61.97578116584653]
We propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerability within Large Vision-Language Models.
Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input.
An adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions.
arXiv Detail & Related papers (2024-05-28T07:13:30Z)
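The co-optimization described in the entry above reduces to one differentiable objective: the likelihood of an affirmative target response, with gradients reaching both the image prefix and the text suffix. A minimal sketch, assuming a generic callable model and illustrative names throughout:

```python
import torch

def affirmative_loss(model, img_prefix, txt_suffix_embeds, target_ids):
    """Toy joint objective (not the paper's code): negative log-likelihood
    of an affirmative target such as "Sure, here is ..." given the
    adversarial image prefix and text suffix embeddings."""
    inputs = torch.cat([img_prefix, txt_suffix_embeds], dim=1)  # (1, T, D)
    logits = model(inputs)                                      # (1, T, V)
    logp = torch.log_softmax(logits[0, -target_ids.numel():], dim=-1)
    return -logp.gather(1, target_ids.view(-1, 1)).mean()

# Toy usage: a linear "model" over random embeddings; in a white-box
# attack one would alternate continuous updates on img_prefix with
# discrete token swaps on the suffix, as the summary describes.
lin = torch.nn.Linear(16, 100)
img_prefix = torch.randn(1, 4, 16, requires_grad=True)
suffix = torch.randn(1, 3, 16, requires_grad=True)
loss = affirmative_loss(lin, img_prefix, suffix, torch.tensor([7, 8, 9]))
loss.backward()  # gradients for both modalities enable co-optimization
```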
- Energy-Latency Manipulation of Multi-modal Large Language Models via Verbose Samples [63.9198662100875]
In this paper, we aim to induce high energy-latency cost during inference by crafting an imperceptible perturbation.
We find that high energy-latency cost can be manipulated by maximizing the length of generated sequences.
Experiments demonstrate that our verbose samples can largely extend the length of generated sequences.
arXiv Detail & Related papers (2024-04-25T12:11:38Z)
- VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models [65.23688155159398]
Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context.
Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities.
Adversaries can implant a backdoor by injecting poisoned samples with triggers embedded in instructions or images.
We propose a multimodal instruction backdoor attack, namely VL-Trojan.
arXiv Detail & Related papers (2024-02-21T14:54:30Z)
- Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images [63.91986621008751]
Large vision-language models (VLMs) have achieved exceptional performance across various multi-modal tasks.
In this paper, we aim to induce high energy-latency cost during inference of VLMs.
We propose verbose images, with the goal of crafting an imperceptible perturbation to induce VLMs to generate long sentences.
arXiv Detail & Related papers (2024-01-20T08:46:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.