LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops
- URL: http://arxiv.org/abs/2506.14493v1
- Date: Tue, 17 Jun 2025 13:14:55 GMT
- Title: LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops
- Authors: Jiyuan Fu, Kaixun Jiang, Lingyi Hong, Jinglun Li, Haijing Guo, Dingkang Yang, Zhaoyu Chen, Wenqiang Zhang
- Abstract summary: We propose LingoLoop, an attack designed to induce MLLMs to generate excessively verbose sequences. We find that the POS tag of a token strongly affects the likelihood of generating an EOS token. We introduce a Generative Path Pruning Mechanism that limits the magnitude of hidden states, encouraging the model to produce persistent loops.
- Score: 22.036024483035465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal Large Language Models (MLLMs) have shown great promise but require substantial computational resources during inference. Attackers can exploit this by inducing excessive output, leading to resource exhaustion and service degradation. Prior energy-latency attacks aim to increase generation time by broadly shifting the output token distribution away from the EOS token, but they neglect the influence of token-level Part-of-Speech (POS) characteristics on EOS generation and of sentence-level structural patterns on output counts, limiting their efficacy. To address this, we propose LingoLoop, an attack designed to induce MLLMs to generate excessively verbose and repetitive sequences. First, we find that the POS tag of a token strongly affects the likelihood of generating an EOS token. Based on this insight, we propose a POS-Aware Delay Mechanism to postpone EOS token generation by adjusting attention weights guided by POS information. Second, we identify that constraining output diversity to induce repetitive loops is effective for sustained generation. We introduce a Generative Path Pruning Mechanism that limits the magnitude of hidden states, encouraging the model to produce persistent loops. Extensive experiments demonstrate that LingoLoop can increase generated tokens by up to 30 times and energy consumption by a comparable factor on models like Qwen2.5-VL-3B, consistently driving MLLMs towards their maximum generation limits. These findings expose significant vulnerabilities in MLLMs, posing challenges for their reliable deployment. The code will be released publicly following the paper's acceptance.
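The abstract pins the attack on two concrete mechanisms: a POS-Aware Delay Mechanism that adjusts attention weights using POS information to postpone EOS, and a Generative Path Pruning Mechanism that limits hidden-state magnitude to encourage loops. Since the code is unreleased, the sketch below is only a reading of those two ideas: the POS scale table, the norm threshold, and the direct-intervention framing (the paper optimizes an input perturbation toward these effects rather than editing the model) are all assumptions.

```python
import torch

# Illustrative POS -> attention scale table (assumed values, not the paper's):
# down-weight key tokens whose POS tends to precede EOS (e.g. sentence-final
# punctuation), up-weight content words.
POS_ATTENTION_SCALE = {"NOUN": 1.2, "VERB": 1.1, "ADJ": 1.1, "PUNCT": 0.5}

def pos_aware_delay(attn: torch.Tensor, pos_tags: list[str]) -> torch.Tensor:
    """Rescale attention over key tokens by their POS tag, then renormalize,
    so the context that triggers EOS receives less attention.
    attn: (num_heads, q_len, k_len); pos_tags: one tag per key token."""
    scale = torch.tensor([POS_ATTENTION_SCALE.get(t, 1.0) for t in pos_tags])
    attn = attn * scale  # broadcast over heads and query positions
    return attn / attn.sum(dim=-1, keepdim=True)

def generative_path_pruning(hidden: torch.Tensor, max_norm: float = 8.0) -> torch.Tensor:
    """Clamp each hidden state's L2 norm, shrinking the space of reachable
    generation paths so decoding settles into repetitive loops.
    hidden: (seq_len, hidden_dim); max_norm is an assumed threshold."""
    norms = hidden.norm(dim=-1, keepdim=True)
    return hidden * (max_norm / norms).clamp(max=1.0)
```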
Related papers
- DINGO: Constrained Inference for Diffusion LLMs [5.971462597321995]
Diffusion models lack the ability to provably enforce user-specified formal constraints. We propose DINGO, a dynamic programming-based decoding strategy that is both efficient and provably distribution-preserving.
arXiv Detail & Related papers (2025-05-29T04:04:54Z)
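DINGO itself is a dynamic program that provably preserves the model's distribution; as a point of contrast, here is the naive constrained-decoding baseline it improves on, greedy decoding with a hard token mask (the `logits_fn` and `allowed_fn` interfaces are illustrative assumptions):

```python
import torch

def constrained_greedy_decode(logits_fn, allowed_fn, bos_id, eos_id, max_len=64):
    """Naive constrained greedy decoding: at each step, mask every token the
    constraint automaton disallows, then pick the best remaining token.
    Plain masking like this can skew the output distribution, which is exactly
    what DINGO's dynamic program is designed to avoid.
    - logits_fn(ids) -> (vocab,) next-token logits   [assumed interface]
    - allowed_fn(ids) -> iterable of token ids the formal constraint permits next
    """
    ids = [bos_id]
    for _ in range(max_len):
        logits = logits_fn(ids)
        mask = torch.full_like(logits, float("-inf"))
        mask[list(allowed_fn(ids))] = 0.0      # keep only constraint-legal tokens
        next_id = int((logits + mask).argmax())
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids
```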
- BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models [79.36881186707413]
Multi-modal large language models (MLLMs) process multi-modal information, enabling them to generate responses to image-text inputs. MLLMs have been incorporated into diverse multi-modal applications, such as autonomous driving and medical diagnosis, via plug-and-play without fine-tuning. We propose BadToken, the first token-level backdoor attack to MLLMs.
arXiv Detail & Related papers (2025-03-20T10:39:51Z)
- Not all tokens are created equal: Perplexity Attention Weighted Networks for AI generated text detection [49.15148871877941]
Next-token distribution outputs offer a theoretically appealing approach for detecting text generated by large language models (LLMs). We propose the Perplexity Attention Weighted Network (PAWN), which uses the last hidden states of the LLM and positions to weight the sum of a series of features based on metrics from the next-token distribution across the sequence length. PAWN shows competitive and even better performance in-distribution than the strongest baselines with a fraction of their trainable parameters.
arXiv Detail & Related papers (2025-01-07T17:00:49Z)
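The PAWN summary is concrete enough to sketch: features computed from the next-token distribution at each position, weighted by a small network over the last hidden states and positions, then summed into a detection score. The feature choice (observed-token log-probability and entropy) and the layer shapes below are illustrative assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class PAWNSketch(nn.Module):
    """Illustrative reading of PAWN: score = sum over positions of
    (weights predicted from hidden state + position) * (distribution features)."""
    def __init__(self, hidden_dim: int, num_feats: int = 2):
        super().__init__()
        self.weigher = nn.Linear(hidden_dim + 1, num_feats)  # +1 for position

    def forward(self, hidden, feats, positions):
        # hidden: (seq, hidden_dim); feats: (seq, num_feats); positions: (seq,) float
        w_in = torch.cat([hidden, positions.unsqueeze(-1)], dim=-1)
        weights = torch.sigmoid(self.weigher(w_in))          # (seq, num_feats)
        return (weights * feats).sum()                       # scalar detection score

def next_token_features(logits, target_ids):
    """Assumed per-position features from the next-token distribution:
    log-prob of the actually observed token, and entropy of the distribution."""
    logp = logits.log_softmax(-1)                            # (seq, vocab)
    token_logp = logp.gather(-1, target_ids.unsqueeze(-1)).squeeze(-1)
    entropy = -(logp.exp() * logp).sum(-1)
    return torch.stack([token_logp, entropy], dim=-1)        # (seq, 2)
```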
- An Engorgio Prompt Makes Large Language Model Babble on [25.148096060828397]
Auto-regressive large language models (LLMs) have yielded impressive performance in many real-world tasks. In this paper, we explore their vulnerability to inference cost attacks, where a malicious user crafts Engorgio prompts. We design Engorgio, a novel methodology, to efficiently generate adversarial Engorgio prompts to affect the target LLM's service availability.
arXiv Detail & Related papers (2024-12-27T01:00:23Z)
- FIRP: Faster LLM inference via future intermediate representation prediction [54.897493351694195]
FIRP generates multiple tokens instead of one at each decoding step.
We conduct extensive experiments, showing a speedup ratio of 1.9x-3x in several models and datasets.
arXiv Detail & Related papers (2024-10-27T15:53:49Z)
- Energy-Latency Manipulation of Multi-modal Large Language Models via Verbose Samples [63.9198662100875]
In this paper, we aim to induce high energy-latency cost during inference by crafting an imperceptible perturbation.
We find that high energy-latency cost can be manipulated by maximizing the length of generated sequences.
Experiments demonstrate that our verbose samples can largely extend the length of generated sequences.
arXiv Detail & Related papers (2024-04-25T12:11:38Z)
- Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration [54.897493351694195]
We propose a novel parallel decoding approach, namely *hidden transfer*, which decodes multiple successive tokens simultaneously in a single forward pass.
In terms of acceleration metrics, we outperform all the single-model acceleration techniques, including Medusa and Self-Speculative decoding.
arXiv Detail & Related papers (2024-04-18T09:17:06Z)
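FIRP and hidden transfer both trade one forward pass for several drafted tokens. Neither paper's exact mechanism is reproduced here; as a rough illustration of the family they belong to, a Medusa-style multi-head draft (the class name and k are placeholders):

```python
import torch
import torch.nn as nn

class MultiTokenHeads(nn.Module):
    """Medusa-style sketch, a simpler cousin of FIRP / hidden transfer:
    k extra heads read the last hidden state and draft k future tokens in one
    forward pass; a verification pass would then accept or reject the drafts."""
    def __init__(self, hidden_dim: int, vocab_size: int, k: int = 3):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(hidden_dim, vocab_size) for _ in range(k))

    def forward(self, last_hidden: torch.Tensor) -> torch.Tensor:
        # last_hidden: (hidden_dim,) -> drafted token ids: (k,)
        return torch.stack([head(last_hidden).argmax(-1) for head in self.heads])
```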
- Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images [63.91986621008751]
Large vision-language models (VLMs) have achieved exceptional performance across various multi-modal tasks.
In this paper, we aim to induce high energy-latency cost during inference of VLMs.
We propose verbose images, with the goal of crafting an imperceptible perturbation to induce VLMs to generate long sentences.
arXiv Detail & Related papers (2024-01-20T08:46:06Z)
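The last two entries (verbose samples and verbose images) share one recipe: optimize an imperceptible input perturbation that keeps the model generating. A minimal PGD-style sketch of that shared objective, suppressing the EOS token's probability under an L-infinity budget; the `model` interface, step sizes, and budget are assumptions, not either paper's exact loss:

```python
import torch

def craft_verbose_perturbation(model, image, text_ids, eos_id,
                               eps=8 / 255, alpha=1 / 255, steps=100):
    """PGD-style sketch: find an L-inf bounded image perturbation that
    suppresses the EOS token's probability, pushing the model toward
    longer outputs. `model(image, text_ids)` is assumed to return the
    next-token logits as a (vocab,) tensor."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        logits = model(image + delta, text_ids)
        eos_logprob = logits.log_softmax(-1)[eos_id]  # log P(EOS | input)
        eos_logprob.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()        # step to reduce EOS prob
            delta.clamp_(-eps, eps)                   # keep perturbation imperceptible
            delta.grad.zero_()
    return (image + delta).detach()
```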