ExtendAttack: Attacking Servers of LRMs via Extending Reasoning
- URL: http://arxiv.org/abs/2506.13737v1
- Date: Mon, 16 Jun 2025 17:49:05 GMT
- Title: ExtendAttack: Attacking Servers of LRMs via Extending Reasoning
- Authors: Zhenhao Zhu, Yue Liu, Yingwei Ma, Hongcheng Gao, Nuo Chen, Yanpei Guo, Wenjie Qu, Huiying Xu, Xinzhong Zhu, Jiaheng Zhang
- Abstract summary: Large Reasoning Models (LRMs) have demonstrated promising performance in complex tasks. We propose a novel attack method on LRMs, termed ExtendAttack, to maliciously occupy the resources of servers. We show that ExtendAttack increases the length of the model's response by over 2.5 times for the o3 model on the HumanEval benchmark.
- Score: 27.205747119390846
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Reasoning Models (LRMs) have demonstrated promising performance on complex tasks. However, their resource-intensive reasoning processes may be exploited by attackers to maliciously occupy server resources, leading to crashes, much like a DDoS attack in cyberspace. To this end, we propose ExtendAttack, a novel attack on LRMs that maliciously occupies server resources by stealthily extending the models' reasoning processes. Concretely, we systematically obfuscate characters within a benign prompt, transforming them into a complex, poly-base ASCII representation. This compels the model to perform a series of computationally intensive decoding sub-tasks that are deeply embedded within the semantic structure of the query itself. Extensive experiments demonstrate the effectiveness of the proposed ExtendAttack. Remarkably, it increases the length of the o3 model's responses by over 2.5 times on the HumanEval benchmark. Moreover, it preserves the original meaning of the query and achieves comparable answer accuracy, demonstrating its stealthiness.
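To make the obfuscation step concrete, the sketch below shows one plausible way to rewrite a benign prompt character by character into annotated poly-base ASCII codes, in the spirit of the abstract. The base set, the obfuscation ratio, the annotation format, and all function names are illustrative assumptions, not the paper's exact construction.

```python
import random

# Minimal sketch of the poly-base obfuscation idea described in the abstract.
# The base set, the 30% obfuscation ratio, and the <base:code> annotation
# format are illustrative assumptions, not the paper's exact construction.
BASES = {2: "binary", 8: "octal", 16: "hexadecimal"}

def to_base(n: int, base: int) -> str:
    """Convert a non-negative integer to its representation in the given base."""
    digits = "0123456789abcdef"
    out = ""
    while n:
        out = digits[n % base] + out
        n //= base
    return out or "0"

def obfuscate(prompt: str, ratio: float = 0.3, seed: int = 0) -> str:
    """Replace a fraction of characters with annotated poly-base ASCII codes,
    so the model must decode them before it can interpret the query."""
    rng = random.Random(seed)
    pieces = []
    for ch in prompt:
        if ch.isalnum() and rng.random() < ratio:
            base = rng.choice(list(BASES))
            code = to_base(ord(ch), base)
            # Annotate the base so the original character stays recoverable,
            # at the cost of an extra decoding step per obfuscated character.
            pieces.append(f"<{BASES[base]}:{code}>")
        else:
            pieces.append(ch)
    return "".join(pieces)

if __name__ == "__main__":
    benign = "Write a function that returns the n-th Fibonacci number."
    print(obfuscate(benign))
```

Because each annotated token must be decoded back to a character before the underlying question can even be parsed, the reasoning trace grows with the number of obfuscated characters, while the recovered query, and hence answer accuracy, stays essentially unchanged.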
Related papers
- BadReasoner: Planting Tunable Overthinking Backdoors into Large Reasoning Models for Fun or Profit [12.189197763012409]
Large Reasoning Models (LRMs) have emerged as a significant advancement in artificial intelligence. In this paper, we identify an unexplored attack vector against LRMs, which we term "overthinking tunables". We propose a novel tunable backdoor that moves beyond simple on/off attacks to one where an attacker can precisely control the extent of the model's reasoning verbosity.
arXiv Detail & Related papers (2025-07-24T11:24:35Z) - Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models [17.860698041523918]
Contextual priming, where earlier stimuli covertly bias later judgments, offers an unexplored attack surface for large language models (LLMs). We propose Response Attack (RA), which uses an auxiliary LLM to generate a mildly harmful response to a paraphrased version of the original malicious query. RA consistently outperforms seven state-of-the-art jailbreak techniques, achieving higher attack success rates.
arXiv Detail & Related papers (2025-07-07T17:56:05Z) - A Reward-driven Automated Webshell Malicious-code Generator for Red-teaming [0.0]
There is a significant shortage of publicly available, well-categorized malicious-code datasets organized by obfuscation method. Existing malicious-code generation methods, which primarily rely on prompt engineering, often suffer from limited diversity and high redundancy in the payloads they produce. We propose RAWG, a Reward-driven Automated Webshell Malicious-code Generator designed for red-teaming applications.
arXiv Detail & Related papers (2025-05-30T06:16:42Z) - Practical Reasoning Interruption Attacks on Reasoning Large Language Models [0.24963930962128378]
Reasoning large language models (RLLMs) have demonstrated outstanding performance across a variety of tasks, yet they also expose numerous security vulnerabilities. Recent work has identified a distinct "thinking-stopped" vulnerability in DeepSeek-R1 under adversarial prompts. We develop a novel prompt injection attack, termed reasoning interruption attack, and offer an initial analysis of its root cause.
arXiv Detail & Related papers (2025-05-10T13:36:01Z) - AGENTFUZZER: Generic Black-Box Fuzzing for Indirect Prompt Injection against LLM Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentFuzzer, to automatically discover and exploit indirect prompt injection vulnerabilities. We evaluate AgentFuzzer on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o. We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z) - Token-Efficient Prompt Injection Attack: Provoking Cessation in LLM Reasoning via Adaptive Token Compression [12.215295420714787]
"Reasoning Interruption Attack" is a prompt injection attack based on adaptive token compression.<n>We develop a systematic approach to efficiently collect attack prompts and an adaptive token compression framework.<n> Experiments show our compression framework significantly reduces prompt length while maintaining effective attack capabilities.
arXiv Detail & Related papers (2025-04-29T07:34:22Z) - To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models [56.19026073319406]
Large Reasoning Models (LRMs) are designed to solve complex tasks by generating explicit reasoning traces before producing final answers. We reveal a critical vulnerability in LRMs, termed Unthinking, wherein the thinking process can be bypassed by manipulating special tokens. In this paper, we investigate this vulnerability from both malicious and beneficial perspectives.
arXiv Detail & Related papers (2025-02-16T10:45:56Z) - Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models [53.580928907886324]
Reasoning-Augmented Conversation (RACE) is a novel multi-turn jailbreak framework. It reformulates harmful queries into benign reasoning tasks. We show that RACE achieves state-of-the-art attack effectiveness in complex conversational scenarios.
arXiv Detail & Related papers (2025-02-16T09:27:44Z) - MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents [60.30753230776882]
LLM agents are vulnerable to indirect prompt injection (IPI) attacks, where malicious tasks embedded in tool-retrieved information can redirect the agent to take unauthorized actions. We present MELON, a novel IPI defense that detects attacks by re-executing the agent's trajectory with a masked user prompt modified through a masking function.
arXiv Detail & Related papers (2025-02-07T18:57:49Z) - Turning Logic Against Itself : Probing Model Defenses Through Contrastive Questions [51.51850981481236]
We introduce POATE, a novel jailbreak technique that harnesses contrastive reasoning to provoke unethical responses. POATE crafts semantically opposing intents and integrates them with adversarial templates, steering models toward harmful outputs with remarkable subtlety. To counter this, we propose Intent-Aware CoT and Reverse Thinking CoT, which decompose queries to detect malicious intent and reason in reverse to evaluate and reject harmful responses.
arXiv Detail & Related papers (2025-01-03T15:40:03Z) - Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena. We find that we can successfully break the latest agents that use black-box frontier LMs, including those that perform reflection and tree search. We also use ARE to rigorously evaluate how robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z) - Phantom: General Trigger Attacks on Retrieval Augmented Language Generation [30.63258739968483]
Retrieval Augmented Generation (RAG) expands the capabilities of modern large language models (LLMs).
We propose new attack vectors that allow an adversary to inject a single malicious document into a RAG system's knowledge base, and mount a backdoor poisoning attack.
We demonstrate our attacks on multiple LLM architectures, including Gemma, Vicuna, and Llama, and show that they transfer to GPT-3.5 Turbo and GPT-4.
arXiv Detail & Related papers (2024-05-30T21:19:24Z) - LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence [68.27280750612204]
Recent embodied agents are primarily built based on reinforcement learning (RL) or large language models (LLMs). In this work, we combine their advantages while avoiding the drawbacks by conducting the proposed referee RL on our developed large auto-regressive model (LARM). Specifically, LARM is built upon a lightweight LLM (fewer than 5B parameters) and directly outputs the next action to execute rather than text.
arXiv Detail & Related papers (2024-05-27T17:59:32Z) - Black-box Adversarial Attacks against Dense Retrieval Models: A
Multi-view Contrastive Learning Method [115.29382166356478]
We introduce the adversarial retrieval attack (AREA) task, which aims to trick DR models into retrieving a target document that lies outside the initial set of candidate documents retrieved by the DR model. We find that the promising results previously reported on attacking NRMs do not generalize to DR models. We propose to formalize attacks on DR models as a contrastive learning problem in a multi-view representation space.
arXiv Detail & Related papers (2023-08-19T00:24:59Z)