Related papers: One Token Embedding Is Enough to Deadlock Your Large Reasoning Model

One Token Embedding Is Enough to Deadlock Your Large Reasoning Model

URL: http://arxiv.org/abs/2510.15965v1
Date: Sun, 12 Oct 2025 07:42:57 GMT
Title: One Token Embedding Is Enough to Deadlock Your Large Reasoning Model
Authors: Mohan Zhang, Yihua Zhang, Jinghan Jia, Zhangyang Wang, Sijia Liu, Tianlong Chen,
Abstract summary: We present the Deadlock Attack, a resource exhaustion method that hijacks an LRM's generative control flow.<n>Our method achieves a 100% attack success rate across four advanced LRMs.
Score: 91.48868589442837
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Modern large reasoning models (LRMs) exhibit impressive multi-step problem-solving via chain-of-thought (CoT) reasoning. However, this iterative thinking mechanism introduces a new vulnerability surface. We present the Deadlock Attack, a resource exhaustion method that hijacks an LRM's generative control flow by training a malicious adversarial embedding to induce perpetual reasoning loops. Specifically, the optimized embedding encourages transitional tokens (e.g., "Wait", "But") after reasoning steps, preventing the model from concluding its answer. A key challenge we identify is the continuous-to-discrete projection gap: na\"ive projections of adversarial embeddings to token sequences nullify the attack. To overcome this, we introduce a backdoor implantation strategy, enabling reliable activation through specific trigger tokens. Our method achieves a 100% attack success rate across four advanced LRMs (Phi-RM, Nemotron-Nano, R1-Qwen, R1-Llama) and three math reasoning benchmarks, forcing models to generate up to their maximum token limits. The attack is also stealthy (in terms of causing negligible utility loss on benign user inputs) and remains robust against existing strategies trying to mitigate the overthinking issue. Our findings expose a critical and underexplored security vulnerability in LRMs from the perspective of reasoning (in)efficiency.

Related papers

BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models [24.513640096951566]
We propose BadThink, the first backdoor attack designed to deliberately induce "overthinking" behavior in large language models.<n>When activated by carefully crafted trigger prompts, BadThink manipulates the model to generate inflated reasoning traces.<n>We implement this attack through a sophisticated poisoning-based fine-tuning strategy.
arXiv Detail & Related papers (2025-11-13T13:44:51Z)
Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense [16.519353449118814]
We analyze a critical vulnerability we term reasoning distraction, where LRMs are diverted from their primary objective by irrelevant yet complex tasks maliciously embedded in the prompt.<n>We show that even state-of-the-art LRMs are highly susceptible, with injected distractors reducing task accuracy by up to 60%.<n>We propose a training-based defense that combines Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) on synthetic adversarial data, improving robustness by over 50 points on challenging distractor attacks.
arXiv Detail & Related papers (2025-10-17T23:16:34Z)
Bag of Tricks for Subverting Reasoning-based Safety Guardrails [62.139297207938036]
We present a bag of jailbreak methods that subvert the reasoning-based guardrails.<n>Our attacks span white-, gray-, and black-box settings and range from effortless template manipulations to fully automated optimization.
arXiv Detail & Related papers (2025-10-13T16:16:44Z)
MARCOS: Deep Thinking by Markov Chain of Continuous Thoughts [82.46857666702924]
We present a new paradigm for reasoning in large language models (LLMs)<n>Instead of autoregressively generating tokens, we model reasoning as a hidden Markov chain of continuous, high-dimensional "thoughts"<n>For the first time, MARCOS achieves performance comparable to token-based CoT, even surpassing it by 4.7% on GSM8K with up to 15.7x speedup in inference.
arXiv Detail & Related papers (2025-09-29T16:44:22Z)
BadReasoner: Planting Tunable Overthinking Backdoors into Large Reasoning Models for Fun or Profit [12.189197763012409]
Large language models (LRMs) have emerged as a significant advancement in artificial intelligence.<n>In this paper, we identify an unexplored attack vector against LRMs, which we term "overthinking tunables"<n>We propose a novel tunable backdoor, which moves beyond simple on/off attacks to one where an attacker can precisely control the extent of the model's reasoning verbosity.
arXiv Detail & Related papers (2025-07-24T11:24:35Z)
Do Thinking Tokens Help or Trap? Towards More Efficient Large Reasoning Model [7.8354921036790275]
Large Reasoning Models (LRMs) excel at solving complex problems but face an overthinking dilemma.<n>When handling simple tasks, they often produce verbose responses overloaded with thinking tokens.<n>These tokens trigger unnecessary high-level reasoning behaviors like reflection and backtracking, reducing efficiency.
arXiv Detail & Related papers (2025-06-30T13:30:33Z)
On Reasoning Strength Planning in Large Reasoning Models [50.61816666920207]
We find evidence that LRMs pre-plan the reasoning strengths in their activations even before generation.<n>We then uncover that LRMs encode this reasoning strength through a pre-allocated directional vector embedded in the activations of the model.<n>Our work provides new insights into the internal mechanisms of reasoning in LRMs and offers practical tools for controlling their reasoning behaviors.
arXiv Detail & Related papers (2025-06-10T02:55:13Z)
Practical Reasoning Interruption Attacks on Reasoning Large Language Models [0.24963930962128378]
Reasoning large language models (RLLMs) have demonstrated outstanding performance across a variety of tasks, yet they also expose numerous security vulnerabilities.<n>Recent work has identified a distinct "thinking-stopped" vulnerability in DeepSeek-R1 under adversarial prompts.<n>We develop a novel prompt injection attack, termed reasoning interruption attack, and offer an initial analysis of its root cause.
arXiv Detail & Related papers (2025-05-10T13:36:01Z)
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models [56.19026073319406]
Large Reasoning Models (LRMs) are designed to solve complex tasks by generating explicit reasoning traces before producing final answers.<n>We reveal a critical vulnerability in LRMs -- termed Unthinking -- wherein the thinking process can be bypassed by manipulating special tokens.<n>In this paper, we investigate this vulnerability from both malicious and beneficial perspectives.
arXiv Detail & Related papers (2025-02-16T10:45:56Z)
Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure. We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.