Meaningless Tokens, Meaningful Gains: How Activation Shifts Enhance LLM Reasoning
- URL: http://arxiv.org/abs/2510.01032v1
- Date: Wed, 01 Oct 2025 15:39:38 GMT
- Title: Meaningless Tokens, Meaningful Gains: How Activation Shifts Enhance LLM Reasoning
- Authors: Zeru Shi, Yingjia Wan, Zhenting Wang, Qifan Wang, Fan Yang, Elisa Kreiss, Ruixiang Tang
- Abstract summary: Motivated by the puzzling observation that inserting long sequences of meaningless tokens before the query prompt can consistently enhance LLM reasoning performance, this work analyzes the underlying mechanism driving this phenomenon. We find that the improvements arise from a redistribution of activations in the LLM's layers, where near-zero activations become less frequent while large-magnitude activations increase. We propose a lightweight inference-time technique that modifies activations directly without altering the input sequence.
- Score: 53.35553353785948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Motivated by the puzzling observation that inserting long sequences of meaningless tokens before the query prompt can consistently enhance LLM reasoning performance, this work analyzes the underlying mechanism driving this phenomenon and, based on these insights, proposes a more principled method that achieves similar performance gains. First, we find that the improvements arise from a redistribution of activations in the LLM's MLP layers, where near-zero activations become less frequent while large-magnitude activations increase. This redistribution enhances the model's representational capacity by suppressing weak signals and promoting stronger, more informative ones. Building on this insight, we propose the Activation Redistribution Module (ARM), a lightweight inference-time technique that modifies activations directly without altering the input sequence. ARM adaptively identifies near-zero activations after the non-linear function and shifts them outward, implicitly reproducing the beneficial effects of meaningless tokens in a controlled manner. Extensive experiments across diverse benchmarks and model architectures clearly show that ARM consistently improves LLM performance on reasoning tasks while requiring only a few lines of simple code to implement. Our findings deliver both a clear mechanistic explanation for the unexpected benefits of meaningless tokens and a simple yet effective technique that harnesses activation redistribution to further improve LLM performance.
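The outward-shifting step the abstract describes can be sketched as follows. This is a minimal illustration of the idea, not the paper's implementation: the function name `activation_redistribution`, the threshold `eps`, and the shift size are all illustrative placeholders, and the paper's adaptive identification of near-zero activations is replaced here by a fixed cutoff.

```python
import numpy as np

def activation_redistribution(acts: np.ndarray, eps: float = 0.05,
                              shift: float = 0.1) -> np.ndarray:
    """Push near-zero post-nonlinearity activations outward, away from zero.

    Entries with |a| < eps become sign(a) * (|a| + shift), so the near-zero
    band empties out while larger activations pass through unchanged.
    """
    out = acts.copy()
    near_zero = np.abs(out) < eps
    signs = np.where(out >= 0, 1.0, -1.0)  # treat exact zeros as positive
    out[near_zero] = signs[near_zero] * (np.abs(out[near_zero]) + shift)
    return out

# Small-magnitude activations are shifted outward; large ones are untouched.
acts = np.array([0.01, -0.02, 0.5, -0.7])
print(activation_redistribution(acts))  # approximately [0.11, -0.12, 0.5, -0.7]
```

In a real model this transform would be applied after the MLP non-linearity at inference time (e.g., via a forward hook), which is what makes the method input-agnostic: nothing is prepended to the prompt.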
Related papers
- FreeAct: Freeing Activations for LLM Quantization [89.97086263978058]
Quantization is pivotal for mitigating the significant memory and computational overhead of Large Language Models. FreeAct is a novel quantization framework that relaxes the static one-to-one constraint to accommodate dynamic activation disparities. Experiments across dLLMs and MLLMs demonstrate that FreeAct significantly outperforms baselines, with up to a 5.3% performance improvement.
arXiv Detail & Related papers (2026-03-02T12:02:17Z)
- Boosting Reasoning in Large Multimodal Models via Activation Replay [136.6522463570943]
We show how RLVR shifts low-entropy activations unexpectedly, while high-entropy ones are less affected. We propose Activation Replay, a training-free approach that boosts multimodal reasoning of post-trained LMMs.
arXiv Detail & Related papers (2025-11-25T06:31:57Z)
- A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models [85.30893355216486]
We study how visual token redundancy evolves with different dMLLM architectures and tasks. Our study reveals that visual redundancy emerges only in from-scratch dMLLMs when handling long-answer tasks. Layer-skipping is promising for accelerating AR-to-diffusion dMLLMs, whereas progressive or late-step pruning is more effective for from-scratch dMLLMs.
arXiv Detail & Related papers (2025-11-19T04:13:36Z)
- Revisiting LLM Reasoning via Information Bottleneck [57.519119962528166]
Large language models (LLMs) have recently demonstrated remarkable progress in reasoning capabilities through reinforcement learning with verifiable rewards (RLVR). We present a theoretical characterization of LLM reasoning grounded in the information bottleneck (IB) principle. We propose IB-aware reasoning optimization (IBRO), a framework that encourages reasoning trajectories to be both informative about the final correct answer and generalizable.
arXiv Detail & Related papers (2025-07-24T13:14:25Z)
- Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models [45.938663388013445]
We show that a small set of high-impact activations in the last few layers governs long-form reasoning attributes. By simply amplifying these activations and inserting "wait" tokens, we can invoke the long CoT ability without any training.
arXiv Detail & Related papers (2025-05-23T10:07:18Z)
- Sparsing Law: Towards Large Language Models with Greater Activation Sparsity [64.15238674475619]
Activation sparsity denotes the existence of substantial weakly-contributed elements within activation outputs that can be eliminated. We propose PPL-$p%$ sparsity, a precise and performance-aware activation sparsity metric. We show that ReLU is more efficient as the activation function than SiLU and can leverage more training data to improve activation sparsity.
arXiv Detail & Related papers (2024-11-04T17:59:04Z)
- CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification [7.8430836312711465]
This paper reformulates the activation sparsification problem to explicitly capture the relationship between activation sparsity and model performance. We propose CHESS, a general activation sparsification approach via CHannel-wise thrEsholding and Selective Sparsification. Experimental results demonstrate that CHESS achieves lower performance degradation across eight downstream tasks while activating fewer parameters than existing methods.
arXiv Detail & Related papers (2024-09-02T16:41:44Z)
- Extending Token Computation for LLM Reasoning [5.801044612920816]
Large Language Models (LLMs) are pivotal in advancing natural language processing.
LLMs often struggle with complex reasoning tasks due to inefficient attention distributions.
We introduce a novel method for extending computed tokens in the Chain-of-Thought process, utilizing attention mechanism optimization.
arXiv Detail & Related papers (2024-03-22T03:23:58Z)
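Several of the sparsification papers above (Sparsing Law, CHESS) revolve around zeroing weakly-contributed activations below a magnitude cutoff; CHESS does this channel-wise. A minimal sketch of that general idea, using an illustrative quantile-based cutoff rather than any paper's calibrated thresholds (the function name and parameter `q` are hypothetical):

```python
import numpy as np

def channel_wise_threshold(acts: np.ndarray, q: float = 0.3) -> np.ndarray:
    """Zero out activations below a per-channel magnitude cutoff.

    Each channel (column) gets its own threshold, here the q-quantile of
    its absolute activations; entries below that cutoff are set to zero
    while the rest pass through unchanged.
    """
    thresh = np.quantile(np.abs(acts), q, axis=0, keepdims=True)
    return np.where(np.abs(acts) < thresh, 0.0, acts)

# Rows are tokens, columns are channels; each column is thresholded
# against its own distribution, so channels with different scales are
# sparsified comparably.
acts = np.array([[0.1, 1.0],
                 [0.2, 2.0],
                 [0.3, 3.0],
                 [0.4, 4.0]])
print(channel_wise_threshold(acts))
```

The per-channel cutoff is the point of contrast with a single global threshold: a value that is "small" in a high-variance channel may be significant in a low-variance one, which a shared cutoff would miss.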
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.