Resting Neurons, Active Insights: Improving Input Sparsification for Large Language Models
- URL: http://arxiv.org/abs/2512.12744v1
- Date: Sun, 14 Dec 2025 15:47:40 GMT
- Title: Resting Neurons, Active Insights: Improving Input Sparsification for Large Language Models
- Authors: Haotian Xu, Tian Gao, Tsui-Wei Weng, Tengfei Ma
- Abstract summary: Large Language Models (LLMs) achieve state-of-the-art performance across a wide range of applications. Structured pruning, which reduces model size by removing redundant computational units such as neurons, has been widely explored as a solution. This study focuses on input sparsification, an increasingly popular technique that improves efficiency by selectively activating only a subset of entry values for each input.
- Score: 42.12574676719046
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) achieve state-of-the-art performance across a wide range of applications, but their massive scale poses significant challenges for both efficiency and interpretability. Structural pruning, which reduces model size by removing redundant computational units such as neurons, has been widely explored as a solution; this study focuses on input sparsification, an increasingly popular technique that improves efficiency by selectively activating only a subset of entry values for each input. However, existing approaches focus primarily on computational savings, often overlooking the representational consequences of sparsification and leaving a noticeable performance gap compared to full models. In this work, we first reinterpret input sparsification as a form of dynamic structural pruning. Motivated by the spontaneous baseline firing rates observed in biological neurons, we introduce a small set of trainable spontaneous neurons that act as compensatory units to stabilize activations in sparsified LLMs. Experiments demonstrate that these auxiliary neurons substantially reduce the sparsification-induced performance gap while generalizing effectively across tasks.
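To make the core idea concrete, here is a minimal PyTorch sketch of an MLP block with top-k input sparsification plus a small bank of always-active "spontaneous" neurons. The module layout, the top-k selection rule, and all hyperparameters are assumptions based on the abstract, not the authors' implementation.

```python
import torch
import torch.nn as nn


class SparsifiedMLP(nn.Module):
    def __init__(self, d_model: int, d_hidden: int, keep: int, n_spont: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden)
        self.down = nn.Linear(d_hidden, d_model)
        self.keep = keep                      # hidden entries kept per token
        # Spontaneous neurons: a tiny trainable bank whose output is added
        # unconditionally, mimicking baseline firing in biological neurons.
        self.spont_up = nn.Linear(d_model, n_spont)
        self.spont_down = nn.Linear(n_spont, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.up(x))
        # Input sparsification: keep only the top-k entries per token, a
        # form of dynamic structural pruning decided by the input itself.
        idx = torch.topk(h.abs(), self.keep, dim=-1).indices
        mask = torch.zeros_like(h).scatter_(-1, idx, 1.0)
        out = self.down(h * mask)
        # Compensation: the always-on spontaneous units stabilize the output.
        return out + self.spont_down(torch.relu(self.spont_up(x)))


# Only `keep` of the 2048 hidden units fire per token.
mlp = SparsifiedMLP(d_model=512, d_hidden=2048, keep=256, n_spont=16)
y = mlp(torch.randn(4, 10, 512))
```

Because the spontaneous bank is small relative to the hidden dimension, it adds little compute while giving the model a trainable baseline signal to offset the mass dropped by sparsification.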
Related papers
- General Self-Prediction Enhancement for Spiking Neurons [71.01912385372577]
Spiking Neural Networks (SNNs) are highly energy-efficient due to event-driven, sparse computation, but their training is challenged by spike non-differentiability and trade-offs among performance, efficiency, and biological plausibility. We propose a self-prediction enhanced spiking neuron method that generates an internal prediction current from its input-output history to modulate membrane potential. This design offers dual advantages: it creates a continuous gradient path that alleviates vanishing gradients and boosts training stability and accuracy, while also aligning with biological principles, resembling distal dendritic modulation and error-driven synaptic plasticity. (A hedged, forward-only sketch of this mechanism appears after this list.)
arXiv Detail & Related papers (2026-01-29T15:08:48Z) - Repulsor: Accelerating Generative Modeling with a Contrastive Memory Bank [65.00301565190824]
Repulsor is a plug-and-play training framework that requires no external encoders. Repulsor achieves a state-of-the-art FID of 2.40 within 400k steps, significantly outperforming comparable methods.
arXiv Detail & Related papers (2025-12-09T14:39:26Z) - A Sparsity Predicting Approach for Large Language Models via Activation Pattern Clustering [3.485125799252057]
Large Language Models (LLMs) exhibit significant activation sparsity, where only a subset of neurons are active for a given input. Direct prediction at the neuron level is computationally expensive due to the vast number of neurons in modern LLMs. We propose a clustering-based activation pattern compression framework to enable efficient prediction and utilization of activation sparsity. (A toy version of the clustering step is sketched after this list.)
arXiv Detail & Related papers (2025-07-11T19:07:29Z) - Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement [101.77467538102924]
Large reasoning models (LRMs) exhibit overthinking, which hinders efficiency and inflates inference cost. We propose two lightweight methods to enhance LRM efficiency. First, we introduce Efficiency Steering, a training-free activation steering technique that modulates reasoning behavior via a single direction. Second, we develop Self-Rewarded Efficiency RL, a reinforcement learning framework that dynamically balances task accuracy and brevity. (A minimal sketch of single-direction activation steering follows the list below.)
arXiv Detail & Related papers (2025-06-18T17:18:12Z) - R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference [77.47238561728459]
R-Sparse is a training-free activation sparsity approach capable of achieving high sparsity levels in advanced LLMs. Experiments on Llama-2/3 and Mistral models across ten diverse tasks demonstrate that R-Sparse achieves comparable performance at 50% model-level sparsity.
arXiv Detail & Related papers (2025-04-28T03:30:32Z) - Small Contributions, Small Networks: Efficient Neural Network Pruning Based on Relative Importance [25.579863542008646]
We introduce an intuitive and interpretable pruning method based on activation statistics.
We build a distribution of weight contributions across the dataset and utilize its parameters to guide the pruning process (see the contribution-scoring sketch after this list).
Our method consistently outperforms several baseline and state-of-the-art pruning techniques.
arXiv Detail & Related papers (2024-10-21T16:18:31Z) - CHESS: Optimizing LLM Inference via Channel-Wise Thresholding and Selective Sparsification [7.8430836312711465]
This paper reformulates the activation sparsification problem to explicitly capture the relationship between activation sparsity and model performance. We propose CHESS, a general activation sparsification approach via CHannel-wise thrEsholding and Selective Sparsification. Experimental results demonstrate that the proposed CHESS achieves lower performance degradation across eight downstream tasks while activating fewer parameters than existing methods. (A hedged sketch of channel-wise thresholding appears after this list.)
arXiv Detail & Related papers (2024-09-02T16:41:44Z) - Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons [27.289945121113277]
We introduce DemP, a method that controls the proliferation of dead neurons, dynamically leading to sparsity.
Experiments on CIFAR10 and ImageNet datasets demonstrate superior accuracy-sparsity tradeoffs.
arXiv Detail & Related papers (2024-03-12T14:28:06Z) - NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models [35.10729451729596]
Transformer-based Language Models have become ubiquitous in Natural Language Processing (NLP).
However, expensive training as well as inference remains a significant impediment to their widespread applicability.
Inspired by brain neuronal networks, we explore sparsity approaches through the lens of network topology.
arXiv Detail & Related papers (2024-02-28T22:21:47Z) - Towards Efficient Processing and Learning with Spikes: New Approaches for Multi-Spike Learning [59.249322621035056]
We propose two new multi-spike learning rules that demonstrate better performance than other baselines on various tasks.
In the feature detection task, we re-examine the ability of unsupervised STDP and present its limitations.
Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z)
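For the self-prediction spiking neuron paper above, here is a rough, forward-only sketch of a leaky integrate-and-fire layer whose membrane potential is modulated by a prediction current learned from the previous input and spike. The linear predictor, constants, and reset rule are illustrative assumptions; a real SNN would also need a surrogate gradient for the spike threshold.

```python
import torch
import torch.nn as nn


class SelfPredLIF(nn.Module):
    def __init__(self, n: int, tau: float = 2.0, v_th: float = 1.0):
        super().__init__()
        self.decay, self.v_th = 1.0 - 1.0 / tau, v_th
        # Predict a modulation current from [last input, last spike] history.
        self.pred = nn.Linear(2 * n, n)

    def forward(self, x_seq: torch.Tensor) -> torch.Tensor:
        # x_seq: (T, batch, n) input currents over T timesteps.
        T, b, n = x_seq.shape
        v = torch.zeros(b, n)
        s = torch.zeros(b, n)
        x_prev = torch.zeros(b, n)
        spikes = []
        for t in range(T):
            i_pred = self.pred(torch.cat([x_prev, s], dim=-1))
            v = self.decay * v + x_seq[t] + i_pred  # modulated membrane potential
            s = (v >= self.v_th).float()            # spike when threshold crossed
            v = v * (1.0 - s)                       # hard reset after a spike
            x_prev = x_seq[t]
            spikes.append(s)
        return torch.stack(spikes)


lif = SelfPredLIF(n=8)
out = lif(torch.randn(20, 4, 8))  # (T, batch, n) spike trains
```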
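For the activation-pattern clustering paper, the sketch below compresses observed binary activation patterns into a few k-means centroids so that sparsity can be predicted at the cluster level rather than per neuron. The clustering choice and the 0.5 activation cutoff are assumptions, not the paper's pipeline.

```python
import torch
from sklearn.cluster import KMeans

# Binary activation patterns from a calibration run (random here for demo).
patterns = (torch.randn(2000, 512) > 1.0).float().numpy()

km = KMeans(n_clusters=32, n_init=10, random_state=0).fit(patterns)
# Treat a neuron as active for a whole cluster if it fires in most of the
# cluster's patterns; prediction then happens per cluster, not per neuron.
active_sets = km.cluster_centers_ > 0.5
cluster = km.predict(patterns[:1])[0]
print("predicted active neurons:", int(active_sets[cluster].sum()))
```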
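For Efficiency Steering, the following shows generic single-direction activation steering via a forward hook: add a scaled, fixed direction vector to a layer's output at inference. The direction here is random purely for illustration; the paper derives a meaningful efficiency direction.

```python
import torch
import torch.nn as nn

d = 64
layer = nn.Linear(d, d)                   # stand-in for a transformer sublayer
direction = torch.randn(d)
direction = direction / direction.norm()  # unit steering direction (random demo)
alpha = 2.0                               # steering strength

def steer(module, inputs, output):
    # Returning a tensor from a forward hook replaces the layer's output.
    return output + alpha * direction

handle = layer.register_forward_hook(steer)
y = layer(torch.randn(3, d))              # activations shifted along `direction`
handle.remove()
```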
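For the relative-importance pruning paper, a hedged sketch of the general idea: score each neuron by its average contribution magnitude over a calibration set and zero out the lowest-scoring ones. The paper's distribution-based criterion is more refined than this mean statistic.

```python
import torch
import torch.nn as nn

layer = nn.Linear(512, 1024)
calib = torch.randn(4096, 512)           # calibration inputs

with torch.no_grad():
    acts = torch.relu(layer(calib))      # (4096, 1024) neuron outputs
    scores = acts.abs().mean(dim=0)      # mean contribution per neuron
    k = int(0.5 * scores.numel())        # prune the weakest 50%
    dead = scores.argsort()[:k]
    layer.weight[dead] = 0.0             # structured zero-out of whole neurons
    layer.bias[dead] = 0.0
```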
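Finally, for CHESS, a hedged sketch of channel-wise thresholding: calibrate one magnitude threshold per hidden channel from held-out activations, then zero sub-threshold entries at inference. The quantile calibration is an assumption, and the paper's selective-sparsification component is omitted.

```python
import torch


def calibrate_thresholds(acts: torch.Tensor, sparsity: float) -> torch.Tensor:
    # acts: (n_tokens, d_hidden) calibration activations. The per-channel
    # threshold is the magnitude quantile matching the target sparsity, so
    # roughly `sparsity` of each channel's entries get dropped.
    return acts.abs().quantile(sparsity, dim=0)


def sparsify(h: torch.Tensor, thresholds: torch.Tensor) -> torch.Tensor:
    return torch.where(h.abs() >= thresholds, h, torch.zeros_like(h))


calib = torch.randn(1024, 2048)
thr = calibrate_thresholds(calib, sparsity=0.7)
h = torch.randn(4, 2048)
print((sparsify(h, thr) == 0).float().mean())  # ~0.7 of entries zeroed
```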
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.