Related papers: First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models

First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models

URL: http://arxiv.org/abs/2408.11393v1
Date: Wed, 21 Aug 2024 07:38:51 GMT
Title: First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models
Authors: Chi Ma, Mincong Huang, Ying Zhang, Chao Wang, Yujie Wang, Lei Yu, Chuan Liu, Wei Lin,
Abstract summary: This paper introduces a training-free Threshold-based Dynamic Activation method that leverage sequence information to exploit the inherent sparsity of models across various architectures. We theoretically analyze two of its critical features: history-related activation uncertainty and semantic-irrelevant activation inertia.
Score: 25.15698344467722
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Dynamic activation (DA) techniques, such as DejaVu and MoEfication, have demonstrated their potential to significantly enhance the inference efficiency of large language models (LLMs). However, these techniques often rely on ReLU activation functions or require additional parameters and training to maintain performance. This paper introduces a training-free Threshold-based Dynamic Activation(TDA) method that leverage sequence information to exploit the inherent sparsity of models across various architectures. This method is designed to accelerate generation speed by 18-25\% without significantly compromising task performance, thereby addressing the limitations of existing DA techniques. Moreover, we delve into the root causes of LLM sparsity and theoretically analyze two of its critical features: history-related activation uncertainty and semantic-irrelevant activation inertia. Our comprehensive analyses not only provide a robust theoretical foundation for DA methods but also offer valuable insights to guide future research in optimizing LLMs for greater efficiency and effectiveness.

Related papers

Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement [101.77467538102924]
Large reasoning models (LRMs) exhibit overthinking, which hinders efficiency and inflates inference cost.<n>We propose two lightweight methods to enhance LRM efficiency.<n>First, we introduce Efficiency Steering, a training-free activation steering technique that modulates reasoning behavior via a single direction.<n>Second, we develop Self-Rewarded Efficiency RL, a reinforcement learning framework that dynamically balances task accuracy and brevity.
arXiv Detail & Related papers (2025-06-18T17:18:12Z)
WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference [44.538579135121466]
WINA (Weight Informed Neuron Activation) is a novel, simple, and training-free sparse activation framework.<n>We show that WINA obtains optimal approximation error bounds with theoretical guarantees tighter than existing techniques.<n>It also outperforms state-of-the-art methods (e.g., TEAL) by up to $2.94%$ in average performance at the same sparsity levels.
arXiv Detail & Related papers (2025-05-26T02:37:32Z)
Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models [45.938663388013445]
We show that a small set of high-impact activations in the last few layers governs long-form reasoning attributes.<n>By simply amplifying these activations and inserting "wait" tokens, we can invoke the long CoT ability without any training.
arXiv Detail & Related papers (2025-05-23T10:07:18Z)
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference [77.47238561728459]
R-Sparse is a training-free activation sparsity approach capable of achieving high sparsity levels in advanced LLMs. Experiments on Llama-2/3 and Mistral models across ten diverse tasks demonstrate that R-Sparse achieves comparable performance at 50% model-level sparsity.
arXiv Detail & Related papers (2025-04-28T03:30:32Z)
Self-Controlled Dynamic Expansion Model for Continual Learning [10.447232167638816]
This paper introduces an innovative Self-Controlled Dynamic Expansion Model (SCDEM) SCDEM orchestrates multiple trainable pre-trained ViT backbones to furnish diverse and semantically enriched representations. An extensive series of experiments have been conducted to evaluate the proposed methodology's efficacy.
arXiv Detail & Related papers (2025-04-14T15:22:51Z)
DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks. We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge. Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
arXiv Detail & Related papers (2025-02-18T02:37:26Z)
Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors [8.761404991620285]
Activation intervention has emerged as an effective and economical method to modify the behavior of large language models (LLMs) We propose Semantics-Adaptive Dynamic Intervention (SADI), a novel method that constructs a dynamic steering vector to intervene model activations at inference time. Experimental results show that SADI outperforms established baselines by substantial margins, improving task performance without training.
arXiv Detail & Related papers (2024-10-16T06:58:49Z)
MOYU: A Theoretical Study on Massive Over-activation Yielded Uplifts in LLMs [20.404448253054014]
Massive Over-activation Yielded Uplifts(MOYU) is an inherent property of large language models. Massive Over-activation Yielded Uplifts(MOYU) is a clever yet under-explored strategy designed to accelerate inference in these models.
arXiv Detail & Related papers (2024-06-18T12:57:33Z)
Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language models (LLMs) unlearning via gradient ascent (GA) Despite their simplicity and efficiency, we suggest that GA-based methods face the propensity towards excessive unlearning. We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
A Method on Searching Better Activation Functions [15.180864683908878]
We propose Entropy-based Activation Function Optimization (EAFO) methodology for designing static activation functions in deep neural networks. We derive a novel activation function from ReLU, known as Correction Regularized ReLU (CRReLU)
arXiv Detail & Related papers (2024-05-19T03:48:05Z)
Dynamic Activation Pitfalls in LLaMA Models: An Empirical Study [20.404448253054014]
We investigate the efficacy of dynamic activation mechanisms within the LLaMA family of language models. Our empirical findings have uncovered several inherent pitfalls in the current dynamic activation schemes.
arXiv Detail & Related papers (2024-05-15T11:42:42Z)
DiffTORI: Differentiable Trajectory Optimization for Deep Reinforcement and Imitation Learning [19.84386060857712]
This paper introduces DiffTORI, which utilizes Differentiable Trajectory optimization as the policy representation to generate actions for deep Reinforcement and Imitation learning. Across 15 model-based RL tasks and 35 imitation learning tasks with high-dimensional image and point cloud inputs, DiffTORI outperforms prior state-of-the-art methods in both domains.
arXiv Detail & Related papers (2024-02-08T05:26:40Z)
When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique. Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z)
Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models [109.06052781040916]
We introduce a technique to enhance the inference efficiency of parameter-shared language models. We also propose a simple pre-training technique that leads to fully or partially shared models. Results demonstrate the effectiveness of our methods on both autoregressive and autoencoding PLMs.
arXiv Detail & Related papers (2023-10-19T15:13:58Z)
Model-Based Reinforcement Learning with Multi-Task Offline Pretraining [59.82457030180094]
We present a model-based RL method that learns to transfer potentially useful dynamics and action demonstrations from offline data to a novel task. The main idea is to use the world models not only as simulators for behavior learning but also as tools to measure the task relevance. We demonstrate the advantages of our approach compared with the state-of-the-art methods in Meta-World and DeepMind Control Suite.
arXiv Detail & Related papers (2023-06-06T02:24:41Z)
Hierarchical Optimization-Derived Learning [58.69200830655009]
We establish a new framework, named Hierarchical ODL (HODL), to simultaneously investigate the intrinsic behaviors of optimization-derived model construction and its corresponding learning process. This is the first theoretical guarantee for these two coupled ODL components: optimization and learning.
arXiv Detail & Related papers (2023-02-11T03:35:13Z)
Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning. We provide a representation view of the latent variable models for state-action value functions, which allows both tractable variational learning algorithm and effective implementation of the optimism/pessimism principle. In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.