ReLU Strikes Back: Exploiting Activation Sparsity in Large Language
Models
- URL: http://arxiv.org/abs/2310.04564v1
- Date: Fri, 6 Oct 2023 20:01:33 GMT
- Authors: Iman Mirzadeh, Keivan Alizadeh, Sachin Mehta, Carlo C Del Mundo, Oncel
Tuzel, Golnoosh Samei, Mohammad Rastegari, Mehrdad Farajtabar
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large Language Models (LLMs) with billions of parameters have drastically
transformed AI applications. However, their demanding computation during
inference has raised significant challenges for deployment on
resource-constrained devices. Despite recent trends favoring alternative
activation functions such as GELU or SiLU, known for increased computation,
this study strongly advocates for reinstating ReLU activation in LLMs. We
demonstrate that using the ReLU activation function has a negligible impact on
convergence and performance while significantly reducing computation and weight
transfer. This reduction is particularly valuable during the memory-bound
inference step, where efficiency is paramount. Exploring sparsity patterns in
ReLU-based LLMs, we unveil the reutilization of activated neurons for
generating new tokens; leveraging these insights, we propose practical
strategies to reduce LLM inference computation by up to three times using
ReLU activations, with minimal performance trade-offs.
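The computation savings described in the abstract come from exact zeros: after ReLU, a neuron whose activation is zero contributes nothing to the feed-forward block's down-projection, so its weight row never needs to be loaded or multiplied. The following is a minimal NumPy sketch with toy dimensions and random weights (an illustration of the idea, not the paper's models or method):

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 16, 64  # toy dimensions, far smaller than a real LLM
W_up = rng.standard_normal((d_model, d_ff)) * 0.3
W_down = rng.standard_normal((d_ff, d_model)) * 0.3

def ffn_dense(x):
    """Standard ReLU feed-forward block: down(relu(up(x)))."""
    h = np.maximum(x @ W_up, 0.0)  # ReLU zeroes out many entries
    return h @ W_down

def ffn_sparse(x):
    """Same result, but skip down-projection rows whose activation is zero."""
    h = np.maximum(x @ W_up, 0.0)
    active = np.nonzero(h)[0]           # indices of neurons that fired
    return h[active] @ W_down[active]   # only active rows are read/multiplied

x = rng.standard_normal(d_model)
assert np.allclose(ffn_dense(x), ffn_sparse(x))  # identical output

h = np.maximum(x @ W_up, 0.0)
print(f"activation sparsity: {np.mean(h == 0):.0%}")
```

On random Gaussian pre-activations roughly half the neurons are zero, so roughly half of `W_down` can be skipped; the savings reported in the paper additionally exploit the overlap of activated neurons across consecutive tokens.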
Related papers
- Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration [70.09561665520043]
We propose a novel framework for multi-agent collaboration that introduces Reinforced Advantage feedback (ReAd) for efficient self-refinement of plans.
We provide theoretical analysis by extending advantage-weighted regression in reinforcement learning to multi-agent systems.
Experiments on Overcooked-AI and a difficult variant of RoCoBench show that ReAd surpasses baselines in success rate, and also significantly decreases the interaction steps of agents.
arXiv Detail & Related papers (2024-05-23T08:33:19Z)
- A Method on Searching Better Activation Functions [15.180864683908878]
We propose Entropy-based Activation Function Optimization (EAFO) methodology for designing static activation functions in deep neural networks.
We derive a novel activation function from ReLU, known as Correction Regularized ReLU (CRReLU)
arXiv Detail & Related papers (2024-05-19T03:48:05Z)
- Dynamic Activation Pitfalls in LLaMA Models: An Empirical Study [20.404448253054014]
We investigate the efficacy of dynamic activation mechanisms within the LLaMA family of language models.
Our empirical findings have uncovered several inherent pitfalls in the current dynamic activation schemes.
arXiv Detail & Related papers (2024-05-15T11:42:42Z)
- How Can LLM Guide RL? A Value-Based Approach [68.55316627400683]
Reinforcement learning (RL) has become the de facto standard practice for sequential decision-making problems by improving future acting policies with feedback.
Recent developments in large language models (LLMs) have showcased impressive capabilities in language understanding and generation, yet they fall short in exploration and self-improvement capabilities.
We develop an algorithm named LINVIT that incorporates LLM guidance as a regularization factor in value-based RL, leading to significant reductions in the amount of data needed for learning.
arXiv Detail & Related papers (2024-02-25T20:07:13Z)
- ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models [74.59731375779934]
Activation sparsity refers to the existence of weakly-contributed elements among activation outputs.
This paper introduces a simple and effective sparsification method named "ProSparse" to push LLMs for higher activation sparsity.
arXiv Detail & Related papers (2024-02-21T03:58:49Z)
- Learn To be Efficient: Build Structured Sparsity in Large Language Models [17.940183066850565]
Large Language Models (LLMs) have achieved remarkable success with their billion-level parameters, yet they incur high inference overheads.
Existing methods focus only on utilizing naturally formed activation sparsity in a post-training setting.
We introduce a novel training algorithm, Learn-To-be-Efficient (LTE), designed to train efficiency-aware LLMs.
arXiv Detail & Related papers (2024-02-09T01:18:16Z)
- ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs [91.31204876440765]
We introduce a general method that defines neuron activation through neuron output magnitudes and a tailored magnitude threshold.
To find the most efficient activation function for sparse computation, we propose a systematic framework.
We conduct thorough experiments on LLMs utilizing different activation functions, including ReLU, SwiGLU, ReGLU, and ReLU$^2$.
arXiv Detail & Related papers (2024-02-06T08:45:51Z)
- Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks [55.591662978280894]
This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs).
Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs.
Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees.
arXiv Detail & Related papers (2023-10-13T21:26:16Z)
- PEA: Improving the Performance of ReLU Networks for Free by Using Progressive Ensemble Activations [0.0]
Novel activation functions have been proposed to improve the performance of neural networks.
We propose methods that can be used to improve the performance of ReLU networks by using these efficient novel activations during model training.
arXiv Detail & Related papers (2022-07-28T13:29:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.