Entropy-Based Block Pruning for Efficient Large Language Models
- URL: http://arxiv.org/abs/2504.03794v1
- Date: Fri, 04 Apr 2025 03:42:34 GMT
- Title: Entropy-Based Block Pruning for Efficient Large Language Models
- Authors: Liangwei Yang, Yuhui Xu, Juntao Tan, Doyen Sahoo, Silvio Savarese, Caiming Xiong, Huan Wang, Shelby Heinecke
- Abstract summary: We propose an entropy-based pruning strategy to enhance efficiency while maintaining performance. Empirical analysis reveals that the entropy of hidden representations decreases in the early blocks but progressively increases across most subsequent blocks.
- Score: 81.18339597023187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As large language models continue to scale, their growing computational and storage demands pose significant challenges for real-world deployment. In this work, we investigate redundancy within Transformer-based models and propose an entropy-based pruning strategy to enhance efficiency while maintaining performance. Empirical analysis reveals that the entropy of hidden representations decreases in the early blocks but progressively increases across most subsequent blocks. This trend suggests that entropy serves as a more effective measure of information richness within computation blocks. Unlike cosine similarity, which primarily captures geometric relationships, entropy directly quantifies uncertainty and information content, making it a more reliable criterion for pruning. Extensive experiments demonstrate that our entropy-based pruning approach surpasses cosine similarity-based methods in reducing model size while preserving accuracy, offering a promising direction for efficient model deployment.
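To make the criterion concrete, here is a minimal sketch, assuming per-block hidden states have already been collected; it is not the paper's released implementation. It scores each Transformer block either by the entropy gain of its output hidden states (via a simple histogram entropy estimate) or by input-output cosine similarity, then ranks blocks for pruning. The histogram estimator, the `num_bins` default, the entropy-gain rule, and the synthetic activations are illustrative assumptions.

```python
import numpy as np

def histogram_entropy(x: np.ndarray, num_bins: int = 256) -> float:
    """Shannon entropy (nats) of a histogram estimate over activation values."""
    counts, _ = np.histogram(x.ravel(), bins=num_bins)
    p = counts / counts.sum()
    p = p[p > 0]  # drop empty bins so log(p) is defined
    return float(-np.sum(p * np.log(p)))

def mean_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Mean cosine similarity between corresponding token vectors."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + 1e-8
    return float(np.mean(num / den))

def prune_order(hidden_states: list[np.ndarray], criterion: str = "entropy") -> list[int]:
    """Rank block indices from most to least prunable.

    hidden_states[i] is the output of block i, so block i maps
    hidden_states[i - 1] to hidden_states[i].
    """
    scores = []
    for i in range(1, len(hidden_states)):
        if criterion == "entropy":
            # Small entropy gain: the block adds little information, prune first.
            s = histogram_entropy(hidden_states[i]) - histogram_entropy(hidden_states[i - 1])
        else:
            # High input/output similarity: the block changes little, prune first.
            s = -mean_cosine(hidden_states[i - 1], hidden_states[i])
        scores.append((s, i))
    return [i for _, i in sorted(scores)]

# Toy usage on synthetic per-block activations (64 tokens, width 128).
rng = np.random.default_rng(0)
states = [rng.normal(scale=1.0 + 0.05 * i, size=(64, 128)) for i in range(13)]
print(prune_order(states, "entropy")[:3])  # most prunable blocks, entropy criterion
print(prune_order(states, "cosine")[:3])   # most prunable blocks, similarity criterion
```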
Related papers
- Learned Image Compression with Dictionary-based Entropy Model [32.966391277632646]
The entropy model plays a key role in learned image compression.
Most existing methods employed hyper-prior and auto-regressive architectures to form their entropy models.
We propose a novel entropy model named Dictionary-based Cross Attention Entropy model.
arXiv Detail & Related papers (2025-04-01T07:43:10Z)
- Shrink the longest: improving latent space isotropy with symplicial geometry [0.0]
We propose a novel regularization technique based on simplicial geometry to improve the isotropy of latent representations.
We demonstrate that the method leads to an increase in downstream performance while significantly lowering the anisotropy during fine-tuning.
arXiv Detail & Related papers (2025-01-09T18:44:10Z)
- An Entropy-Based Test and Development Framework for Uncertainty Modeling in Level-Set Visualizations [2.5449631655313896]
We use an entropy calculation directly on ensemble data to establish an expected result.
We show that fewer bins in nonparametric histogram models are more effective, whereas large numbers of bins in quantile models approach data accuracy (see the sketch after this entry's link).
arXiv Detail & Related papers (2024-09-13T00:31:16Z)
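As a small, hedged illustration of the bin-count effect noted in the entry above (not the paper's actual framework), the snippet below computes histogram-based entropy of synthetic ensemble values at several bin counts; the Gaussian data and the bin choices are assumptions made for demonstration.

```python
import numpy as np

def binned_entropy(values: np.ndarray, num_bins: int) -> float:
    """Shannon entropy (bits) of a histogram model of the data."""
    counts, _ = np.histogram(values, bins=num_bins)
    p = counts / counts.sum()
    p = p[p > 0]  # skip empty bins
    return float(-np.sum(p * np.log2(p)))

# Stand-in for per-point values gathered across ensemble members.
rng = np.random.default_rng(42)
ensemble = rng.normal(size=10_000)
for bins in (4, 16, 64, 256):
    print(f"{bins:>3} bins -> {binned_entropy(ensemble, bins):.3f} bits")
```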
- REMEDI: Corrective Transformations for Improved Neural Entropy Estimation [0.7488108981865708]
We introduce $\texttt{REMEDI}$ for efficient and accurate estimation of differential entropy.
Our approach demonstrates improvement across a broad spectrum of estimation tasks.
It can be naturally extended to information-theoretic supervised learning models.
arXiv Detail & Related papers (2024-02-08T14:47:37Z)
- Dynamic Kernel-Based Adaptive Spatial Aggregation for Learned Image Compression [63.56922682378755]
We focus on extending spatial aggregation capability and propose a dynamic kernel-based transform coding.
The proposed adaptive aggregation generates kernel offsets to capture valid information within a content-conditioned range, aiding the transform.
Experimental results demonstrate that our method achieves superior rate-distortion performance on three benchmarks compared to the state-of-the-art learning-based methods.
arXiv Detail & Related papers (2023-08-17T01:34:51Z)
- Action Redundancy in Reinforcement Learning [54.291331971813364]
We show that transition entropy can be described by two terms: model-dependent transition entropy and action redundancy.
Our results suggest that action redundancy is a fundamental problem in reinforcement learning.
arXiv Detail & Related papers (2021-02-22T19:47:26Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
- Slice Sampling for General Completely Random Measures [74.24975039689893]
We present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables.
The efficacy of the proposed algorithm is evaluated on several popular nonparametric models.
arXiv Detail & Related papers (2020-06-24T17:53:53Z)
- Learning Context-Based Non-local Entropy Modeling for Image Compression [140.64888994506313]
In this paper, we propose a non-local operation for context modeling by employing the global similarity within the context.
The entropy model is further adopted as the rate loss in a joint rate-distortion optimization.
Considering that the width of the transforms is essential in training low-distortion models, we introduce a U-Net block in the transforms to increase the width with manageable memory consumption and time complexity.
arXiv Detail & Related papers (2020-05-10T13:28:18Z)