Related papers: How can representation dimension dominate structurally pruned LLMs?

URL: http://arxiv.org/abs/2503.04377v1
Date: Thu, 06 Mar 2025 12:28:59 GMT
Title: How can representation dimension dominate structurally pruned LLMs?
Authors: Mingxue Xu, Lisa Alazraki, Danilo P. Mandic
Abstract summary: Pruning assumes a subnetwork exists in the original deep neural network. It is unclear how the model performance varies with the different subnetwork extractions.
Score: 17.953689537875377
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Pruning assumes a subnetwork exists in the original deep neural network, which can achieve comparable model performance with less computation than the original. However, it is unclear how the model performance varies across different subnetwork extractions. In this paper, we choose the representation dimension (or embedding dimension, model dimension, the dimension of the residual stream in the relevant literature) as the entry point to this issue. We investigate the linear transformations in the LLM transformer blocks and consider a specific structured pruning approach, SliceGPT, to extract the subnetworks of different representation dimensions. We mechanistically analyse the activation flow during the model forward passes, and find that the representation dimension dominates the linear transformations, model predictions, and, finally, the model performance. Explicit analytical relations are given to calculate the pruned model performance (perplexity and accuracy) without actual evaluation, and are empirically validated with Llama-3-8B-Instruct and Phi-3-mini-4k-Instruct.
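
The abstract's idea of extracting subnetworks with different representation dimensions can be illustrated with a toy rotate-and-slice step on a single linear map. This is only a minimal sketch under stated assumptions, not the SliceGPT implementation or the paper's analysis: the helper names `pca_rotation` and `slice_linear`, the synthetic activations, and all sizes are hypothetical. It rotates the input space onto the principal directions of the activations, keeps the top k of them, and measures how the output error changes with k.

```python
import numpy as np

def pca_rotation(X):
    """Orthogonal matrix whose columns are the principal directions of X
    (rows of X are token activations), sorted by decreasing variance."""
    cov = X.T @ X
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order]

def slice_linear(W, Q, k):
    """Keep the top-k rotated directions of a weight W used as X @ W.
    Returns the (d, k) projection and the sliced (k, d_out) weight."""
    Qk = Q[:, :k]
    return Qk, Qk.T @ W

rng = np.random.default_rng(0)
d, d_out, n_tokens, rank = 16, 8, 200, 4     # hypothetical toy sizes

# Synthetic activations that live in a rank-4 subspace of the 16-dim space.
X = rng.normal(size=(n_tokens, rank)) @ rng.normal(size=(rank, d))
W = rng.normal(size=(d, d_out))

Q = pca_rotation(X)
full = X @ W

err = {}
for k in (2, 4, 16):
    Qk, Wk = slice_linear(W, Q, k)
    approx = (X @ Qk) @ Wk                   # forward pass in the sliced dimension
    err[k] = np.linalg.norm(full - approx) / np.linalg.norm(full)

for k, e in err.items():
    print(f"kept dimension {k:2d}: relative output error {e:.2e}")
```

In this toy setting the output is reproduced essentially exactly once k reaches the rank of the activations, and degrades below that. The paper studies how such a dependence on the kept dimension carries through to perplexity and accuracy in real LLMs.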

Related papers

Random Matrix Theory for Deep Learning: Beyond Eigenvalues of Linear Models [51.85815025140659]
Modern Machine Learning (ML) and Deep Neural Networks (DNNs) often operate on high-dimensional data. In particular, the proportional regime where the data dimension, sample size, and number of model parameters are all large gives rise to novel and sometimes counterintuitive behaviors. This paper extends traditional Random Matrix Theory (RMT) beyond eigenvalue-based analysis of linear models to address the challenges posed by nonlinear ML models.
arXiv Detail & Related papers (2025-06-16T06:54:08Z)
Large Language Models are Locally Linear Mappings [0.0]
We map the inference operations of several open-weight large language models to an exactly equivalent linear system for an input sequence. Despite their power and global nonlinearity, modern LLMs can be interpreted through nearly-exact locally linear decompositions.
arXiv Detail & Related papers (2025-05-30T07:08:33Z)
Latent Thought Models with Variational Bayes Inference-Time Computation [52.63299874322121]
Latent Thought Models (LTMs) incorporate explicit latent thought vectors that follow an explicit prior model in latent space. LTMs demonstrate superior sample and parameter efficiency compared to autoregressive models and discrete diffusion models.
arXiv Detail & Related papers (2025-02-03T17:50:34Z)
Instruction-Following Pruning for Large Language Models [58.329978053711024]
We move beyond the traditional static pruning approach of determining a fixed pruning mask for a model. In our method, the pruning mask is input-dependent and adapts dynamically based on the information described in a user instruction. Our approach, termed "instruction-following pruning", introduces a sparse mask predictor that takes the user instruction as input and dynamically selects the most relevant model parameters for the given task.
arXiv Detail & Related papers (2025-01-03T20:19:14Z)
Optimization Insights into Deep Diagonal Linear Networks [10.395029724463672]
We study the implicit regularization properties of the gradient flow "algorithm" for estimating the parameters of a deep diagonal neural network. Our main contribution is showing that this gradient flow induces a mirror flow dynamic on the model, meaning that it is biased towards a specific solution of the problem.
arXiv Detail & Related papers (2024-12-21T20:23:47Z)
Bilinear Convolution Decomposition for Causal RL Interpretability [0.0]
Efforts to interpret reinforcement learning (RL) models often rely on high-level techniques such as attribution or probing. This work proposes replacing nonlinearities in convolutional neural networks (ConvNets) with bilinear variants, to produce a class of models for which these limitations can be addressed. We show bilinear model variants perform comparably in model-free reinforcement learning settings, and give a side-by-side comparison on ProcGen environments.
arXiv Detail & Related papers (2024-12-01T19:32:04Z)
Scaling and renormalization in high-dimensional regression [72.59731158970894]
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning.
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
A tree-based varying coefficient model [0.0]
The paper introduces a tree-based varying coefficient model (VCM) where the varying coefficients are modelled using the cyclic gradient boosting machine (CGBM). The dimension-wise early stopping not only reduces the risk of dimension-specific overfitting, but also reveals differences in model complexity across dimensions. The model is evaluated on the same simulated and real data examples as those used in Richman and Wüthrich (2023), and the results show that its out-of-sample loss is comparable to that of their neural network-based VCM called LocalGLMnet.
arXiv Detail & Related papers (2024-01-11T15:35:32Z)
Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO). MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts. Our architecture shares the central tensor across all layers to reduce the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
Git Re-Basin: Merging Models modulo Permutation Symmetries [3.5450828190071655]
We show how simple algorithms can be used to fit large networks in practice. We present the first (to our knowledge) demonstration of zero-barrier linear mode connectivity between independently trained models. We also discuss shortcomings in the linear mode connectivity hypothesis.
arXiv Detail & Related papers (2022-09-11T10:44:27Z)
Interpretations Steered Network Pruning via Amortized Inferred Saliency Maps [85.49020931411825]
Compression of Convolutional Neural Networks (CNNs) is crucial to deploying these models on edge devices with limited resources. We propose to address the channel pruning problem from a novel perspective by leveraging the interpretations of a model to steer the pruning process. We tackle this challenge by introducing a selector model that predicts real-time smooth saliency masks for pruned models.
arXiv Detail & Related papers (2022-09-07T01:12:11Z)
Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for feature extraction of two views by finding maximally correlated linear projections of them. We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
arXiv Detail & Related papers (2022-03-23T12:52:49Z)
Equivariant Deep Dynamical Model for Motion Prediction [0.0]
Deep generative modeling is a powerful approach for dynamical modeling to discover the most simplified and compressed underlying description of the data. Most learning tasks have intrinsic symmetries, i.e., the input transformations leave the output unchanged, or the output undergoes a similar transformation. We propose an SO(3) equivariant deep dynamical model (EqDDM) for motion prediction that learns a structured representation of the input space in the sense that the embedding varies with symmetry transformations.
arXiv Detail & Related papers (2021-11-02T21:01:43Z)
Monocular Human Pose and Shape Reconstruction using Part Differentiable Rendering [53.16864661460889]
Recent works succeed in regression-based methods which estimate parametric models directly through a deep neural network supervised by 3D ground truth. In this paper, we introduce body segmentation as critical supervision. To improve the reconstruction with part segmentation, we propose a part-level differentiable renderer that enables part-based models to be supervised by part segmentation.
arXiv Detail & Related papers (2020-03-24T14:25:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.