Related papers: Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training

Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training

URL: http://arxiv.org/abs/2411.07066v1
Date: Mon, 11 Nov 2024 15:30:16 GMT
Title: Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Authors: Elia Cunegatti, Leonardo Lucio Custode, Giovanni Iacca,
Abstract summary: We exploit functional information from dense pre-trained models to obtain sparse models that maximize the activations' alignment w.r.t. We propose textscNeuroAl, a emphtop-up algorithm that modifies the block-wise and row-wise sparsity ratios to maximize the emphneuron alignment among activations. We test our method on 4 different LLM families and 3 different sparsity ratios, showing how it consistently outperforms the latest state-of-the-art techniques.
Score: 3.195234044113248
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Network pruning is a set of computational techniques that aim to reduce a given model's computational cost by removing a subset of its parameters while having minimal impact on performance. Throughout the last decade, the most widely used pruning paradigm has focused on pruning and re-training, which nowadays is inconvenient due to the vast amount of pre-trained models, which are in any case too expensive to re-train. In this paper, we exploit functional information from dense pre-trained models, i.e., their activations, to obtain sparse models that maximize the activations' alignment w.r.t. their corresponding dense models. Hence, we propose \textsc{NeuroAl}, a \emph{top-up} algorithm that can be used on top of any given pruning algorithm for LLMs, that modifies the block-wise and row-wise sparsity ratios to maximize the \emph{neuron alignment} among activations. Moreover, differently from existing methods, our approach adaptively selects the best parameters for the block-wise and row-wise sparsity ratios w.r.t. to the model and the desired sparsity (given as input), and requires \emph{no re-training}. We test our method on 4 different LLM families and 3 different sparsity ratios, showing how it consistently outperforms the latest state-of-the-art techniques. The code is available at https://github.com/eliacunegatti/NeuroAL.

Related papers

GPTailor: Large Language Model Pruning Through Layer Cutting and Stitching [41.96482857947199]
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation.<n>LLMs typically come with a substantial model size, which presents significant challenges in deployment and inference.<n>We develop a novel strategy to compress models by strategically combining or merging layers from finetuned model variants.
arXiv Detail & Related papers (2025-06-25T14:24:59Z)
Train with Perturbation, Infer after Merging: A Two-Stage Framework for Continual Learning [59.6658995479243]
We propose texttext-Perturb-and-Merge (P&M), a novel continual learning framework that integrates model merging into the CL paradigm to avoid forgetting.<n>Through theoretical analysis, we minimize the total loss increase across all tasks and derive an analytical solution for the optimal merging coefficient.<n>Our proposed approach achieves state-of-the-art performance on several continual learning benchmark datasets.
arXiv Detail & Related papers (2025-05-28T14:14:19Z)
LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones. We propose textbfLESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z)
Training Deep Learning Models with Norm-Constrained LMOs [56.00317694850397]
We propose a new family of algorithms that uses the linear minimization oracle (LMO) to adapt to the geometry of the problem.<n>We demonstrate significant speedups on nanoGPT training using our algorithm, Scion, without any reliance on Adam.
arXiv Detail & Related papers (2025-02-11T13:10:34Z)
Instruction-Following Pruning for Large Language Models [58.329978053711024]
We move beyond the traditional static pruning approach of determining a fixed pruning mask for a model. In our method, the pruning mask is input-dependent and adapts dynamically based on the information described in a user instruction. Our approach, termed "instruction-following pruning", introduces a sparse mask predictor that takes the user instruction as input and dynamically selects the most relevant model parameters for the given task.
arXiv Detail & Related papers (2025-01-03T20:19:14Z)
Accelerated zero-order SGD under high-order smoothness and overparameterized regime [79.85163929026146]
We present a novel gradient-free algorithm to solve convex optimization problems. Such problems are encountered in medicine, physics, and machine learning. We provide convergence guarantees for the proposed algorithm under both types of noise.
arXiv Detail & Related papers (2024-11-21T10:26:17Z)
LoRA Unlearns More and Retains More (Student Abstract) [0.0]
PruneLoRA reduces the need for large-scale parameter updates by applying low-rank updates to the model. We leverage LoRA to selectively modify a subset of the pruned model's parameters, thereby reducing the computational cost, memory requirements and improving the model's ability to retain performance on the remaining classes.
arXiv Detail & Related papers (2024-11-16T16:47:57Z)
Harmony in Diversity: Merging Neural Networks with Canonical Correlation Analysis [17.989809995141044]
We propose CCA Merge, which is based on Corre Analysis Analysis. We show that CCA works significantly better than past methods when more than 2 models are merged.
arXiv Detail & Related papers (2024-07-07T14:21:04Z)
Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient [57.9629676017527]
We propose an optimization-based structural pruning on Large-Language Models. We learn the pruning masks in a probabilistic space directly by optimizing the loss of the pruned model. Our method operates for 2.7 hours with around 35GB memory for the 13B models on a single A100 GPU.
arXiv Detail & Related papers (2024-06-15T09:31:03Z)
A Unified Framework for Soft Threshold Pruning [27.853698217792456]
We reformulate soft threshold pruning as an implicit optimization problem solved using the Iterative Shrinkage-Thresholding Algorithm (ISTA) We derive an optimal threshold scheduler through an in-depth study of threshold scheduling based on our framework. In principle, the derived pruning algorithm could sparsify any mathematical model trained via SGD.
arXiv Detail & Related papers (2023-02-25T08:16:14Z)
Oracle Inequalities for Model Selection in Offline Reinforcement Learning [105.74139523696284]
We study the problem of model selection in offline RL with value function approximation. We propose the first model selection algorithm for offline RL that achieves minimax rate-optimal inequalities up to logarithmic factors. We conclude with several numerical simulations showing it is capable of reliably selecting a good model class.
arXiv Detail & Related papers (2022-11-03T17:32:34Z)
Implicit Parameter-free Online Learning with Truncated Linear Models [51.71216912089413]
parameter-free algorithms are online learning algorithms that do not require setting learning rates. We propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an "implicit" flavor. Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties.
arXiv Detail & Related papers (2022-03-19T13:39:49Z)
Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements. We derive an information-theoretic lower bound for the minimax risk under this setting. We show that our method and upper-bounds can be extended for two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models. Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.