Related papers: Efficient Model Compression Techniques with FishLeg

Efficient Model Compression Techniques with FishLeg

URL: http://arxiv.org/abs/2412.02328v1
Date: Tue, 03 Dec 2024 09:42:16 GMT
Title: Efficient Model Compression Techniques with FishLeg
Authors: Jamie McGowan, Wei Sheng Lai, Weibin Chen, Henry Aldridge, Jools Clarke, Jezabel Garcia, Rui Xia, Yilei Liang, Guillaume Hennequin, Alberto Bernacchia,
Abstract summary: FishLeg is a new second-order pruning method based on the Fisher-Legendre (FishLeg)<n>At the heart of FishLeg is a meta-learning approach to amortising the action of the inverse FIM.<n>We find that FishLeg achieves higher or comparable performance against two common baselines in the area.
Score: 30.69238973086908
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In many domains, the most successful AI models tend to be the largest, indeed often too large to be handled by AI players with limited computational resources. To mitigate this, a number of compression methods have been developed, including methods that prune the network down to high sparsity whilst retaining performance. The best-performing pruning techniques are often those that use second-order curvature information (such as an estimate of the Fisher information matrix) to score the importance of each weight and to predict the optimal compensation for weight deletion. However, these methods are difficult to scale to high-dimensional parameter spaces without making heavy approximations. Here, we propose the FishLeg surgeon (FLS), a new second-order pruning method based on the Fisher-Legendre (FishLeg) optimizer. At the heart of FishLeg is a meta-learning approach to amortising the action of the inverse FIM, which brings a number of advantages. Firstly, the parameterisation enables the use of flexible tensor factorisation techniques to improve computational and memory efficiency without sacrificing much accuracy, alleviating challenges associated with scalability of most second-order pruning methods. Secondly, directly estimating the inverse FIM leads to less sensitivity to the amplification of stochasticity during inversion, thereby resulting in more precise estimates. Thirdly, our approach also allows for progressive assimilation of the curvature into the parameterisation. In the gradual pruning regime, this results in a more efficient estimate refinement as opposed to re-estimation. We find that FishLeg achieves higher or comparable performance against two common baselines in the area, most notably in the high sparsity regime when considering a ResNet18 model on CIFAR-10 (84% accuracy at 95% sparsity vs 60% for OBS) and TinyIM (53% accuracy at 80% sparsity vs 48% for OBS).

Related papers

Hyperparameter Transfer Enables Consistent Gains of Matrix-Preconditioned Optimizers Across Scales [55.91454326946738]
We study how the optimal learning rate and weight decay should scale with model width and depth for a wide range of languages.<n>We find that scaling the learning rate according to $$P improves transfer, but can still suffer from significant finite-width deviations.<n>For compute-optimal scaling, we find scaling independent weight decay as $1/mathrmwidth$ is nearly optimal across languages.
arXiv Detail & Related papers (2025-12-05T11:03:41Z)
EDFFDNet: Towards Accurate and Efficient Unsupervised Multi-Grid Image Registration [17.190325630307097]
We propose an Exponential-Decay Free-Form Deformation Network (EDFFDNet), which employs free-form deformation with an exponential-decay basis function.<n>By transforming dense interactions into sparse ones, ASMA reduces parameters and improves accuracy.<n>Experiments demonstrate that EDFFDNet reduces parameters, memory, and total runtime by 70.5%, 32.6%, and 33.7%, respectively.<n>EDFFDNet-2 further improves PSNR by 1.06 dB while maintaining lower computational costs.
arXiv Detail & Related papers (2025-09-09T12:30:51Z)
Generalized Fisher-Weighted SVD: Scalable Kronecker-Factored Fisher Approximation for Compressing Large Language Models [6.57101653042078]
Generalized Fisher-Weighted SVD (GFWSVD) is a post-training compression technique that accounts for both diagonal and off-diagonal elements of the Fisher information matrix.<n>We demonstrate the effectiveness of our method on LLM compression, showing improvements over existing compression baselines.
arXiv Detail & Related papers (2025-05-23T14:41:52Z)
Towards Practical Second-Order Optimizers in Deep Learning: Insights from Fisher Information Analysis [0.0]
We present AdaFisher, a novel adaptive second-order tuning for deep neural networks (DNNs) AdaFisher aims to bridge the gap between the improved convergence and generalization of second-order methods and the computational efficiency needed for trainings. We demonstrate that AdaFisher outperforms state-of-the-art approximations in both accuracy and convergence speed.
arXiv Detail & Related papers (2025-04-26T05:02:21Z)
Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators [7.728820930581886]
CIM is an efficient method for implementing deep neural networks (DNNs) but suffers from substantial overhead. Low-precision ADCs can reduce this overhead but introduce partial-sum quantization errors degrading accuracy. This work addresses these challenges by aligning weight and partial-sum quantization granularities at the column-wise level.
arXiv Detail & Related papers (2025-02-11T05:32:14Z)
ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel PETL method that reuses the hypercomplex parameterized space constructed by Kronecker product to Aggregate Low Rank Experts. Thanks to the artful design, ALoRE maintains negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
Efficient Second-Order Neural Network Optimization via Adaptive Trust Region Methods [0.0]
SecondOrderAdaptive (SOAA) is a novel optimization algorithm designed to overcome limitations of traditional second-order techniques. We empirically demonstrate that SOAA achieves faster and more stable convergence compared to first-order approximations.
arXiv Detail & Related papers (2024-10-03T08:23:06Z)
NEAT: Nonlinear Parameter-efficient Adaptation of Pre-trained Models [26.808251361020066]
Fine-tuning pre-trained models is resource-intensive and laborious. One widely adopted PEFT technique, Low-Rank Adaptation (LoRA), freezes the pre-trained model weights. NEAT introduces a lightweight neural network that takes pre-trained weights as input and learns a nonlinear transformation to approximate cumulative weight updates.
arXiv Detail & Related papers (2024-10-02T17:29:23Z)
LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning [56.88751562302793]
Low-rank adaption (LoRA) has emerged to fine-tune large language models (LLMs) LoRAPrune is a new framework that delivers an accurate structured pruned model in a highly memory-efficient manner. LoRAPrune achieves a reduction in perplexity by 4.81 on WikiText2 and 3.46 on PTB, while also decreasing memory usage by 52.6%.
arXiv Detail & Related papers (2023-05-28T15:15:48Z)
Fast as CHITA: Neural Network Pruning with Combinatorial Optimization [9.440450886684603]
We propose a novel optimization-based pruning framework that considers the combined effect of pruning (and updating) multiple weights subject to a sparsity constraint. Our approach, CHITA, extends the classical Brain Surgeon framework and results in significant improvements in speed, memory, and performance.
arXiv Detail & Related papers (2023-02-28T15:03:18Z)
Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples. We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment. We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
Exact Backpropagation in Binary Weighted Networks with Group Weight Transformations [0.0]
Quantization based model compression serves as high performing and fast approach for inference. Models that constrain the weights to binary values enable efficient implementation of the ubiquitous dot product.
arXiv Detail & Related papers (2021-07-03T10:29:34Z)
Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology without pre-training the dense models. Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design. Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars. EPruner breaks the dependency on the training data in determining the "important" filters.
arXiv Detail & Related papers (2021-01-20T06:18:38Z)
Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose. We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning [91.13797346047984]
We introduce ADAHESSIAN, a second order optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates. We show that ADAHESSIAN achieves new state-of-the-art results by a large margin as compared to other adaptive optimization methods.
arXiv Detail & Related papers (2020-06-01T05:00:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.