Related papers: Fishing For Cheap And Efficient Pruners At Initialization

URL: http://arxiv.org/abs/2502.11450v1
Date: Mon, 17 Feb 2025 05:22:23 GMT
Title: Fishing For Cheap And Efficient Pruners At Initialization
Authors: Ivo Gollini Navarrete, Nicolas Mauricio Cuadrado, Jose Renato Restom, Martin Takáč, Samuel Horváth
Abstract summary: Pruning offers a promising solution to mitigate the associated costs and environmental impact of deploying large deep neural networks (DNNs). We introduce Fisher-Taylor Sensitivity (FTS), a computationally cheap and efficient pruning criterion based on the empirical Fisher Information Matrix (FIM) diagonal. Our method achieves competitive performance against state-of-the-art techniques for one-shot PBT, even under extreme sparsity conditions.
Score: 4.433137726540548
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Pruning offers a promising solution to mitigate the associated costs and environmental impact of deploying large deep neural networks (DNNs). Traditional approaches rely on computationally expensive trained models or time-consuming iterative prune-retrain cycles, undermining their utility in resource-constrained settings. To address this issue, we build upon the established principles of saliency (LeCun et al., 1989) and connection sensitivity (Lee et al., 2018) to tackle the challenging problem of one-shot pruning neural networks (NNs) before training (PBT) at initialization. We introduce Fisher-Taylor Sensitivity (FTS), a computationally cheap and efficient pruning criterion based on the empirical Fisher Information Matrix (FIM) diagonal, offering a viable alternative for integrating first- and second-order information to identify a model's structurally important parameters. Although the FIM-Hessian equivalency only holds for convergent models that maximize the likelihood, recent studies (Karakida et al., 2019) suggest that, even at initialization, the FIM captures essential geometric information of parameters in overparameterized NNs, providing the basis for our method. Finally, we demonstrate empirically that layer collapse, a critical limitation of data-dependent pruning methodologies, is easily overcome by pruning within a single training epoch after initialization. We perform experiments on ResNet18 and VGG19 with CIFAR-10 and CIFAR-100, widely used benchmarks in pruning research. Our method achieves competitive performance against state-of-the-art techniques for one-shot PBT, even under extreme sparsity conditions. Our code is made available to the public.
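Illustration: the abstract describes scoring parameters at initialization with the empirical FIM diagonal and then pruning in one shot. The sketch below shows that general idea on a toy logistic-regression model. The abstract does not state the exact FTS formula, so the score used here (a first-order term plus a Fisher-weighted second-order term) is an assumption, and the function names `per_sample_grads`, `fisher_taylor_scores`, and `one_shot_mask` are hypothetical, not taken from the authors' code.

```python
import math
import random

def per_sample_grads(weights, inputs, labels):
    """Gradient of the logistic negative log-likelihood for each sample."""
    grads = []
    for x, y in zip(inputs, labels):
        logit = sum(w * xi for w, xi in zip(weights, x))
        prob = 1.0 / (1.0 + math.exp(-logit))
        grads.append([(prob - y) * xi for xi in x])
    return grads

def fisher_taylor_scores(weights, grads):
    """Assumed score (not the paper's formula): |g * w| + 0.5 * F_ii * w^2."""
    n = len(grads)
    scores = []
    for i, w in enumerate(weights):
        mean_grad = sum(g[i] for g in grads) / n
        # Empirical FIM diagonal: mean of squared per-sample gradients.
        fisher_diag = sum(g[i] ** 2 for g in grads) / n
        scores.append(abs(mean_grad * w) + 0.5 * fisher_diag * w * w)
    return scores

def one_shot_mask(scores, sparsity):
    """Keep the top (1 - sparsity) fraction of parameters by score."""
    n_keep = max(1, round(len(scores) * (1.0 - sparsity)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:n_keep])
    return [1 if i in keep else 0 for i in range(len(scores))]

# Toy data standing in for a randomly initialized model and one mini-batch.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(8)]
inputs = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(32)]
labels = [random.randint(0, 1) for _ in range(32)]

grads = per_sample_grads(weights, inputs, labels)
scores = fisher_taylor_scores(weights, grads)
mask = one_shot_mask(scores, sparsity=0.75)
pruned_weights = [w * m for w, m in zip(weights, mask)]
```

With 8 parameters and a sparsity of 0.75, the mask keeps the 2 highest-scoring weights and zeroes the rest. A real network would compute the same quantities per layer or globally from backpropagated per-sample gradients.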

Related papers

CBPNet: A Continual Backpropagation Prompt Network for Alleviating Plasticity Loss on Edge Devices [16.318540474216416]
We argue that the reduction in plasticity stems from a lack of update vitality in underutilized parameters during the training process. We propose the Continual Backpropagation Prompt Network (CBPNet), an effective and parameter-efficient framework designed to restore the model's learning vitality.
arXiv Detail & Related papers (2025-09-19T09:16:54Z)
EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time. We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware Parameter Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z)
An Efficient Sparse Fine-Tuning with Low Quantization Error via Neural Network Pruning [9.208007322096535]
We develop a new SpFT framework, based on ideas from neural network pruning. We show our method improves SpFT's memory efficiency by 20-50% while matching the accuracy of state-of-the-art methods like LoRA's variants.
arXiv Detail & Related papers (2025-02-17T04:54:42Z)
Concurrent Training and Layer Pruning of Deep Neural Networks [0.0]
We propose an algorithm capable of identifying and eliminating irrelevant layers of a neural network during the early stages of training. We employ a structure using residual connections around nonlinear network sections that allow the flow of information through the network once a nonlinear section is pruned.
arXiv Detail & Related papers (2024-06-06T23:19:57Z)
Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion. Our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
Fast-NTK: Parameter-Efficient Unlearning for Large-Scale Models [17.34908967455907]
"Machine unlearning" proposes the selective removal of unwanted data without the need for retraining from scratch. Fast-NTK is a novel NTK-based unlearning algorithm that significantly reduces the computational complexity.
arXiv Detail & Related papers (2023-12-22T18:55:45Z)
Adversarial Collaborative Filtering for Free [27.949683060138064]
Collaborative Filtering (CF) has been successfully used to help users discover items of interest. Existing methods suffer from noisy data, which negatively impacts the quality of recommendation. We present Sharpness-aware Collaborative Filtering (SharpCF), a simple yet effective method that conducts adversarial training without extra computational cost over the base.
arXiv Detail & Related papers (2023-08-20T19:25:38Z)
Can we achieve robustness from data alone? [0.7366405857677227]
Adversarial training and its variants have come to be the prevailing methods to achieve adversarially robust classification using neural networks. We devise a meta-learning method for robust classification that optimizes the dataset prior to its deployment in a principled way. Experiments on MNIST and CIFAR-10 demonstrate that the datasets we produce enjoy very high robustness against PGD attacks.
arXiv Detail & Related papers (2022-07-24T12:14:48Z)
A Simple Fine-tuning Is All You Need: Towards Robust Deep Learning Via Adversarial Fine-tuning [90.44219200633286]
We propose a simple yet very effective adversarial fine-tuning approach based on a "slow start, fast decay" learning rate scheduling strategy. Experimental results show that the proposed adversarial fine-tuning approach outperforms the state-of-the-art methods on CIFAR-10, CIFAR-100 and ImageNet datasets.
arXiv Detail & Related papers (2020-12-25T20:50:15Z)
Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs). We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs. We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
arXiv Detail & Related papers (2020-06-22T10:05:12Z)
Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix. Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
Robust Pruning at Initialization [61.30574156442608]
There is a growing need for smaller, energy-efficient neural networks that make machine learning applications usable on devices with limited computational resources. For deep NNs, existing procedures for pruning at initialization remain unsatisfactory: the resulting pruned networks can be difficult to train and, for instance, the procedures do not prevent one layer from being fully pruned.
arXiv Detail & Related papers (2020-02-19T17:09:50Z)
MSE-Optimal Neural Network Initialization via Layer Fusion [68.72356718879428]
Deep neural networks achieve state-of-the-art performance for a range of classification and inference tasks. The use of gradient descent combined with nonconvexity renders learning susceptible to novel problems. We propose fusing neighboring layers of deeper networks that are trained with random variables.
arXiv Detail & Related papers (2020-01-28T18:25:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.