Learning a Consensus Sub-Network with Polarization Regularization and
One Pass Training
- URL: http://arxiv.org/abs/2302.10798v4
- Date: Sat, 4 Nov 2023 18:55:30 GMT
- Title: Learning a Consensus Sub-Network with Polarization Regularization and
One Pass Training
- Authors: Xiaoying Zhi, Varun Babbar, Pheobe Sun, Fran Silavong, Ruibo Shi, Sean
Moran
- Abstract summary: Pruning schemes create extra overhead, either through iterative training and fine-tuning for static pruning or through repeated computation of a dynamic pruning graph.
We propose a new parameter pruning strategy for learning a lighter-weight sub-network that minimizes the energy cost while maintaining comparable performance to the fully parameterised network on given downstream tasks.
Our results on CIFAR-10 and CIFAR-100 suggest that our scheme can remove 50% of connections in deep networks with less than 1% reduction in classification accuracy.
- Score: 3.2214522506924093
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The subject of green AI has been gaining attention within the deep learning
community given the recent trend of ever larger and more complex neural network
models. Existing solutions for reducing the computational load of training at
inference time usually involve pruning the network parameters. Pruning schemes
often create extra overhead, either by iterative training and fine-tuning for
static pruning or by repeated computation of a dynamic pruning graph. We propose a
new parameter pruning strategy for learning a lighter-weight sub-network that
minimizes the energy cost while maintaining comparable performance to the fully
parameterised network on given downstream tasks. Our proposed pruning scheme is
green-oriented, as it only requires a one-off training to discover the optimal
static sub-networks by dynamic pruning methods. The pruning scheme consists of
a binary gating module and a novel loss function to uncover sub-networks with
user-defined sparsity. Our method enables pruning and training simultaneously,
which saves energy in both the training and inference phases and avoids extra
computational overhead from gating modules at inference time. Our results on
CIFAR-10 and CIFAR-100 suggest that our scheme can remove 50% of connections in
deep networks with less than 1% reduction in classification accuracy. Compared
to other related pruning methods, our method demonstrates a lower drop in
accuracy for equivalent reductions in computational cost.
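The abstract describes the method at a high level: a binary gating module attached to the network and a polarization-style regulariser that drives the gates towards a user-defined sparsity in a single training pass. A minimal PyTorch sketch of that idea follows; the gate parameterisation, the straight-through estimator, and the loss weighting are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a binary gating module trained with a
# polarization-style regulariser. Names, the gate parameterisation, and the
# loss weights are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedLinear(nn.Module):
    """Linear layer whose output units are masked by learnable binary gates."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Start with all gates open (sigmoid(1.0) ~= 0.73).
        self.gate_logits = nn.Parameter(torch.full((out_features,), 1.0))

    def gate_probs(self):
        return torch.sigmoid(self.gate_logits)

    def forward(self, x):
        p = self.gate_probs()
        hard = (p > 0.5).float()
        # Straight-through estimator: hard binary mask in the forward pass,
        # sigmoid gradient in the backward pass.
        gate = hard + p - p.detach()
        return self.linear(x) * gate

def polarization_loss(p, target_sparsity=0.5):
    """Push gate probabilities towards {0, 1} and towards a target keep ratio."""
    # p * (1 - p) peaks at p = 0.5, so minimising it polarises each gate.
    polarize = (p * (1.0 - p)).mean()
    sparsity = (p.mean() - (1.0 - target_sparsity)) ** 2
    return polarize + sparsity

# One-pass training: the regulariser is simply added to the task loss.
model = GatedLinear(128, 64)
x, y = torch.randn(32, 128), torch.randn(32, 64)
loss = F.mse_loss(model(x), y) + 0.1 * polarization_loss(model.gate_probs())
loss.backward()
```

After training, units whose gate probability has polarised to zero can be removed permanently, so the gating module itself adds no cost at inference time, consistent with the abstract's claim.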
Related papers
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- Slimmable Networks for Contrastive Self-supervised Learning [69.9454691873866]
Self-supervised learning has made significant progress in pre-training large models, but struggles with small models.
We introduce another one-stage solution to obtain pre-trained small models without the need for extra teachers.
A slimmable network consists of a full network and several weight-sharing sub-networks, which can be pre-trained once to obtain various networks.
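A hedged sketch of the weight-sharing idea behind slimmable layers: sub-networks at different widths reuse the leading slice of one shared weight matrix, so a single pre-training run yields several deployable networks. The width multipliers are arbitrary and switchable batch normalisation is omitted for brevity.

```python
# Illustrative sketch of a slimmable layer: sub-networks at several widths
# share the leading slice of one weight matrix, so one pre-training run yields
# multiple networks. Widths are arbitrary; switchable batch norm is omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimmableLinear(nn.Module):
    def __init__(self, in_features, out_features, width_mults=(0.25, 0.5, 1.0)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.width_mults = width_mults

    def forward(self, x, width_mult=1.0):
        out = int(self.weight.shape[0] * width_mult)
        # A narrower sub-network reuses the first `out` output units of the full layer.
        return F.linear(x, self.weight[:out], self.bias[:out])

layer = SlimmableLinear(64, 32)
x = torch.randn(8, 64)
for w in layer.width_mults:
    print(w, layer(x, width_mult=w).shape)  # output widths 8, 16 and 32
```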
arXiv Detail & Related papers (2022-09-30T15:15:05Z)
- Learning in Feedback-driven Recurrent Spiking Neural Networks using full-FORCE Training [4.124948554183487]
We propose a supervised training procedure for RSNNs, where a second network is introduced only during training.
The proposed training procedure consists of generating targets for both recurrent and readout layers.
We demonstrate the improved performance and noise robustness of the proposed full-FORCE training procedure in modeling 8 dynamical systems.
arXiv Detail & Related papers (2022-05-26T19:01:19Z)
- Dimensionality Reduced Training by Pruning and Freezing Parts of a Deep Neural Network, a Survey [69.3939291118954]
State-of-the-art deep learning models have parameter counts that reach into the billions. Training, storing, and transferring such models is energy- and time-consuming, and thus costly.
Model compression lowers storage and transfer costs, and can further make training more efficient by decreasing the number of computations in the forward and/or backward pass.
This work is a survey on methods which reduce the number of trained weights in deep learning models throughout the training.
arXiv Detail & Related papers (2022-05-17T05:37:08Z)
- Joint inference and input optimization in equilibrium networks [68.63726855991052]
The deep equilibrium model is a class of models that forgoes traditional network depth and instead computes the output of a network by finding the fixed point of a single nonlinear layer.
We show that there is a natural synergy between these two settings.
We demonstrate this strategy on various tasks such as training generative models while optimizing over latent codes, training models for inverse problems like denoising and inpainting, adversarial training and gradient based meta-learning.
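For readers unfamiliar with deep equilibrium models, the sketch below shows the core idea in PyTorch: the network output is an approximate fixed point z* = f(z*, x) of a single nonlinear layer. Plain iteration is used here for clarity; practical DEQs use root-finding solvers and implicit differentiation for the backward pass.

```python
# Minimal sketch of a deep equilibrium model: the output is a fixed point
# z* = f(z*, x) of one nonlinear layer. Plain iteration is used for clarity;
# practical DEQs use root solvers and implicit differentiation.
import torch
import torch.nn as nn

class EquilibriumLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.Wz = nn.Linear(dim, dim, bias=False)
        self.Wx = nn.Linear(dim, dim)

    def forward(self, z, x):
        return torch.tanh(self.Wz(z) + self.Wx(x))

def fixed_point(layer, x, iters=50, tol=1e-5):
    z = torch.zeros_like(x)
    for _ in range(iters):
        z_next = layer(z, x)
        if (z_next - z).norm() < tol:
            return z_next
        z = z_next
    return z

layer = EquilibriumLayer(16)
x = torch.randn(4, 16)
z_star = fixed_point(layer, x)  # approximate equilibrium "network output"
```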
arXiv Detail & Related papers (2021-11-25T19:59:33Z)
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, showing better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
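STAMP's mask generation is only summarised at a high level here, so the following is a loose, hypothetical illustration of the general idea: a small permutation-invariant set encoder summarises examples from the target dataset and emits a per-channel keep mask for a pretrained layer. The architecture, sizes, and thresholding are assumptions, not STAMP itself.

```python
# Loose, hypothetical illustration of set-based task-adaptive mask generation:
# a small permutation-invariant encoder summarises target-task examples and
# emits a per-channel keep mask for a pretrained layer. Not STAMP itself.
import torch
import torch.nn as nn

class MaskGenerator(nn.Module):
    def __init__(self, feat_dim, num_channels):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU())
        self.head = nn.Linear(64, num_channels)

    def forward(self, dataset_feats):
        # dataset_feats: (num_examples, feat_dim) features of target-task examples
        summary = self.encode(dataset_feats).mean(dim=0)           # order-invariant set summary
        return (torch.sigmoid(self.head(summary)) > 0.5).float()   # binary keep mask per channel

gen = MaskGenerator(feat_dim=512, num_channels=256)
mask = gen(torch.randn(100, 512))  # which channels of a pretrained layer to keep
```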
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
- Pruning Filters while Training for Efficiently Optimizing Deep Learning Networks [6.269700080380206]
Pruning techniques have been proposed that remove less significant weights in deep networks.
We propose a dynamic pruning-while-training procedure, wherein we prune filters of a deep network during training itself.
Results indicate that pruning while training yields a compressed network with almost no accuracy loss after pruning 50% of the filters.
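A generic sketch of the prune-while-training pattern this entry describes: at intervals during training, the convolutional filters with the smallest L1 norm are zeroed until a target ratio is reached. The L1 criterion and the 50% ratio are illustrative; the paper's exact selection rule and schedule may differ.

```python
# Generic prune-while-training sketch: periodically zero the convolutional
# filters with the smallest L1 norm. The L1 criterion and the 50% ratio are
# illustrative; the paper's exact rule and schedule may differ.
import torch
import torch.nn as nn

def prune_filters(conv: nn.Conv2d, prune_ratio=0.5):
    with torch.no_grad():
        norms = conv.weight.abs().sum(dim=(1, 2, 3))   # L1 norm per output filter
        k = int(prune_ratio * norms.numel())
        weakest = torch.argsort(norms)[:k]
        mask = torch.ones_like(norms)
        mask[weakest] = 0.0
        conv.weight.mul_(mask.view(-1, 1, 1, 1))       # zero the weakest filters in place
        if conv.bias is not None:
            conv.bias.mul_(mask)
    return mask

conv = nn.Conv2d(3, 16, kernel_size=3)
mask = prune_filters(conv, prune_ratio=0.5)
print(int(mask.sum().item()), "of", mask.numel(), "filters kept")  # 8 of 16
```

In a full training loop this would be called every few epochs, with the same mask reapplied (or gradients masked) so the pruned filters stay at zero.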
arXiv Detail & Related papers (2020-03-05T18:05:17Z)
- Gradual Channel Pruning while Training using Feature Relevance Scores for Convolutional Neural Networks [6.534515590778012]
Pruning is one of the predominant approaches used for deep network compression.
We present a simple yet effective methodology for gradual channel pruning while training, using a novel data-driven metric.
We demonstrate the effectiveness of the proposed methodology on architectures such as VGG and ResNet.
arXiv Detail & Related papers (2020-02-23T17:56:18Z)
- Activation Density driven Energy-Efficient Pruning in Training [2.222917681321253]
We propose a novel pruning method that prunes a network in real time during training.
We obtain exceedingly sparse networks with accuracy comparable to the baseline network.
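One plausible reading of activation-density driven pruning, sketched below: track the running fraction of non-zero ReLU outputs per channel during training and flag channels that rarely activate as candidates for removal. The running-average bookkeeping and the threshold are assumptions for illustration.

```python
# One plausible reading of activation-density driven pruning: track the running
# fraction of non-zero ReLU outputs per channel and flag channels that rarely
# activate. The running average and the threshold are illustrative assumptions.
import torch
import torch.nn as nn

class DensityTracker:
    def __init__(self, num_channels, momentum=0.9):
        self.density = torch.zeros(num_channels)
        self.momentum = momentum

    def update(self, activations):
        # activations: post-ReLU feature map of shape (N, C, H, W)
        batch_density = (activations > 0).float().mean(dim=(0, 2, 3))
        self.density = self.momentum * self.density + (1 - self.momentum) * batch_density

    def channels_to_prune(self, threshold=0.05):
        return (self.density < threshold).nonzero(as_tuple=True)[0]

conv, relu = nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()
tracker = DensityTracker(16)
for _ in range(20):                                   # a few "training" batches
    tracker.update(relu(conv(torch.randn(8, 3, 32, 32))))
print(tracker.channels_to_prune())                    # channels whose running density stayed low
```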
arXiv Detail & Related papers (2020-02-07T18:34:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.