OrthoReg: Robust Network Pruning Using Orthonormality Regularization
- URL: http://arxiv.org/abs/2009.05014v1
- Date: Thu, 10 Sep 2020 17:21:21 GMT
- Title: OrthoReg: Robust Network Pruning Using Orthonormality Regularization
- Authors: Ekdeep Singh Lubana, Puja Trivedi, Conrad Hougen, Robert P. Dick,
Alfred O. Hero
- Abstract summary: We propose a principled regularization strategy that enforces orthonormality on a network's filters to reduce inter-filter correlation.
When used for iterative pruning on VGG-13, MobileNet-V1, and ResNet-34, OrthoReg consistently outperforms five baseline techniques.
- Score: 7.754712828900727
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Network pruning in Convolutional Neural Networks (CNNs) has been extensively
investigated in recent years. To determine the impact of pruning a group of
filters on a network's accuracy, state-of-the-art pruning methods consistently
assume filters of a CNN are independent. This allows the importance of a group
of filters to be estimated as the sum of importances of individual filters.
However, overparameterization in modern networks results in highly correlated
filters that invalidate this assumption, thereby resulting in incorrect
importance estimates. To address this issue, we propose OrthoReg, a principled
regularization strategy that enforces orthonormality on a network's filters to
reduce inter-filter correlation, thereby allowing reliable, efficient
determination of group importance estimates, improved trainability of pruned
networks, and efficient, simultaneous pruning of large groups of filters. When
used for iterative pruning on VGG-13, MobileNet-V1, and ResNet-34, OrthoReg
consistently outperforms five baseline techniques, including the
state-of-the-art, on CIFAR-100 and Tiny-ImageNet. For the recently proposed
Early-Bird Ticket hypothesis, which claims networks become amenable to pruning
early on in training and can be pruned after a few epochs to minimize training
expenditure, we find OrthoReg significantly outperforms prior work. Code
available at https://github.com/EkdeepSLubana/OrthoReg.
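The core mechanism described above, a penalty that pushes a layer's filters toward an orthonormal set, can be illustrated with a standard soft-orthogonality regularizer of the form ||W W^T - I||_F^2 on the flattened filter matrix. The PyTorch sketch below is only an illustration under that assumption (the coefficient `ortho_lambda` and the choice to penalize every Conv2d layer are placeholders); the authors' repository linked above contains the actual implementation.

```python
# Minimal sketch of an orthonormality (soft-orthogonality) penalty on conv
# filters, assuming PyTorch. Illustration only; not necessarily the exact
# formulation used by OrthoReg (see the linked repository for the real code).
import torch
import torch.nn as nn

def orthonormality_penalty(conv: nn.Conv2d) -> torch.Tensor:
    """||W W^T - I||_F^2 with each filter flattened to one row of W."""
    w = conv.weight.view(conv.weight.size(0), -1)   # (num_filters, fan_in)
    gram = w @ w.t()                                # pairwise filter correlations
    eye = torch.eye(w.size(0), device=w.device)
    return ((gram - eye) ** 2).sum()

def loss_with_orthoreg(model: nn.Module, task_loss: torch.Tensor,
                       ortho_lambda: float = 1e-4) -> torch.Tensor:
    """Task loss plus the penalty over all conv layers; `ortho_lambda` is a
    placeholder coefficient, not a value taken from the paper."""
    penalty = sum(orthonormality_penalty(m) for m in model.modules()
                  if isinstance(m, nn.Conv2d))
    return task_loss + ortho_lambda * penalty
```

Once filters are decorrelated in this way, the importance of a group of filters is better approximated by the sum of their individual importances, which is the independence assumption the abstract notes existing criteria rely on.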
Related papers
- Structured Network Pruning by Measuring Filter-wise Interactions [6.037167142826297]
We propose SNPFI (Structured Network Pruning by measuring Filter-wise Interaction), a structured network pruning approach.
During pruning, SNPFI can automatically assign the proper sparsity based on the filter utilization strength.
We empirically demonstrate the effectiveness of the SNPFI with several commonly used CNN models.
arXiv Detail & Related papers (2023-07-03T05:26:05Z)
- Trainability Preserving Neural Structured Pruning [64.65659982877891]
We present trainability preserving pruning (TPP), a regularization-based structured pruning method that can effectively maintain trainability during sparsification.
TPP can compete with the ground-truth dynamical isometry recovery method on linear networks.
It delivers encouraging performance in comparison to many top-performing filter pruning methods.
arXiv Detail & Related papers (2022-07-25T21:15:47Z)
- Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs [69.3939291118954]
Unstructured pruning is well suited to reducing the memory footprint of convolutional neural networks (CNNs).
Standard unstructured pruning (SP) reduces the memory footprint of CNNs by setting filter elements to zero (see the sketch below).
We introduce interspace pruning (IP), a general tool to improve existing pruning methods.
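For reference, the "setting filter elements to zero" step of standard unstructured pruning can be sketched as a simple magnitude criterion; the sparsity level and PyTorch framing are assumptions for illustration, and this is the SP baseline, not the proposed interspace pruning (IP) method.

```python
# Sketch of standard unstructured (magnitude) pruning: zero out the
# smallest-magnitude weights of a conv layer. Illustration only; this is the
# SP baseline described above, not the proposed interspace pruning (IP).
import torch
import torch.nn as nn

def magnitude_prune_(conv: nn.Conv2d, sparsity: float = 0.9) -> None:
    """In-place: keep the largest-magnitude (1 - sparsity) fraction of weights."""
    w = conv.weight.data
    k = int(sparsity * w.numel())                    # number of weights to zero
    if k > 0:
        threshold = w.abs().flatten().kthvalue(k).values
        w.mul_((w.abs() > threshold).float())        # zero everything at/below it
```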
arXiv Detail & Related papers (2022-03-15T11:50:45Z)
- Batch Normalization Tells You Which Filter is Important [49.903610684578716]
We propose a simple yet effective filter pruning method by evaluating the importance of each filter based on the BN parameters of pre-trained CNNs.
The experimental results on CIFAR-10 and ImageNet demonstrate that the proposed method can achieve outstanding performance.
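Reading filter importance from batch-norm parameters typically amounts to ranking channels by the magnitude of the BN scale factor gamma. The snippet below shows that generic form (the keep ratio is an arbitrary placeholder), not necessarily the cited paper's exact scoring rule.

```python
# Generic sketch of a BN-based filter importance criterion: rank channels by
# |gamma| of the BatchNorm layer that follows the conv. Not necessarily the
# cited paper's exact rule; `keep_ratio` is an arbitrary placeholder.
import torch
import torch.nn as nn

def filters_to_keep(bn: nn.BatchNorm2d, keep_ratio: float = 0.5) -> torch.Tensor:
    """Indices of the channels to keep, largest |gamma| first."""
    gamma = bn.weight.detach().abs()                 # one scale per channel
    n_keep = max(1, int(keep_ratio * gamma.numel()))
    return torch.argsort(gamma, descending=True)[:n_keep]
```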
arXiv Detail & Related papers (2021-12-02T12:04:59Z)
- Filter Pruning using Hierarchical Group Sparse Regularization for Deep Convolutional Neural Networks [3.5636461829966093]
We propose a filter pruning method using the hierarchical group sparse regularization.
It can remove more than 50% of the parameters of ResNet on CIFAR-10 with only a 0.3% decrease in test accuracy.
It also removes 34% of the parameters of ResNet on TinyImageNet-200 while achieving higher accuracy than the baseline network.
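A non-hierarchical baseline for this kind of regularizer is the plain group lasso with one group per filter, which drives whole filters toward zero so they can be removed. The sketch below shows only that baseline form; the hierarchical grouping proposed in the paper is not reproduced here.

```python
# Plain group-lasso penalty with one group per filter: the sum of per-filter
# L2 norms drives entire filters toward zero. Baseline form only; the cited
# paper uses a hierarchical grouping not reproduced here.
import torch
import torch.nn as nn

def group_sparse_penalty(model: nn.Module) -> torch.Tensor:
    return sum(
        m.weight.view(m.weight.size(0), -1).norm(dim=1).sum()  # per-filter L2 norms
        for m in model.modules() if isinstance(m, nn.Conv2d)
    )
```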
arXiv Detail & Related papers (2020-11-04T16:29:41Z)
- Data Agnostic Filter Gating for Efficient Deep Networks [72.4615632234314]
Current filter pruning methods mainly leverage feature maps to generate importance scores for filters and prune those with smaller scores.
In this paper, we propose a data-agnostic filter pruning method that uses an auxiliary network named Dagger module to induce pruning.
In addition, to help prune filters with certain FLOPs constraints, we leverage an explicit FLOPs-aware regularization to directly promote pruning filters toward target FLOPs.
arXiv Detail & Related papers (2020-10-28T15:26:40Z)
- Dependency Aware Filter Pruning [74.69495455411987]
Pruning a proportion of unimportant filters is an efficient way to mitigate the inference cost.
Previous work prunes filters according to their weight norms or the corresponding batch-norm scaling factors.
We propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity.
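The prior-work criterion referenced here, pruning by weight norm, reduces to sorting filters by the L2 norm of their weights; the snippet below shows that baseline, not the dependency-aware mechanism the paper proposes.

```python
# Baseline norm-based criterion referenced above: sort a layer's filters by
# the L2 norm of their weights (smallest first = least important). This is
# the prior-work baseline, not the paper's dependency-aware mechanism.
import torch
import torch.nn as nn

def filters_by_norm(conv: nn.Conv2d) -> torch.Tensor:
    norms = conv.weight.detach().view(conv.weight.size(0), -1).norm(dim=1)
    return torch.argsort(norms)   # candidate filters to prune come first
```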
arXiv Detail & Related papers (2020-05-06T07:41:22Z)
- Filter Grafting for Deep Neural Networks: Reason, Method, and Cultivation [86.91324735966766]
Filters are the key components in modern convolutional neural networks (CNNs).
In this paper, we introduce filter grafting to achieve this goal.
We develop a novel criterion to measure the information of filters and an adaptive weighting strategy to balance the grafted information among networks.
arXiv Detail & Related papers (2020-04-26T08:36:26Z)
- Pruning CNN's with linear filter ensembles [0.0]
We use pruning to reduce the network size and, implicitly, the number of floating point operations (FLOPs).
We develop a novel filter importance norm that is based on the change in the empirical loss caused by the presence or removal of a component from the network architecture.
We evaluate our method on a fully connected network, as well as on the ResNet architecture trained on the CIFAR-10 dataset.
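The loss-change idea behind this importance norm can be made concrete with a brute-force version: remove one filter at a time (by zeroing it) and measure how much the loss on a batch increases. The paper derives a norm rather than measuring this exhaustively; the inputs, targets, and criterion below are placeholders.

```python
# Brute-force illustration of "importance = change in empirical loss when a
# filter is removed". The cited paper derives a norm instead of measuring this
# exhaustively; inputs/targets/criterion are placeholders, and `conv` must be
# a layer inside `model`. Bias terms are left untouched for brevity.
import torch
import torch.nn as nn

@torch.no_grad()
def loss_change_scores(model: nn.Module, conv: nn.Conv2d,
                       inputs, targets, criterion) -> torch.Tensor:
    base_loss = criterion(model(inputs), targets).item()
    scores = torch.zeros(conv.weight.size(0))
    for i in range(conv.weight.size(0)):
        saved = conv.weight[i].clone()
        conv.weight[i].zero_()                               # "remove" filter i
        scores[i] = criterion(model(inputs), targets).item() - base_loss
        conv.weight[i].copy_(saved)                          # restore it
    return scores                                            # larger = more important
```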
arXiv Detail & Related papers (2020-01-22T16:52:06Z)