Exploiting Non-Linear Redundancy for Neural Model Compression
- URL: http://arxiv.org/abs/2005.14070v1
- Date: Thu, 28 May 2020 15:13:21 GMT
- Title: Exploiting Non-Linear Redundancy for Neural Model Compression
- Authors: Muhammad A. Shah, Raphael Olivier and Bhiksha Raj
- Abstract summary: We propose a novel model compression approach based on exploitation of linear dependence.
Our method results in a reduction of up to 99% in overall network size with small loss in performance.
- Score: 26.211513643079993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deploying deep learning models, comprising non-linear combinations of
millions, even billions, of parameters, is challenging given the memory, power,
and compute constraints of the real world. This situation has led to research
into model compression techniques, most of which rely on suboptimal heuristics
and do not consider the parameter redundancies due to linear dependence between
neuron activations in overparametrized networks. In this paper, we propose a
novel model compression approach based on the exploitation of linear dependence,
which compresses networks by eliminating entire neurons and redistributing
their activations over other neurons in a manner that is provably lossless
during training. We combine this approach with an annealing algorithm that may
be applied during training, or even to a trained model, and demonstrate, using
popular datasets, that our method results in a reduction of up to 99% in
overall network size with small loss in performance. Furthermore, we provide
theoretical results showing that in overparametrized, locally linear (ReLU)
neural networks where redundant features exist, and with correct hyperparameter
selection, our method is indeed able to capture and suppress those
dependencies.
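To make the elimination-and-redistribution idea concrete, here is a minimal NumPy sketch of the underlying linear-algebra step. It is an illustration under our own assumptions (the function name, the tolerance, and the neuron search order are not from the paper), not the authors' implementation: if a neuron's activation over a batch is a linear combination of its peers' activations plus a constant, the neuron can be deleted and its outgoing weights folded into the surviving neurons and the next layer's bias without changing the network's output on that batch.

```python
import numpy as np

def eliminate_dependent_neuron(A, W_next, b_next, tol=1e-6):
    """Remove one neuron whose activation is (nearly) a linear combination
    of the other neurons' activations, folding its contribution into the
    next layer's weights and bias so the output is unchanged on the data
    that produced A.

    A      : (n_samples, n_neurons) activation matrix of the layer
    W_next : (n_neurons, n_out)     weights of the following layer
    b_next : (n_out,)               bias of the following layer
    Returns (kept_indices, W_new, b_new), or None if no neuron is redundant.
    """
    n_neurons = A.shape[1]
    ones = np.ones((A.shape[0], 1))
    for j in range(n_neurons):
        others = [i for i in range(n_neurons) if i != j]
        # Regress neuron j's activation on the remaining neurons plus a constant.
        X = np.hstack([A[:, others], ones])
        coef, *_ = np.linalg.lstsq(X, A[:, j], rcond=None)
        if np.max(np.abs(X @ coef - A[:, j])) < tol:  # (near-)exact dependence
            c, d = coef[:-1], coef[-1]
            # Redistribute neuron j's outgoing weights over the surviving neurons ...
            W_new = W_next[others, :] + np.outer(c, W_next[j, :])
            # ... and absorb the constant component into the next layer's bias.
            b_new = b_next + d * W_next[j, :]
            return others, W_new, b_new
    return None  # no linearly dependent neuron at this tolerance
```

When the dependence is exact, A @ W_next + b_next equals A[:, others] @ W_new + b_new, so the removal is lossless; the tolerance (and, plausibly, the paper's annealing schedule) controls how approximate a dependence is still accepted.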
Related papers
- Heterogenous Memory Augmented Neural Networks [84.29338268789684]
We introduce a novel heterogeneous memory augmentation approach for neural networks.
By introducing learnable memory tokens with an attention mechanism, we can effectively boost performance without huge computational overhead.
We demonstrate our approach on various image and graph-based tasks under both in-distribution (ID) and out-of-distribution (OOD) conditions.
arXiv Detail & Related papers (2023-10-17T01:05:28Z)
- Split-Boost Neural Networks [1.1549572298362787]
We propose an innovative training strategy for feed-forward architectures, called split-boost.
Such a novel approach ultimately allows us to avoid explicitly modeling the regularization term.
The proposed strategy is tested on a real-world (anonymized) dataset within a benchmark medical insurance design problem.
arXiv Detail & Related papers (2023-09-06T17:08:57Z)
- Towards a Better Theoretical Understanding of Independent Subnetwork Training [56.24689348875711]
We take a closer theoretical look at Independent Subnetwork Training (IST).
IST is a recently proposed and highly effective technique for solving the aforementioned problems.
We identify fundamental differences between IST and alternative approaches, such as distributed methods with compressed communication.
arXiv Detail & Related papers (2023-06-28T18:14:22Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Quiver neural networks [5.076419064097734]
We develop a uniform theoretical approach towards the analysis of various neural network connectivity architectures.
Inspired by quiver representation theory in mathematics, this approach gives a compact way to capture elaborate data flows.
arXiv Detail & Related papers (2022-07-26T09:42:45Z)
- Non-linear manifold ROM with Convolutional Autoencoders and Reduced Over-Collocation method [0.0]
Non-affine parametric dependencies, nonlinearities and advection-dominated regimes of the model of interest can result in a slow Kolmogorov n-width decay.
We implement the non-linear manifold method introduced by Carlberg et al. [37] with hyper-reduction achieved through reduced over-collocation and teacher-student training of a reduced decoder.
We test the methodology on a 2d non-linear conservation law and a 2d shallow water model, and compare the results with a purely data-driven method in which the dynamics is evolved in time with a long short-term memory network.
arXiv Detail & Related papers (2022-03-01T11:16:50Z)
- Neuron-based Pruning of Deep Neural Networks with Better Generalization using Kronecker Factored Curvature Approximation [18.224344440110862]
The proposed algorithm directs the parameters of the compressed model toward a flatter solution by exploring the spectral radius of the Hessian.
Our result shows that it improves the state-of-the-art results on neuron compression.
The method is able to achieve very small networks with only a small loss in accuracy across different neural network models.
arXiv Detail & Related papers (2021-11-16T15:55:59Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent (a generic sketch of such a min-max training loop is given after this list).
For the first time, we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Network Diffusions via Neural Mean-Field Dynamics [52.091487866968286]
We propose a novel learning framework for inference and estimation problems of diffusion on networks.
Our framework is derived from the Mori-Zwanzig formalism to obtain an exact evolution of the node infection probabilities.
Our approach is versatile and robust to variations of the underlying diffusion network models.
arXiv Detail & Related papers (2020-06-16T18:45:20Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- Residual Continual Learning [33.442903467864966]
We propose a novel continual learning method called Residual Continual Learning (ResCL).
Our method can prevent the catastrophic forgetting phenomenon in sequential learning of multiple tasks, without any source task information except the original network.
The proposed method exhibits state-of-the-art performance in various continual learning scenarios.
arXiv Detail & Related papers (2020-02-17T05:24:45Z)
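As an aside on the "Provably Efficient Neural Estimation of Structural Equation Model" entry above, the min-max formulation it mentions, in which both players are parameterized by neural networks and trained with gradient descent, can be illustrated with a generic sketch. Everything below (the toy data, the network sizes, and the particular moment-style objective) is an assumption for illustration, not the estimator defined in that paper.

```python
import torch
import torch.nn as nn

# Hypothetical toy illustration: two networks play a min-max game and are
# trained with alternating gradient steps; architectures, objective, and
# data are placeholders rather than the paper's estimator.
f = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # "min" player
g = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))  # "max" player (witness)
opt_f = torch.optim.SGD(f.parameters(), lr=1e-3)
opt_g = torch.optim.SGD(g.parameters(), lr=1e-3)

x = torch.randn(256, 1)
y = 2.0 * x + 0.1 * torch.randn(256, 1)  # toy data with a simple structural relation

def objective():
    # The witness g tries to expose any violation of the moment condition
    # E[(y - f(x)) * g(x, y)] = 0; a quadratic penalty keeps g bounded.
    residual = y - f(x)
    witness = g(torch.cat([x, y], dim=1))
    return (residual * witness).mean() - 0.5 * (witness ** 2).mean()

for step in range(1000):
    # Ascent step on g (maximize the objective) ...
    opt_g.zero_grad()
    (-objective()).backward()
    opt_g.step()
    # ... then a descent step on f (minimize the objective).
    opt_f.zero_grad()
    objective().backward()
    opt_f.step()
```

Alternating ascent and descent steps like these are a standard way to optimize saddle-point objectives in practice; the convergence guarantees claimed in that paper apply to its specific formulation, not to this toy.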