Sharp asymptotics on the compression of two-layer neural networks
- URL: http://arxiv.org/abs/2205.08199v2
- Date: Wed, 18 May 2022 08:57:56 GMT
- Title: Sharp asymptotics on the compression of two-layer neural networks
- Authors: Mohammad Hossein Amani, Simone Bombari, Marco Mondelli, Rattana
Pukdee, Stefano Rini
- Abstract summary: We study the compression of a target two-layer neural network with N nodes into a compressed network with M < N nodes.
We conjecture that the optimum of the simplified optimization problem is achieved by taking weights on the Equiangular Tight Frame (ETF).
- Score: 19.683271092724937
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study the compression of a target two-layer neural network
with N nodes into a compressed network with M < N nodes. More precisely, we
consider the setting in which the weights of the target network are i.i.d.
sub-Gaussian, and we minimize the population L2 loss between the outputs of the
target and of the compressed network, under the assumption of Gaussian inputs.
By using tools from high-dimensional probability, we show that this non-convex
problem can be simplified when the target network is sufficiently
over-parameterized, and provide the error rate of this approximation as a
function of the input dimension and N. For a ReLU activation function, we
conjecture that the optimum of the simplified optimization problem is achieved
by taking weights on the Equiangular Tight Frame (ETF), while the scaling of
the weights and the orientation of the ETF depend on the parameters of the
target network. Numerical evidence is provided to support this conjecture.
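A minimal sketch of the objective follows (illustrative only, not the paper's code: the sizes d, N, M, the 1/sqrt(d) and 1/sqrt(N) scalings, and the Monte Carlo sample size are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def two_layer_relu(x, W, a):
    # f(x) = sum_j a_j * relu(<w_j, x>), with the w_j as rows of W.
    return np.maximum(x @ W.T, 0.0) @ a

# Illustrative sizes: input dimension d, target width N, compressed width M < N.
d, N, M = 32, 256, 8

# Target network with i.i.d. (here Gaussian, hence sub-Gaussian) weights.
W_target = rng.normal(size=(N, d)) / np.sqrt(d)
a_target = rng.normal(size=N) / np.sqrt(N)

# Candidate compressed network; the paper minimizes the loss below over these.
W_comp = rng.normal(size=(M, d)) / np.sqrt(d)
a_comp = rng.normal(size=M) / np.sqrt(M)

def population_l2_loss(W1, a1, W2, a2, n_samples=100_000):
    # Monte Carlo estimate of E_x[(f1(x) - f2(x))^2] for x ~ N(0, I_d).
    x = rng.normal(size=(n_samples, d))
    diff = two_layer_relu(x, W1, a1) - two_layer_relu(x, W2, a2)
    return np.mean(diff ** 2)

print(population_l2_loss(W_target, a_target, W_comp, a_comp))
```

Under the conjecture, the rows of the optimal W_comp would instead be placed on a suitably scaled and rotated ETF rather than drawn at random.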
Related papers
- Fixing the NTK: From Neural Network Linearizations to Exact Convex
Programs [63.768739279562105]
We show that for a particular choice of mask weights that do not depend on the learning targets, this kernel is equivalent to the NTK of the gated ReLU network on the training data.
A consequence of this lack of dependence on the targets is that the NTK cannot perform better than the optimal MKL kernel on the training set.
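For intuition (a hedged sketch, not the paper's construction: the gate matrix G and the 1/m normalization are assumptions), a gated ReLU network with fixed gates is linear in its trainable weights, so its NTK is an explicit kernel:

```python
import numpy as np

def gated_relu_kernel(X, G):
    # Gated ReLU with fixed gates g_j (rows of G) is linear in its trainable
    # weights, so its NTK equals the kernel of the feature map
    # phi(x) = (1/sqrt(m)) * [1{<g_j, x> > 0} * x]_{j=1..m}:
    # K(x, x') = <x, x'> * (1/m) * #{j : both gates active}.
    gates = (X @ G.T > 0).astype(float)        # (n, m) gate activations
    return (X @ X.T) * (gates @ gates.T) / G.shape[0]
```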
arXiv Detail & Related papers (2023-09-26T17:42:52Z)
- Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss as the number of learning epochs increases.
We show that the threshold on the number of training samples increases with the network width.
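For reference, a standard (non-smooth) soft-thresholding step of unfolded ISTA looks as follows; the paper analyzes a smooth variant, so treat this as a baseline sketch:

```python
import numpy as np

def soft_threshold(x, tau):
    # Proximal operator of tau * ||.||_1 (soft-thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def ista_layer(z, A, y, step, tau):
    # One unfolded ISTA iteration for min_z 0.5 * ||A z - y||^2 + tau * ||z||_1.
    return soft_threshold(z - step * A.T @ (A @ z - y), step * tau)
```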
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
- Spike-and-slab shrinkage priors for structurally sparse Bayesian neural networks [0.16385815610837165]
Sparse deep learning addresses challenges by recovering a sparse representation of the underlying target function.
Deep neural architectures compressed via structured sparsity provide low latency inference, higher data throughput, and reduced energy consumption.
We propose structurally sparse Bayesian neural networks which prune excessive nodes with (i) Spike-and-Slab Group Lasso (SS-GL), and (ii) Spike-and-Slab Group Horseshoe (SS-GHS) priors.
arXiv Detail & Related papers (2023-08-17T17:14:18Z)
- Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis [93.55896765176414]
We propose binary sparse convolutional networks called BSC-Net for efficient point cloud analysis.
We employ differentiable search strategies to discover the optimal positions for active site matching in the shifted sparse convolution.
Our BSC-Net achieves significant improvement upon our strong baseline and outperforms the state-of-the-art network binarization methods.
arXiv Detail & Related papers (2023-03-27T13:47:06Z)
- Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
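To illustrate the flavor of such a box over-approximation (a generic interval-arithmetic sketch for one explicit layer, not the paper's embedded-network construction for implicit models):

```python
import numpy as np

def affine_relu_box(W, b, lo, hi):
    # Push the l_inf box [lo, hi] through x -> relu(W x + b) with interval
    # arithmetic in center/radius form; ReLU is monotone, so applying it
    # to the interval endpoints is exact for this single layer.
    c, r = (lo + hi) / 2.0, (hi - lo) / 2.0
    c_out, r_out = W @ c + b, np.abs(W) @ r
    return np.maximum(c_out - r_out, 0.0), np.maximum(c_out + r_out, 0.0)
```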
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
- On the Effective Number of Linear Regions in Shallow Univariate ReLU Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
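A minimal sketch of the random-pruning operation in question (the variance-preserving rescaling below is an assumption and may differ from the paper's exact convention):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_prune(W, keep_prob):
    # Keep each weight independently with probability keep_prob; rescaling
    # survivors by 1/sqrt(keep_prob) preserves the per-weight second moment.
    mask = rng.random(size=W.shape) < keep_prob
    return (W * mask) / np.sqrt(keep_prob)
```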
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- Sparse Uncertainty Representation in Deep Learning with Inducing Weights [22.912675044223302]
We extend Matheron's conditional Gaussian sampling rule to enable fast weight sampling, allowing our inference method to maintain reasonable run-time compared with ensembles.
Our approach achieves competitive performance to the state-of-the-art in prediction and uncertainty estimation tasks with fully connected neural networks and ResNets.
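The base rule itself is short: to sample u given v = v0 from a joint Gaussian, draw from the joint and apply a linear correction. A dense-covariance sketch (the paper's contribution is a scalable extension via inducing weights, not this naive version):

```python
import numpy as np

rng = np.random.default_rng(0)

def matheron_conditional_sample(mu, Sigma, idx_u, idx_v, v0):
    # Matheron's rule: to sample u | v = v0 from a joint Gaussian N(mu, Sigma),
    # draw (u_bar, v_bar) from the joint, then apply a linear correction:
    # u_cond = u_bar + Sigma_uv Sigma_vv^{-1} (v0 - v_bar).
    joint = rng.multivariate_normal(mu, Sigma)
    u_bar, v_bar = joint[idx_u], joint[idx_v]
    S_uv = Sigma[np.ix_(idx_u, idx_v)]
    S_vv = Sigma[np.ix_(idx_v, idx_v)]
    return u_bar + S_uv @ np.linalg.solve(S_vv, v0 - v_bar)
```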
arXiv Detail & Related papers (2021-05-30T18:17:47Z)
- A Probabilistic Approach to Neural Network Pruning [20.001091112545065]
We theoretically study the performance of two pruning techniques (random and magnitude-based) on FCNs and CNNs.
The results establish that there exist pruned networks with expressive power within any specified bound from the target network.
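For concreteness, magnitude-based pruning, one of the two analyzed techniques, can be sketched as follows (tie handling at the threshold is simplified):

```python
import numpy as np

def magnitude_prune(W, sparsity):
    # Zero out the `sparsity` fraction of weights with smallest magnitude.
    # (Ties at the threshold may prune slightly more; fine for a sketch.)
    k = int(sparsity * W.size)
    if k == 0:
        return W.copy()
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    W_pruned = W.copy()
    W_pruned[np.abs(W_pruned) <= thresh] = 0.0
    return W_pruned
```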
arXiv Detail & Related papers (2021-05-20T23:19:43Z)
- Stable Recovery of Entangled Weights: Towards Robust Identification of Deep Neural Networks from Minimal Samples [0.0]
We introduce the so-called entangled weights, which compose weights of successive layers intertwined with suitable diagonal and invertible matrices depending on the activation functions and their shifts.
We prove that entangled weights are completely and stably approximated by an efficient and robust algorithm.
In terms of practical impact, our study shows that we can relate input-output information uniquely and stably to network parameters, providing a form of explainability.
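Schematically (a hedged sketch; in the paper the diagonal matrices are determined by the activation functions and their shifts rather than supplied by hand), an entangled weight composes successive layers through a diagonal:

```python
import numpy as np

def entangled_weight(W2, W1, diag_scale):
    # Illustrative only: compose two successive layers' weight matrices
    # through a diagonal matrix.
    return W2 @ np.diag(diag_scale) @ W1
```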
arXiv Detail & Related papers (2021-01-18T16:31:19Z)
- On the Principle of Least Symmetry Breaking in Shallow ReLU Models [13.760721677322072]
We show that the least loss of symmetry with respect to the target weights may apply to a broader range of settings.
Motivated by this, we conduct a series of experiments which corroborate this hypothesis for different classes of non-isotropic non-product distributions, smooth activation functions and networks with a few layers.
arXiv Detail & Related papers (2019-12-26T22:04:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.