Polynomially Over-Parameterized Convolutional Neural Networks Contain
Structured Strong Winning Lottery Tickets
- URL: http://arxiv.org/abs/2311.09858v1
- Date: Thu, 16 Nov 2023 12:38:45 GMT
- Title: Polynomially Over-Parameterized Convolutional Neural Networks Contain
Structured Strong Winning Lottery Tickets
- Authors: Arthur da Cunha, Francesco d'Amore, Emanuele Natale
- Abstract summary: We prove, for a wide class of random Convolutional Neural Networks, the existence of structured subnetworks that can approximate any sufficiently smaller network.
This result provides the first sub-exponential bound around the Strong Lottery Ticket Hypothesis for structured pruning.
- Score: 4.020829863982153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Strong Lottery Ticket Hypothesis (SLTH) states that randomly-initialised
neural networks likely contain subnetworks that perform well without any
training. Although unstructured pruning has been extensively studied in this
context, its structured counterpart, which can deliver significant
computational and memory efficiency gains, has been largely unexplored. One of
the main reasons for this gap is the limitations of the underlying mathematical
tools used in formal analyses of the SLTH. In this paper, we overcome these
limitations: we leverage recent advances in the multidimensional generalisation
of the Random Subset-Sum Problem and obtain a variant that admits the
stochastic dependencies that arise when addressing structured pruning in the
SLTH. We apply this result to prove, for a wide class of random Convolutional
Neural Networks, the existence of structured subnetworks that can approximate
any sufficiently smaller network.
This result provides the first sub-exponential bound around the SLTH for
structured pruning, opening up new avenues for further research on the
hypothesis and contributing to the understanding of the role of
over-parameterization in deep learning.
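For intuition only (this is not the paper's construction), the mechanism behind SLTH-style results can be illustrated in one dimension via the Random Subset-Sum Problem: given enough i.i.d. random weights, some subset of them sums to within a small error of any target in [-1, 1]. The following minimal sketch, written under that assumption, checks this numerically with numpy; the brute-force subset search and all names are illustrative choices, not the paper's algorithm.

```python
import itertools

import numpy as np

def best_subset_error(samples, target):
    """Smallest |sum(subset) - target| over all subsets of `samples` (brute force)."""
    best = abs(target)  # error of the empty subset
    for r in range(1, len(samples) + 1):
        for subset in itertools.combinations(samples, r):
            best = min(best, abs(sum(subset) - target))
    return best

rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=16)  # i.i.d. weights of an over-parameterised random layer
targets = rng.uniform(-1.0, 1.0, size=5)   # weights of a smaller target layer to approximate

for t in targets:
    print(f"target {t:+.4f} approximated within {best_subset_error(samples, t):.2e}")
```

In line with the Random Subset-Sum theory, the number of random samples needed to reach accuracy epsilon grows only logarithmically in 1/epsilon. The paper's contribution is a multidimensional generalisation of this result that admits the stochastic dependencies arising when entire structures (e.g. convolutional channels) must be kept or pruned as a unit; a hedged sketch of what such a structured, channel-level subnetwork looks like is given after the related-papers list below.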
Related papers
- An Infinite-Width Analysis on the Jacobian-Regularised Training of a Neural Network [10.384951432591492]
Recent theoretical analysis of deep neural networks in their infinite-width limits has deepened our understanding of initialisation, feature learning, and training of those networks.
We show that this infinite-width analysis can be extended to the Jacobian of a deep neural network.
We experimentally show the relevance of our theoretical claims to wide finite networks, and empirically analyse the properties of the kernel regression solution to gain insight into Jacobian regularisation.
arXiv Detail & Related papers (2023-12-06T09:52:18Z) - DANI: Fast Diffusion Aware Network Inference with Preserving Topological
Structure Property [2.8948274245812327]
We propose a novel method called DANI to infer the underlying network while preserving its structural properties.
DANI has higher accuracy and lower run time while maintaining structural properties, including modular structure, degree distribution, connected components, density, and clustering coefficients.
arXiv Detail & Related papers (2023-10-02T23:23:00Z) - Probabilistic Modeling: Proving the Lottery Ticket Hypothesis in Spiking
Neural Network [30.924449325020767]
The Lottery Ticket Hypothesis (LTH) states that a randomly-initialized large neural network contains a small sub-network that, when trained in isolation, can match the performance of the full network.
The LTH opens up a new path for network pruning.
arXiv Detail & Related papers (2023-05-20T09:27:34Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that these neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Why Neural Networks Work [0.32228025627337864]
We argue that many properties of fully-connected feedforward neural networks (FCNNs) are explainable from the analysis of a single pair of operations.
We show how expand-and-sparsify can explain the observed phenomena that have been discussed in the literature.
arXiv Detail & Related papers (2022-11-26T18:15:17Z) - Equivariant Transduction through Invariant Alignment [71.45263447328374]
We introduce a novel group-equivariant architecture that incorporates a group-invariant hard alignment mechanism.
We find that our network's structure allows it to develop stronger equivariant properties than existing group-equivariant approaches.
We additionally find that it outperforms previous group-equivariant networks empirically on the SCAN task.
arXiv Detail & Related papers (2022-09-22T11:19:45Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Learning from Randomly Initialized Neural Network Features [24.75062551820944]
We present the surprising result that randomly initialized neural networks are good feature extractors in expectation.
These random features correspond to finite-sample realizations of what we call Neural Network Prior Kernel (NNPK), which is inherently infinite-dimensional.
arXiv Detail & Related papers (2022-02-13T23:35:11Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by studying the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose widths are quadratic in the sample size and linear in their depth, at a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
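Complementing the subset-sum sketch above, the following hedged PyTorch sketch shows what a structured (channel-level) subnetwork of a randomly initialised CNN looks like: entire output channels are kept or removed by a binary mask, and no weight is ever trained. The class name `ChannelMaskedConv`, the 3-to-16 channel counts, and the particular kept channels are illustrative assumptions; this does not implement the paper's existence proof or any ticket-search procedure.

```python
import torch
import torch.nn as nn

class ChannelMaskedConv(nn.Module):
    """A randomly initialised convolution whose output channels can be pruned as whole units."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, padding=kernel_size // 2)
        # Binary mask over output channels: 1 keeps the channel, 0 prunes it entirely.
        self.register_buffer("mask", torch.ones(out_channels))

    def forward(self, x):
        # Zeroing whole channels yields a structured subnetwork of the random layer.
        return self.conv(x) * self.mask.view(1, -1, 1, 1)

torch.manual_seed(0)
layer = ChannelMaskedConv(in_channels=3, out_channels=16, kernel_size=3)

# Keep only three of the sixteen random output channels; no training takes place.
keep = torch.zeros(16)
keep[[2, 7, 11]] = 1.0
layer.mask.copy_(keep)

y = layer(torch.randn(1, 3, 32, 32))
print("non-zero output channels:", int((y.abs().sum(dim=(0, 2, 3)) > 0).sum()))
```

Because whole channels disappear, the pruned layer can be materialised with fewer filters and smaller activations, which is the source of the computational and memory gains that the abstract contrasts with unstructured, weight-level pruning.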