Polynomially Over-Parameterized Convolutional Neural Networks Contain
Structured Strong Winning Lottery Tickets
- URL: http://arxiv.org/abs/2311.09858v1
- Date: Thu, 16 Nov 2023 12:38:45 GMT
- Title: Polynomially Over-Parameterized Convolutional Neural Networks Contain
Structured Strong Winning Lottery Tickets
- Authors: Arthur da Cunha, Francesco d'Amore, Emanuele Natale
- Abstract summary: We prove, for a wide class of random Convolutional Neural Networks, the existence of structured subnetworks that can approximate any sufficiently smaller network.
This result provides the first sub-exponential bound around the Strong Lottery Ticket Hypothesis for structured pruning.
- Score: 4.020829863982153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Strong Lottery Ticket Hypothesis (SLTH) states that randomly-initialised
neural networks likely contain subnetworks that perform well without any
training. Although unstructured pruning has been extensively studied in this
context, its structured counterpart, which can deliver significant
computational and memory efficiency gains, has been largely unexplored. One of
the main reasons for this gap is the limitations of the underlying mathematical
tools used in formal analyses of the SLTH. In this paper, we overcome these
limitations: we leverage recent advances in the multidimensional generalisation
of the Random Subset-Sum Problem and obtain a variant that admits the
stochastic dependencies that arise when addressing structured pruning in the
SLTH. We apply this result to prove, for a wide class of random Convolutional
Neural Networks, the existence of structured subnetworks that can approximate
any sufficiently smaller network.
This result provides the first sub-exponential bound around the SLTH for
structured pruning, opening up new avenues for further research on the
hypothesis and contributing to the understanding of the role of
over-parameterization in deep learning.
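For intuition only (this is not the paper's construction), the mechanism behind SLTH-style results can be illustrated in one dimension via the Random Subset-Sum Problem: given enough i.i.d. random weights, some subset of them sums to within a small error of any target in [-1, 1]. The following minimal sketch, written under that assumption, checks this numerically with numpy; the brute-force subset search and all names are illustrative choices, not the paper's algorithm.

```python
import itertools

import numpy as np

def best_subset_error(samples, target):
    """Smallest |sum(subset) - target| over all subsets of `samples` (brute force)."""
    best = abs(target)  # error of the empty subset
    for r in range(1, len(samples) + 1):
        for subset in itertools.combinations(samples, r):
            best = min(best, abs(sum(subset) - target))
    return best

rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=16)  # i.i.d. weights of an over-parameterised random layer
targets = rng.uniform(-1.0, 1.0, size=5)   # weights of a smaller target layer to approximate

for t in targets:
    print(f"target {t:+.4f} approximated within {best_subset_error(samples, t):.2e}")
```

In line with the Random Subset-Sum theory, the number of random samples needed to reach accuracy epsilon grows only logarithmically in 1/epsilon. The paper's contribution is a multidimensional generalisation of this result that admits the stochastic dependencies arising when entire structures (e.g. convolutional channels) must be kept or pruned as a unit; a hedged sketch of what such a structured, channel-level subnetwork looks like is given after the related-papers list below.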
Related papers
- An Infinite-Width Analysis on the Jacobian-Regularised Training of a Neural Network [10.384951432591492]
Recent theoretical analysis of deep neural networks in their infinite-width limits has deepened our understanding of initialisation, feature learning, and training of those networks.
We show that this infinite-width analysis can be extended to the Jacobian of a deep neural network.
We experimentally show the relevance of our theoretical claims to wide finite networks, and empirically analyse the properties of the kernel regression solution to gain insight into Jacobian regularisation.
arXiv Detail & Related papers (2023-12-06T09:52:18Z) - DANI: Fast Diffusion Aware Network Inference with Preserving Topological
Structure Property [2.8948274245812327]
We propose a novel method called DANI to infer the underlying network while preserving its structural properties.
DANI has higher accuracy and lower run time while maintaining structural properties, including modular structure, degree distribution, connected components, density, and clustering coefficients.
arXiv Detail & Related papers (2023-10-02T23:23:00Z) - Probabilistic Modeling: Proving the Lottery Ticket Hypothesis in Spiking
Neural Network [30.924449325020767]
The Lottery Ticket Hypothesis (LTH) states that a randomly-initialized large neural network contains a small sub-network that, when trained in isolation, can match the performance of the full network.
The LTH opens up a new path for network pruning.
arXiv Detail & Related papers (2023-05-20T09:27:34Z) - Gradient Descent in Neural Networks as Sequential Learning in RKBS [63.011641517977644]
We construct an exact power-series representation of the neural network in a finite neighborhood of the initial weights.
We prove that, regardless of width, the training sequence produced by gradient descent can be exactly replicated by regularized sequential learning.
arXiv Detail & Related papers (2023-02-01T03:18:07Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that these neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Why Neural Networks Work [0.32228025627337864]
We argue that many properties of fully-connected feedforward neural networks (FCNNs) are explainable from the analysis of a single pair of operations.
We show how expand-and-sparsify can explain the observed phenomena that have been discussed in the literature.
arXiv Detail & Related papers (2022-11-26T18:15:17Z) - Equivariant Transduction through Invariant Alignment [71.45263447328374]
We introduce a novel group-equivariant architecture that incorporates a group-invariant hard alignment mechanism.
We find that our network's structure allows it to develop stronger equivariant properties than existing group-equivariant approaches.
We additionally find that it outperforms previous group-equivariant networks empirically on the SCAN task.
arXiv Detail & Related papers (2022-09-22T11:19:45Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Learning from Randomly Initialized Neural Network Features [24.75062551820944]
We present the surprising result that randomly initialized neural networks are good feature extractors in expectation.
These random features correspond to finite-sample realizations of what we call Neural Network Prior Kernel (NNPK), which is inherently infinite-dimensional.
arXiv Detail & Related papers (2022-02-13T23:35:11Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by studying the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose widths are quadratic in the sample size and linear in their depth, at a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
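Complementing the subset-sum sketch above, the following hedged PyTorch sketch shows what a structured (channel-level) subnetwork of a randomly initialised CNN looks like: entire output channels are kept or removed by a binary mask, and no weight is ever trained. The class name `ChannelMaskedConv`, the 3-to-16 channel counts, and the particular kept channels are illustrative assumptions; this does not implement the paper's existence proof or any ticket-search procedure.

```python
import torch
import torch.nn as nn

class ChannelMaskedConv(nn.Module):
    """A randomly initialised convolution whose output channels can be pruned as whole units."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, padding=kernel_size // 2)
        # Binary mask over output channels: 1 keeps the channel, 0 prunes it entirely.
        self.register_buffer("mask", torch.ones(out_channels))

    def forward(self, x):
        # Zeroing whole channels yields a structured subnetwork of the random layer.
        return self.conv(x) * self.mask.view(1, -1, 1, 1)

torch.manual_seed(0)
layer = ChannelMaskedConv(in_channels=3, out_channels=16, kernel_size=3)

# Keep only three of the sixteen random output channels; no training takes place.
keep = torch.zeros(16)
keep[[2, 7, 11]] = 1.0
layer.mask.copy_(keep)

y = layer(torch.randn(1, 3, 32, 32))
print("non-zero output channels:", int((y.abs().sum(dim=(0, 2, 3)) > 0).sum()))
```

Because whole channels disappear, the pruned layer can be materialised with fewer filters and smaller activations, which is the source of the computational and memory gains that the abstract contrasts with unstructured, weight-level pruning.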