Convolutional and Residual Networks Provably Contain Lottery Tickets
- URL: http://arxiv.org/abs/2205.02343v1
- Date: Wed, 4 May 2022 22:20:01 GMT
- Title: Convolutional and Residual Networks Provably Contain Lottery Tickets
- Authors: Rebekka Burkholz
- Abstract summary: The Lottery Ticket Hypothesis continues to have a profound practical impact on the quest for deep neural networks that solve modern deep learning tasks at competitive performance.
We prove that modern architectures consisting of convolutional and residual layers, which can be equipped with almost arbitrary activation functions, also contain lottery tickets with high probability.
- Score: 6.68999512375737
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The Lottery Ticket Hypothesis continues to have a profound practical impact
on the quest for small scale deep neural networks that solve modern deep
learning tasks at competitive performance. These lottery tickets are identified
by pruning large randomly initialized neural networks with architectures that
are as diverse as their applications. Yet, theoretical insights that attest to
their existence have mostly focused on deep fully-connected feed-forward
networks with ReLU activation functions. We prove that modern architectures
consisting of convolutional and residual layers, which can be equipped with
almost arbitrary activation functions, also contain lottery tickets with high
probability.
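As an illustration of the pruning setup that underlies such existence results, the following minimal sketch (not taken from the paper; the filter shapes, the 90% sparsity level, and the helper name are illustrative assumptions) builds a binary mask over a randomly initialized convolutional weight tensor by magnitude pruning, which is how lottery tickets are typically extracted in practice.

```python
# Minimal sketch, not the paper's construction: extract a sparse "ticket"
# from a randomly initialized convolutional filter bank by magnitude pruning.
# Shapes, the 90% sparsity level, and the helper name are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def magnitude_mask(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Binary mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    keep = int(round((1.0 - sparsity) * weights.size))
    if keep == 0:
        return np.zeros_like(weights)
    # k-th largest absolute value acts as the pruning threshold
    # (ties at the threshold may keep a few extra weights).
    threshold = np.sort(np.abs(weights), axis=None)[-keep]
    return (np.abs(weights) >= threshold).astype(weights.dtype)

# Randomly initialized conv filters: (out_channels, in_channels, kernel_h, kernel_w),
# with He-style scaling for a ReLU-like nonlinearity.
fan_in = 3 * 3 * 3
conv_weights = rng.standard_normal((16, 3, 3, 3)) * np.sqrt(2.0 / fan_in)

mask = magnitude_mask(conv_weights, sparsity=0.9)
ticket = conv_weights * mask  # the pruned subnetwork's weights (about 90% zeros)
print(f"kept {int(mask.sum())} of {mask.size} weights")
```

In real lottery-ticket experiments the mask is chosen with respect to a trained or scored copy of the network rather than raw magnitudes at initialization, but the masking mechanics are the same.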
Related papers
- Pursing the Sparse Limitation of Spiking Deep Learning Structures [42.334835610250714]
Spiking Neural Networks (SNNs) are garnering increased attention for their superior computation and energy efficiency.
We introduce an innovative algorithm capable of simultaneously identifying both weight and patch-level winning tickets.
We demonstrate that our spiking lottery ticket achieves comparable or superior performance even when the model structure is extremely sparse.
arXiv Detail & Related papers (2023-11-18T17:00:40Z) - Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural
Networks [49.808194368781095]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z) - Generalization and Estimation Error Bounds for Model-based Neural
Networks [78.88759757988761]
We show that the generalization abilities of model-based networks for sparse recovery outperform those of regular ReLU networks.
We derive practical design rules that allow one to construct model-based networks with guaranteed high generalization.
arXiv Detail & Related papers (2023-04-19T16:39:44Z) - The Multiple Subnetwork Hypothesis: Enabling Multidomain Learning by
Isolating Task-Specific Subnetworks in Feedforward Neural Networks [0.0]
We identify a methodology and network representational structure which allows a pruned network to employ previously unused weights to learn subsequent tasks.
We show that networks trained using our approaches are able to learn multiple tasks, which may be related or unrelated, in parallel or in sequence without sacrificing performance on any task or exhibiting catastrophic forgetting.
arXiv Detail & Related papers (2022-07-18T15:07:13Z) - Most Activation Functions Can Win the Lottery Without Excessive Depth [6.68999512375737]
The lottery ticket hypothesis has highlighted the potential of training deep neural networks by pruning.
For networks with ReLU activation functions, it has been proven that a target network of depth $L$ can be approximated by a subnetwork of a randomly initialized network that has twice the target's depth ($2L$) and is wider by a logarithmic factor; a schematic statement of this type of guarantee is sketched after this list.
arXiv Detail & Related papers (2022-05-04T20:51:30Z) - Quasi-orthogonality and intrinsic dimensions as measures of learning and
generalisation [55.80128181112308]
We show that dimensionality and quasi-orthogonality of neural networks' feature space may jointly serve as a network's performance discriminants.
Our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces.
arXiv Detail & Related papers (2022-03-30T21:47:32Z) - On the Existence of Universal Lottery Tickets [2.5234156040689237]
The lottery ticket hypothesis conjectures the existence of sparse subnetworks of large randomly initialized deep neural networks that can be successfully trained in isolation.
Recent work has experimentally observed that some of these tickets can be practically reused across a variety of tasks, hinting at some form of universality.
We formalize this concept and theoretically prove that not only do such universal tickets exist but they also do not require further training.
arXiv Detail & Related papers (2021-11-22T12:12:00Z) - Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity
on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z) - Universality of Deep Neural Network Lottery Tickets: A Renormalization
Group Perspective [89.19516919095904]
Winning tickets found in the context of one task can be transferred to similar tasks, possibly even across different architectures.
We make use of renormalization group theory, one of the most successful tools in theoretical physics.
We leverage it here to examine winning ticket universality in large-scale lottery ticket experiments, and to shed new light on the success that iterative magnitude pruning has found in the field of sparse machine learning.
arXiv Detail & Related papers (2021-10-07T06:50:16Z) - Recursive Multi-model Complementary Deep Fusion for Robust Salient Object
Detection via Parallel Sub Networks [62.26677215668959]
Fully convolutional networks have shown outstanding performance in the salient object detection (SOD) field.
This paper proposes a "wider" network architecture which consists of parallel sub-networks with totally different network architectures.
Experiments on several famous benchmarks clearly demonstrate the superior performance, good generalization, and powerful learning ability of the proposed wider framework.
arXiv Detail & Related papers (2020-08-07T10:39:11Z) - Memory capacity of neural networks with threshold and ReLU activations [2.5889737226898437]
Mildly overparametrized neural networks are often able to memorize the training data with $100\%$ accuracy.
We prove that this phenomenon holds for general multilayered perceptrons.
arXiv Detail & Related papers (2020-01-20T01:54:21Z)
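For the result mentioned in the "Most Activation Functions Can Win the Lottery Without Excessive Depth" entry above, a schematic statement of this type of existence guarantee reads roughly as follows. This is a sketch only: the constants and the exact polylogarithmic width factor differ between papers and are not quoted from any of the listed works.

```latex
% Schematic only: constants and the precise width dependence are assumptions
% for illustration, not quoted from the listed papers.
\begin{theorem}[informal, schematic]
Let $f$ be a target ReLU network of depth $L$ and maximum width $n$, and let
$\varepsilon, \delta \in (0,1)$. Consider a randomly initialized network $g$ of
depth $2L$ whose layers are wider than those of $f$ by a factor of order
$\log\!\big(nL/\min\{\varepsilon,\delta\}\big)$. Then, with probability at least
$1-\delta$ over the initialization, $g$ contains a subnetwork $\hat f$, obtained
by pruning alone and without any training, such that
\[
  \sup_{\|x\| \le 1} \big\| f(x) - \hat f(x) \big\| \le \varepsilon .
\]
\end{theorem}
```

The main paper above extends guarantees of this flavor from fully-connected ReLU networks to convolutional and residual architectures with almost arbitrary activation functions.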