Most Activation Functions Can Win the Lottery Without Excessive Depth
- URL: http://arxiv.org/abs/2205.02321v1
- Date: Wed, 4 May 2022 20:51:30 GMT
- Title: Most Activation Functions Can Win the Lottery Without Excessive Depth
- Authors: Rebekka Burkholz
- Abstract summary: The strong lottery ticket hypothesis has highlighted the potential for training deep neural networks by pruning.
For networks with ReLU activation functions, it has been proven that a target network with depth $L$ can be approximated by the subnetwork of a randomly initialized neural network that has double the target's depth $2L$ and is wider by a logarithmic factor.
- Score: 6.68999512375737
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The strong lottery ticket hypothesis has highlighted the potential for
training deep neural networks by pruning, which has inspired interesting
practical and theoretical insights into how neural networks can represent
functions. For networks with ReLU activation functions, it has been proven that
a target network with depth $L$ can be approximated by the subnetwork of a
randomly initialized neural network that has double the target's depth $2L$ and
is wider by a logarithmic factor. We show that a depth $L+1$ network is
sufficient. This result indicates that we can expect to find lottery tickets at
realistic, commonly used depths while only requiring logarithmic
overparametrization. Our novel construction approach applies to a large class
of activation functions and is not limited to ReLUs.
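The intuition behind such pruning-based approximation results can be made concrete with a toy experiment in the spirit of the SubsetSum-based analysis cited among the related papers below: a single target weight can usually be matched closely by keeping a small subset of a pool of random candidate weights and pruning the rest. The sketch below is only an illustration of that idea, not the paper's depth-$L+1$ construction; all names are hypothetical.

```python
import itertools

import numpy as np


def best_subset_sum(candidates, target):
    """Brute-force the subset of candidate weights whose sum is closest to the target weight."""
    best_err, best_mask = float("inf"), ()
    for r in range(len(candidates) + 1):
        for mask in itertools.combinations(range(len(candidates)), r):
            err = abs(sum(candidates[i] for i in mask) - target)
            if err < best_err:
                best_err, best_mask = err, mask
    return best_mask, best_err


rng = np.random.default_rng(0)
candidates = rng.uniform(-1.0, 1.0, size=16)  # overparametrized pool of random weights
target_weight = 0.4375                        # one weight of the target network
mask, err = best_subset_sum(candidates, target_weight)
print(f"kept {len(mask)} of {len(candidates)} random weights, error {err:.2e}")
```

In the theoretical results no brute force is needed and the random pool only has to be wider by a logarithmic factor, which is the logarithmic overparametrization mentioned in the abstract.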
Related papers
- Expressivity and Approximation Properties of Deep Neural Networks with ReLU$^k$ Activation [2.3020018305241337]
We investigate the expressivity and approximation properties of deep networks employing the ReLU$^k$ activation function for $k \geq 2$.
Although deep ReLU$^k$ networks can approximate functions effectively, they also have the capability to represent higher-degree polynomials precisely.
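As a concrete illustration of such exact representation (a worked example added here, not taken from the paper): the ReLU$^k$ activation raises the ReLU output to the $k$-th power, and already two such units reproduce a monomial of degree $k$ exactly on all of $\mathbb{R}$, e.g. for $k = 2, 3$:

```latex
\mathrm{ReLU}^k(x) = (\max\{0, x\})^k, \qquad
x^2 = \mathrm{ReLU}^2(x) + \mathrm{ReLU}^2(-x), \qquad
x^3 = \mathrm{ReLU}^3(x) - \mathrm{ReLU}^3(-x).
```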
arXiv Detail & Related papers (2023-12-27T09:11:14Z)
- Optimizing Performance of Feedforward and Convolutional Neural Networks through Dynamic Activation Functions [0.46040036610482665]
Deep learning training algorithms have seen huge success in recent years in many fields, including speech, text, image, and video.
Deeper and deeper architectures have been proposed with great success, with ResNet structures having around 152 layers.
Shallow convolutional neural networks (CNNs) are still an active research area, where some phenomena remain unexplained.
arXiv Detail & Related papers (2023-08-10T17:39:51Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
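For reference (my addition, assuming the standard definition), a threshold activation is the unit-step nonlinearity

```latex
\sigma(x) \;=\; \mathbb{1}\{x \ge 0\} \;=\;
\begin{cases}
  1, & x \ge 0,\\
  0, & x < 0,
\end{cases}
```

whose derivative vanishes almost everywhere, which is why results of this kind rely on convex reformulations rather than plain gradient descent.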
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL).
We focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed with Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it into a generalization problem under the non-iid setting.
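The exploration rule studied there is the standard $\epsilon$-greedy policy; a minimal sketch (illustrative only, names are hypothetical):

```python
import numpy as np


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, otherwise a greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
q = np.array([0.1, 0.5, 0.2])  # e.g., Q-values predicted by a network for one state
actions = [epsilon_greedy(q, epsilon=0.1, rng=rng) for _ in range(1_000)]
print("fraction of greedy choices:", np.mean([a == 1 for a in actions]))
```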
arXiv Detail & Related papers (2022-09-15T15:42:47Z)
- Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
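To make "an $\ell_\infty$-norm box over-approximation of the reachable set" concrete, here is a generic interval-arithmetic step for an explicit linear+ReLU layer. This only illustrates what a box over-approximation is; it is not the paper's embedded-network construction for implicit networks, and all names are my own.

```python
import numpy as np


def linear_relu_box(W, b, lo, hi):
    """Propagate an elementwise input box [lo, hi] through y = relu(W @ x + b).

    Returns elementwise bounds containing every reachable output, i.e. an
    l_infinity box over-approximation of the layer's reachable set.
    """
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    pre_lo = W_pos @ lo + W_neg @ hi + b  # smallest possible pre-activation
    pre_hi = W_pos @ hi + W_neg @ lo + b  # largest possible pre-activation
    return np.maximum(pre_lo, 0.0), np.maximum(pre_hi, 0.0)


rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 2)), rng.normal(size=3)
lo, hi = np.array([-0.1, -0.1]), np.array([0.1, 0.1])  # input perturbation box
print(linear_relu_box(W, b, lo, hi))
```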
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
- Convolutional and Residual Networks Provably Contain Lottery Tickets [6.68999512375737]
The Lottery Ticket Hypothesis continues to have a profound practical impact on the quest for deep neural networks that solve modern deep learning tasks at competitive performance.
We prove that modern architectures consisting of convolutional and residual layers, which can be equipped with almost arbitrary activation functions, also contain lottery tickets with high probability.
arXiv Detail & Related papers (2022-05-04T22:20:01Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by analyzing the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- The Connection Between Approximation, Depth Separation and Learnability in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z)
- It's Hard for Neural Networks To Learn the Game of Life [4.061135251278187]
Recent findings suggest that neural networks rely on lucky random initial weights of "lottery tickets" that converge quickly to a solution.
We examine small convolutional networks that are trained to predict n steps of the two-dimensional cellular automaton Conway's Game of Life.
We find that networks of this architecture trained on this task rarely converge.
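For context, the target function the networks must learn is the Game of Life update rule itself; a standard reference implementation of a single step (my own sketch, not from the paper):

```python
import numpy as np


def life_step(grid):
    """One synchronous Game of Life update on a toroidal grid (1 = alive, 0 = dead)."""
    neighbors = sum(
        np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
        for dy in (-1, 0, 1) for dx in (-1, 0, 1)
        if (dy, dx) != (0, 0)
    )
    # A cell is alive next step iff it has 3 live neighbors, or it is alive and has 2.
    return ((neighbors == 3) | ((grid == 1) & (neighbors == 2))).astype(grid.dtype)


glider = np.zeros((8, 8), dtype=int)
glider[1, 2] = glider[2, 3] = glider[3, 1] = glider[3, 2] = glider[3, 3] = 1
print(life_step(glider))
```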
arXiv Detail & Related papers (2020-09-03T00:47:08Z)
- Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks.
We show that neural representation can achieve improved sample complexities compared with the raw input.
Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
- Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient [9.309655246559094]
We show that any target network of width $d$ and depth $l$ can be approximated by pruning a random network that is a factor $O(\log(dl))$ wider and twice as deep.
Our analysis relies on connecting pruning random ReLU networks to random instances of the \textsc{SubsetSum} problem.
arXiv Detail & Related papers (2020-06-14T19:32:10Z)