The Connection Between Approximation, Depth Separation and Learnability in Neural Networks
- URL: http://arxiv.org/abs/2102.00434v1
- Date: Sun, 31 Jan 2021 11:32:30 GMT
- Title: The Connection Between Approximation, Depth Separation and Learnability in Neural Networks
- Authors: Eran Malach, Gilad Yehudai, Shai Shalev-Shwartz, Ohad Shamir
- Abstract summary: We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
- Score: 70.55686685872008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Several recent works have shown separation results between deep
neural networks and hypothesis classes with inferior approximation capacity,
such as shallow networks or kernel classes. On the other hand, the fact that deep
networks can efficiently express a target function does not mean this target
function can be learned efficiently by deep neural networks. In this work we
study the intricate connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on
the ability of simpler classes to approximate the target. Specifically, we show
that a necessary condition for a function to be learnable by gradient descent
on deep neural networks is that the function can be approximated, at least in
a weak sense, by shallow neural networks. We also show that a class of
functions can be learned by an efficient statistical query algorithm if and
only if it can be approximated in a weak sense by some kernel class. We give
several examples of functions which demonstrate depth separation, and conclude
that they cannot be efficiently learned, even by a hypothesis class that can
efficiently approximate them.
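The necessary condition above suggests a simple empirical probe: if no shallow network achieves non-trivial correlation with a target function, the result predicts that gradient descent on deep networks will not learn it efficiently either. Below is a minimal sketch of such a probe, not a construction from the paper; the Gaussian input distribution, the oscillatory radial target, the width, and the frozen random-feature ReLU layer are all illustrative assumptions. Fitting only the output layer of a one-hidden-layer network is also a standard stand-in for approximation by a (random-feature) kernel class, which connects to the statistical query part of the result.

```python
# Hypothetical probe of "weak approximation" by a shallow network: fit only the
# output layer of a random-feature ReLU network and report the correlation it
# achieves with the target. All constants below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, n, width = 20, 5000, 512

X = rng.standard_normal((n, d))                       # inputs ~ N(0, I_d)
y = np.sign(np.sin(4.0 * np.linalg.norm(X, axis=1)))  # oscillatory radial target

W = rng.standard_normal((d, width)) / np.sqrt(d)      # frozen random first layer
H = np.maximum(X @ W, 0.0)                            # hidden ReLU features
a, *_ = np.linalg.lstsq(H, y, rcond=None)             # least-squares output layer
g = H @ a                                             # shallow net's prediction

corr = (g @ y) / (np.linalg.norm(g) * np.linalg.norm(y))
print(f"correlation of shallow random-feature net with target: {corr:.3f}")
```

A correlation that stays near zero for every tractable width is the empirical signature of the kind of hardness the paper formalizes.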
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z)
- Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint [48.25573695787407]
We prove that large ConvResNets can not only approximate a target function in terms of function value, but also exhibit sufficient first-order smoothness.
Our theory partially justifies the benefits of using deep and wide networks in practice.
arXiv Detail & Related papers (2022-06-09T15:35:22Z)
- Optimal Learning Rates of Deep Convolutional Neural Networks: Additive Ridge Functions [19.762318115851617]
We consider the mean squared error analysis for deep convolutional neural networks.
We show that, for additive ridge functions, convolutional neural networks followed by one fully connected layer with ReLU activation functions can reach optimal minimax rates (see the first sketch after this list for an illustration of this function class).
arXiv Detail & Related papers (2022-02-24T14:22:32Z)
- Interplay between depth of neural networks and locality of target functions [5.33024001730262]
We report a remarkable interplay between depth and locality of a target function.
We find that depth is beneficial for learning local functions but detrimental to learning global functions.
arXiv Detail & Related papers (2022-01-28T12:41:24Z)
- Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth-2 neural network with two layers of sigmoidal activations (a toy version of this setup is sketched after this list).
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z)
- Neural networks with linear threshold activations: structure and algorithms [1.795561427808824]
We show that two hidden layers are necessary and sufficient to represent any function representable by networks in this class.
We also give precise bounds on the sizes of the neural networks required to represent any function in the class.
We propose a new class of neural networks that we call shortcut linear threshold networks.
arXiv Detail & Related papers (2021-11-15T22:33:52Z)
- Firefly Neural Architecture Descent: a General Approach for Growing Neural Networks [50.684661759340145]
Firefly neural architecture descent is a general framework for progressively and dynamically growing neural networks.
We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures.
In particular, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
arXiv Detail & Related papers (2021-02-17T04:47:18Z)
- Theoretical Analysis of the Advantage of Deepening Neural Networks [0.0]
It is important to know the expressivity of deep neural networks, i.e., which functions they can compute.
Using two criteria for expressivity, we show that increasing the number of layers is more effective than increasing the number of units per layer for improving the expressivity of deep neural networks.
arXiv Detail & Related papers (2020-09-24T04:10:50Z)
- Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z)
- Approximation smooth and sparse functions by deep neural networks without saturation [0.6396288020763143]
In this paper, we aim at constructing deep neural networks with three hidden layers to approximate smooth and sparse functions.
We prove that the constructed deep nets can reach the optimal approximation rate in approximating both smooth and sparse functions with controllable magnitude of free parameters.
arXiv Detail & Related papers (2020-01-13T09:28:50Z)
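To make the additive ridge entry above concrete: an additive ridge function is a sum of univariate profiles applied to one-dimensional projections of the input, f(x) = sum_j g_j(xi_j . x). The sketch below only illustrates the definition; the particular directions and profiles are arbitrary choices, not taken from that paper.

```python
# Illustrative additive ridge function f(x) = sum_j g_j(xi_j . x); the
# directions xi_j and univariate profiles g_j are hypothetical choices.
import numpy as np

rng = np.random.default_rng(1)
d, J = 10, 3
xi = rng.standard_normal((J, d))
xi /= np.linalg.norm(xi, axis=1, keepdims=True)   # unit ridge directions

profiles = [np.tanh, np.sin, lambda t: t ** 2]    # univariate profiles g_j

def additive_ridge(x):
    """Evaluate f on a batch x of shape (n, d)."""
    return sum(g(x @ xi_j) for g, xi_j in zip(profiles, xi))

x = rng.standard_normal((4, d))
print(additive_ridge(x))                          # one value per input row
```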
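For the optimization-based separations entry, a toy version of the learning problem fits in a few lines: full-batch gradient descent on a depth-2 network with two layers of sigmoidal activations, fit to the indicator of a Euclidean ball. This is a hedged sketch of the setup only; the dimension, radius, width, learning rate, and step count are illustrative assumptions, not the paper's regime.

```python
# Toy setup: learn the ball indicator 1[||x|| <= r] with a depth-2 network
# using two layers of sigmoidal activations. Hyperparameters are illustrative.
import torch

torch.manual_seed(0)
d, n, width = 5, 2000, 64
radius = d ** 0.5                         # roughly balances the two classes

X = torch.randn(n, d)
y = (X.norm(dim=1) <= radius).float()     # ball indicator labels

model = torch.nn.Sequential(
    torch.nn.Linear(d, width), torch.nn.Sigmoid(),   # first sigmoidal layer
    torch.nn.Linear(width, 1), torch.nn.Sigmoid(),   # second sigmoidal layer
)
opt = torch.optim.SGD(model.parameters(), lr=0.5)
loss_fn = torch.nn.BCELoss()

for _ in range(2000):                     # plain full-batch gradient descent
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(1), y)
    loss.backward()
    opt.step()

acc = ((model(X).squeeze(1) > 0.5).float() == y).float().mean()
print(f"train accuracy: {acc.item():.3f}")
```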