Depth Separation with Multilayer Mean-Field Networks
- URL: http://arxiv.org/abs/2304.01063v1
- Date: Mon, 3 Apr 2023 15:18:16 GMT
- Title: Depth Separation with Multilayer Mean-Field Networks
- Authors: Yunwei Ren, Mo Zhou, Rong Ge
- Abstract summary: arXiv:1904.06984 constructed a function that is easy to approximate using a 3-layer network but not approximable by any 2-layer network; we show that this separation is algorithmic, and the function can be learned efficiently by an overparameterized network with polynomially many neurons.
Our result relies on a new way of extending the mean-field limit to multilayer networks.
- Score: 14.01059700772468
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth separation -- why a deeper network is more powerful than a shallower
one -- has been a major problem in deep learning theory. Previous results often
focus on representation power. For example, arXiv:1904.06984 constructed a
function that is easy to approximate using a 3-layer network but not
approximable by any 2-layer network. In this paper, we show that this
separation is in fact algorithmic: one can learn the function constructed by
arXiv:1904.06984 using an overparameterized network with polynomially many
neurons efficiently. Our result relies on a new way of extending the mean-field
limit to multilayer networks, and a decomposition of loss that factors out the
error introduced by the discretization of infinite-width mean-field networks.
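As an illustrative sketch (assumed notation, not taken from the paper), the two-layer mean-field view replaces the finite sum over neurons by an integral over a parameter distribution $\rho$; a width-$m$ network is then a sampled discretization of that integral, and its loss splits into the loss of the infinite-width network plus a discretization term. The paper's contribution is to extend this picture beyond two layers and to control the analogous discretization error.

```latex
% Illustrative sketch (assumed notation, standard two-layer mean-field setup;
% the paper extends this picture to multilayer networks).
\begin{align*}
  f_{\rho}(x) &= \int \sigma(w^{\top} x)\,\mathrm{d}\rho(w)
    && \text{infinite-width (mean-field) network} \\
  f_{m}(x)    &= \frac{1}{m}\sum_{i=1}^{m} \sigma(w_i^{\top} x),
    \quad w_i \sim \rho
    && \text{width-$m$ discretization} \\
  L(f_{m})    &\le L(f_{\rho}) + \bigl|\,L(f_{m}) - L(f_{\rho})\,\bigr|
    && \text{mean-field loss $+$ discretization error}
\end{align*}
```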
Related papers
- Rethink Depth Separation with Intra-layer Links [23.867032824891723]
We study the depth separation theory in the context of shortcuts.
We show that a shallow network with intra-layer links does not need to go as wide as before to express some hard functions constructed by a deep network.
Our results supplement the existing depth separation theory by examining its limit in the shortcut domain.
arXiv Detail & Related papers (2023-05-11T11:54:36Z) - Rosenblatt's first theorem and frugality of deep learning [0.0]
Rosenblatt's theorem about the omnipotence of shallow networks states that elementary perceptrons can solve any classification problem if there are no discrepancies in the training set.
Minsky and Papert considered elementary perceptrons with restrictions on the neural inputs: a bounded number of connections or a relatively small diameter of the receptive field for each neuron in the hidden layer.
In this note, we demonstrate Rosenblatt's first theorem at work, show how an elementary perceptron can solve a version of the travel maze problem, and analyse the complexity of that solution.
arXiv Detail & Related papers (2022-08-29T09:44:27Z) - Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can make two classes of data linearly separable with high probability.
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Adversarial Examples in Multi-Layer Random ReLU Networks [39.797621513256026]
Adversarial examples arise in ReLU networks with independent Gaussian parameters.
Bottleneck layers in the network play a key role: the minimal width up to some point determines scales and sensitivities of mappings computed up to that point.
arXiv Detail & Related papers (2021-06-23T18:16:34Z) - Learning distinct features helps, provably [98.78384185493624]
We study the diversity of the features learned by a two-layer neural network trained with the least squares loss.
We measure the diversity by the average $L_2$-distance between the hidden-layer features.
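For concreteness, a standard way to write such a measure (an assumed formulation, not necessarily the paper's exact definition) is the average pairwise $L_2$-distance over the $m$ hidden-layer feature vectors $h_1,\dots,h_m$:

```latex
% Assumed formulation: feature diversity as the average pairwise L2-distance
% between the m hidden-layer feature vectors h_1, ..., h_m.
\mathrm{Diversity}(h_1,\dots,h_m)
  \;=\; \frac{2}{m(m-1)} \sum_{1 \le i < j \le m} \lVert h_i - h_j \rVert_2
```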
arXiv Detail & Related papers (2021-06-10T19:14:45Z) - The Connection Between Approximation, Depth Separation and Learnability in Neural Networks [70.55686685872008]
We study the connection between learnability and approximation capacity.
We show that learnability with deep networks of a target function depends on the ability of simpler classes to approximate the target.
arXiv Detail & Related papers (2021-01-31T11:32:30Z) - Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
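A minimal sketch of that observation (hypothetical code, not the paper's implementation): permuting the hidden units of one layer and applying the same permutation to the input side of the next layer leaves the computed function unchanged.

```python
import numpy as np

# Hypothetical sketch: permuting the hidden units of one layer and applying
# the same permutation to the next layer's input weights leaves the
# two-layer function unchanged.
rng = np.random.default_rng(0)
d, h, k = 8, 16, 4                     # input, hidden, output sizes
W1, b1 = rng.standard_normal((h, d)), rng.standard_normal(h)
W2, b2 = rng.standard_normal((k, h)), rng.standard_normal(k)

def forward(x, W1, b1, W2, b2):
    """Two-layer ReLU network: W2 @ relu(W1 @ x + b1) + b2."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

perm = rng.permutation(h)              # permute the hidden units
W1p, b1p = W1[perm], b1[perm]          # reorder rows of the first layer
W2p = W2[:, perm]                      # reorder columns of the second layer

x = rng.standard_normal(d)
assert np.allclose(forward(x, W1, b1, W2, b2),
                   forward(x, W1p, b1p, W2p, b2))
```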
arXiv Detail & Related papers (2020-10-29T15:47:26Z) - Learning Deep ReLU Networks Is Fixed-Parameter Tractable [21.625005195943707]
We consider the problem of learning an unknown ReLU network with respect to Gaussian inputs.
We give an algorithm whose running time is a fixed polynomial in the ambient dimension.
Our bounds depend on the number of hidden units, depth, spectral norm of the weight matrices, and the Lipschitz constant of the overall network.
arXiv Detail & Related papers (2020-09-28T17:58:43Z) - ESPN: Extremely Sparse Pruned Networks [50.436905934791035]
We show that a simple iterative mask discovery method can achieve state-of-the-art compression of very deep networks.
Our algorithm represents a hybrid approach between single shot network pruning methods and Lottery-Ticket type approaches.
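As a generic illustration of iterative mask discovery (a hedged magnitude-pruning sketch under assumed details, not ESPN's actual criterion or schedule), one repeatedly drops the smallest surviving weights, optionally fine-tuning between rounds, until a target sparsity is reached.

```python
import numpy as np

def iterative_magnitude_mask(weights, target_sparsity=0.95, rounds=5,
                             retrain=lambda w, m: w):
    """Generic iterative mask-discovery sketch (magnitude pruning; not
    ESPN's actual method). Each round removes the smallest surviving
    weights so the kept fraction shrinks geometrically, with an optional
    fine-tuning step between rounds."""
    mask = np.ones(weights.shape, dtype=bool)
    keep_final = 1.0 - target_sparsity
    for r in range(1, rounds + 1):
        keep = keep_final ** (r / rounds)           # fraction of weights to keep
        k = max(1, int(round(keep * weights.size)))
        scores = np.abs(weights * mask)             # magnitudes of surviving weights
        threshold = np.partition(scores.ravel(), -k)[-k]
        mask = scores >= threshold
        weights = retrain(weights, mask)            # placeholder fine-tune step
    return mask

# Example: prune a random weight matrix to ~95% sparsity.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))
m = iterative_magnitude_mask(W)
print(f"kept {m.mean():.1%} of weights")
```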
arXiv Detail & Related papers (2020-06-28T23:09:27Z) - Neural Networks with Small Weights and Depth-Separation Barriers [40.66211670342284]
For constant depths, existing results are limited to depths $2$ and $3$, and achieving results for higher depths has been an important question.
We focus on feedforward ReLU networks, and prove fundamental barriers to proving such results beyond depth $4$.
arXiv Detail & Related papers (2020-05-31T21:56:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.