Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural
Networks
- URL: http://arxiv.org/abs/2305.06986v2
- Date: Tue, 31 Oct 2023 14:58:09 GMT
- Title: Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural
Networks
- Authors: Eshaan Nichani, Alex Damian, Jason D. Lee
- Abstract summary: We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
- Score: 49.808194368781095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the central questions in the theory of deep learning is to understand
how neural networks learn hierarchical features. The ability of deep networks
to extract salient features is crucial to both their outstanding generalization
ability and the modern deep learning paradigm of pretraining and finetuning.
However, this feature learning process remains poorly understood from a
theoretical perspective, with existing analyses largely restricted to two-layer
networks. In this work we show that three-layer neural networks have provably
richer feature learning capabilities than two-layer networks. We analyze the
features learned by a three-layer network trained with layer-wise gradient
descent, and present a general purpose theorem which upper bounds the sample
complexity and width needed to achieve low test error when the target has
specific hierarchical structure. We instantiate our framework in specific
statistical learning settings -- single-index models and functions of quadratic
features -- and show that in the latter setting three-layer networks obtain a
sample complexity improvement over all existing guarantees for two-layer
networks. Crucially, this sample complexity improvement relies on the ability
of three-layer networks to efficiently learn nonlinear features. We then
establish a concrete optimization-based depth separation by constructing a
function which is efficiently learnable via gradient descent on a three-layer
network, yet cannot be learned efficiently by a two-layer network. Our work
makes progress towards understanding the provable benefit of three-layer neural
networks over two-layer networks in the feature learning regime.
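The abstract's core setting (layer-wise gradient descent on a three-layer network fit to a target with hierarchical structure, e.g. a function of a quadratic feature) can be made concrete with a small sketch. The snippet below is a hypothetical illustration only, assuming PyTorch; the architecture, layer-wise schedule, link function g, and all hyperparameters are placeholders, not the paper's actual construction or proof setting.

```python
# Minimal, hypothetical sketch (not the authors' algorithm): a three-layer ReLU
# network is fit with layer-wise gradient descent to a target that is a
# function of a quadratic feature, f*(x) = g(x^T A x).
import torch
import torch.nn as nn

torch.manual_seed(0)
d, width, n = 20, 256, 4096

# Synthetic target: a nonlinear link g applied to the quadratic feature x^T A x.
A = torch.randn(d, d) / d

def g(z):                                   # illustrative link function
    return torch.relu(z) - 0.5 * z

X = torch.randn(n, d)
y = g(((X @ A) * X).sum(dim=1))             # y_i = g(x_i^T A x_i)

# Three-layer network: x -> layer1 -> ReLU -> layer2 -> ReLU -> linear head.
layer1 = nn.Linear(d, width)
layer2 = nn.Linear(width, width)
head = nn.Linear(width, 1)

def forward(x):
    return head(torch.relu(layer2(torch.relu(layer1(x))))).squeeze(-1)

loss_fn = nn.MSELoss()

# Layer-wise schedule: update one block of parameters at a time while the
# remaining layers stay frozen at their current values.
for block in (layer1, layer2, head):
    opt = torch.optim.SGD(block.parameters(), lr=1e-2)
    for _ in range(500):
        opt.zero_grad()
        loss = loss_fn(forward(X), y)
        loss.backward()
        opt.step()

print(f"train MSE after layer-wise training: {loss_fn(forward(X), y).item():.4f}")
```

In the paper's analysis, the sample complexity gain over two-layer networks comes from the network's ability to learn the nonlinear feature x^T A x; the sketch only mirrors the training structure, not the guarantees.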
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Understanding Deep Representation Learning via Layerwise Feature
Compression and Discrimination [33.273226655730326]
We show that each layer of a deep linear network progressively compresses within-class features at a geometric rate and discriminates between-class features at a linear rate.
This is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.
arXiv Detail & Related papers (2023-11-06T09:00:38Z) - Deep Dependency Networks for Multi-Label Classification [24.24496964886951]
We show that the performance of previous approaches that combine Markov Random Fields with neural networks can be modestly improved.
We propose a new modeling framework called deep dependency networks, which augments a dependency network.
Despite its simplicity, jointly learning this new architecture yields significant improvements in performance.
arXiv Detail & Related papers (2023-02-01T17:52:40Z) - Neural Network Layer Algebra: A Framework to Measure Capacity and
Compression in Deep Learning [0.0]
We present a new framework to measure the intrinsic properties of (deep) neural networks.
While we focus on convolutional networks, our framework can be extended to any network architecture.
arXiv Detail & Related papers (2021-07-02T13:43:53Z) - On Learnability via Gradient Method for Two-Layer ReLU Neural Networks
in Teacher-Student Setting [41.60125423028092]
We study two-layer ReLU networks in a teacher-student regression model.
We show that with a specific regularization and sufficient over-parameterization, a student network can identify the parameters via gradient descent.
We also analyze the global minima, which satisfy a sparsity property in the measure space.
arXiv Detail & Related papers (2021-06-11T09:05:41Z) - Learning distinct features helps, provably [98.78384185493624]
We study the diversity of the features learned by a two-layer neural network trained with the least squares loss.
We measure the diversity by the average $L$-distance between the hidden-layer features.
arXiv Detail & Related papers (2021-06-10T19:14:45Z) - Firefly Neural Architecture Descent: a General Approach for Growing
Neural Networks [50.684661759340145]
Firefly neural architecture descent is a general framework for progressively and dynamically growing neural networks.
We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures.
In particular, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
arXiv Detail & Related papers (2021-02-17T04:47:18Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Neural networks adapting to datasets: learning network size and topology [77.34726150561087]
We introduce a flexible setup allowing for a neural network to learn both its size and topology during the course of a gradient-based training.
The resulting network has the structure of a graph tailored to the particular learning task and dataset.
arXiv Detail & Related papers (2020-06-22T12:46:44Z) - A Rigorous Framework for the Mean Field Limit of Multilayer Neural
Networks [9.89901717499058]
We develop a mathematically rigorous framework for multilayer neural networks in the mean field regime.
As the network's widths increase, the network's learning trajectory is shown to be well captured by a mean field limit.
We prove several properties of large-width multilayer networks.
arXiv Detail & Related papers (2020-01-30T16:43:34Z)