Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural
Networks
- URL: http://arxiv.org/abs/2305.06986v2
- Date: Tue, 31 Oct 2023 14:58:09 GMT
- Title: Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural
Networks
- Authors: Eshaan Nichani, Alex Damian, Jason D. Lee
- Abstract summary: We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
- Score: 49.808194368781095
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the central questions in the theory of deep learning is to understand
how neural networks learn hierarchical features. The ability of deep networks
to extract salient features is crucial to both their outstanding generalization
ability and the modern deep learning paradigm of pretraining and finetuning.
However, this feature learning process remains poorly understood from a
theoretical perspective, with existing analyses largely restricted to two-layer
networks. In this work we show that three-layer neural networks have provably
richer feature learning capabilities than two-layer networks. We analyze the
features learned by a three-layer network trained with layer-wise gradient
descent, and present a general purpose theorem which upper bounds the sample
complexity and width needed to achieve low test error when the target has
specific hierarchical structure. We instantiate our framework in specific
statistical learning settings -- single-index models and functions of quadratic
features -- and show that in the latter setting three-layer networks obtain a
sample complexity improvement over all existing guarantees for two-layer
networks. Crucially, this sample complexity improvement relies on the ability
of three-layer networks to efficiently learn nonlinear features. We then
establish a concrete optimization-based depth separation by constructing a
function which is efficiently learnable via gradient descent on a three-layer
network, yet cannot be learned efficiently by a two-layer network. Our work
makes progress towards understanding the provable benefit of three-layer neural
networks over two-layer networks in the feature learning regime.
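The abstract's core setting (layer-wise gradient descent on a three-layer network fit to a target with hierarchical structure, e.g. a function of a quadratic feature) can be made concrete with a small sketch. The snippet below is a hypothetical illustration only, assuming PyTorch; the architecture, layer-wise schedule, link function g, and all hyperparameters are placeholders, not the paper's actual construction or proof setting.

```python
# Minimal, hypothetical sketch (not the authors' algorithm): a three-layer ReLU
# network is fit with layer-wise gradient descent to a target that is a
# function of a quadratic feature, f*(x) = g(x^T A x).
import torch
import torch.nn as nn

torch.manual_seed(0)
d, width, n = 20, 256, 4096

# Synthetic target: a nonlinear link g applied to the quadratic feature x^T A x.
A = torch.randn(d, d) / d

def g(z):                                   # illustrative link function
    return torch.relu(z) - 0.5 * z

X = torch.randn(n, d)
y = g(((X @ A) * X).sum(dim=1))             # y_i = g(x_i^T A x_i)

# Three-layer network: x -> layer1 -> ReLU -> layer2 -> ReLU -> linear head.
layer1 = nn.Linear(d, width)
layer2 = nn.Linear(width, width)
head = nn.Linear(width, 1)

def forward(x):
    return head(torch.relu(layer2(torch.relu(layer1(x))))).squeeze(-1)

loss_fn = nn.MSELoss()

# Layer-wise schedule: update one block of parameters at a time while the
# remaining layers stay frozen at their current values.
for block in (layer1, layer2, head):
    opt = torch.optim.SGD(block.parameters(), lr=1e-2)
    for _ in range(500):
        opt.zero_grad()
        loss = loss_fn(forward(X), y)
        loss.backward()
        opt.step()

print(f"train MSE after layer-wise training: {loss_fn(forward(X), y).item():.4f}")
```

In the paper's analysis, the sample complexity gain over two-layer networks comes from the network's ability to learn the nonlinear feature x^T A x; the sketch only mirrors the training structure, not the guarantees.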
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Understanding Deep Representation Learning via Layerwise Feature
Compression and Discrimination [33.273226655730326]
We show that each layer of a deep linear network progressively compresses within-class features at a geometric rate and discriminates between-class features at a linear rate.
This is the first quantitative characterization of feature evolution in hierarchical representations of deep linear networks.
arXiv Detail & Related papers (2023-11-06T09:00:38Z) - Deep Dependency Networks for Multi-Label Classification [24.24496964886951]
We show that the performance of previous approaches that combine Markov Random Fields with neural networks can be modestly improved.
We propose a new modeling framework called deep dependency networks, which augments a dependency network.
Despite its simplicity, jointly learning this new architecture yields significant improvements in performance.
arXiv Detail & Related papers (2023-02-01T17:52:40Z) - Neural Network Layer Algebra: A Framework to Measure Capacity and
Compression in Deep Learning [0.0]
We present a new framework to measure the intrinsic properties of (deep) neural networks.
While we focus on convolutional networks, our framework can be extended to any network architecture.
arXiv Detail & Related papers (2021-07-02T13:43:53Z) - On Learnability via Gradient Method for Two-Layer ReLU Neural Networks
in Teacher-Student Setting [41.60125423028092]
We study two-layer ReLU networks in a teacher-student regression model.
We show that with a specific regularization and sufficient over-parameterization, a student network can identify the parameters via gradient descent.
We also analyze the global minima, which satisfy a sparsity property in the measure space.
arXiv Detail & Related papers (2021-06-11T09:05:41Z) - Learning distinct features helps, provably [98.78384185493624]
We study the diversity of the features learned by a two-layer neural network trained with the least squares loss.
We measure the diversity by the average $L$-distance between the hidden-layer features.
arXiv Detail & Related papers (2021-06-10T19:14:45Z) - Firefly Neural Architecture Descent: a General Approach for Growing
Neural Networks [50.684661759340145]
Firefly neural architecture descent is a general framework for progressively and dynamically growing neural networks.
We show that firefly descent can flexibly grow networks both wider and deeper, and can be applied to learn accurate but resource-efficient neural architectures.
In particular, it learns networks that are smaller in size but have higher average accuracy than those learned by the state-of-the-art methods.
arXiv Detail & Related papers (2021-02-17T04:47:18Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective to represent a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Neural networks adapting to datasets: learning network size and topology [77.34726150561087]
We introduce a flexible setup allowing for a neural network to learn both its size and topology during the course of a gradient-based training.
The resulting network has the structure of a graph tailored to the particular learning task and dataset.
arXiv Detail & Related papers (2020-06-22T12:46:44Z) - A Rigorous Framework for the Mean Field Limit of Multilayer Neural
Networks [9.89901717499058]
We develop a mathematically rigorous framework for multilayer neural networks in the mean field regime.
As the network's widths increase, the network's learning trajectory is shown to be well captured by a mean field limit.
We prove several properties of large-width multilayer networks.
arXiv Detail & Related papers (2020-01-30T16:43:34Z)