Optimization-Based Separations for Neural Networks
- URL: http://arxiv.org/abs/2112.02393v1
- Date: Sat, 4 Dec 2021 18:07:47 GMT
- Title: Optimization-Based Separations for Neural Networks
- Authors: Itay Safran, Jason D. Lee
- Abstract summary: We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
- Score: 57.875347246373956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth separation results propose a possible theoretical explanation for the
benefits of deep neural networks over shallower architectures, establishing
that the former possess superior approximation capabilities. However, there are
no known results in which the deeper architecture leverages this advantage into
a provable optimization guarantee. We prove that when the data are generated by
a distribution with radial symmetry which satisfies some mild assumptions,
gradient descent can efficiently learn ball indicator functions using a depth 2
neural network with two layers of sigmoidal activations, and where the hidden
layer is held fixed throughout training. Since it is known that ball indicators
are hard to approximate with respect to a certain heavy-tailed distribution
when using depth 2 networks with a single layer of non-linearities (Safran and
Shamir, 2017), this establishes what is, to the best of our knowledge, the first
optimization-based separation result where the approximation benefits of the
stronger architecture provably manifest in practice. Our proof technique relies
on a random features approach which reduces the problem to learning with a
single neuron, where new tools are required to show the convergence of gradient
descent when the distribution of the data is heavy-tailed.
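The sketch below is a rough, illustrative rendering of the training setup the abstract describes: a depth 2 network with two layers of sigmoidal activations, where the hidden layer is drawn at random and frozen (the random-features view) and only the output layer is trained by gradient descent to fit a ball indicator. The Gaussian input distribution, squared loss, width, radius, and step size are placeholder choices made here for illustration, not the paper's assumptions (the paper's separation concerns a heavy-tailed, radially symmetric distribution).

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 5, 256, 2000                  # input dimension, hidden width, sample size
r = np.sqrt(d)                          # ball radius (placeholder choice)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Radially symmetric inputs (standard Gaussian as a stand-in for the paper's
# heavy-tailed distribution) labeled by the ball indicator 1{||x|| <= r}.
X = rng.standard_normal((n, d))
y = (np.linalg.norm(X, axis=1) <= r).astype(float)

# First layer of sigmoids: random and held fixed throughout training.
W = rng.standard_normal((m, d)) / np.sqrt(d)
c = rng.standard_normal(m)
H = sigmoid(X @ W.T + c)

# Second sigmoidal layer: a single output unit trained by plain gradient
# descent on the squared loss (illustrative choice).
u, b, lr = np.zeros(m), 0.0, 0.5
for _ in range(1000):
    pred = sigmoid(H @ u + b)
    grad = (pred - y) * pred * (1.0 - pred) / n   # squared-loss gradient through the sigmoid
    u -= lr * (H.T @ grad)
    b -= lr * grad.sum()

err = np.mean((sigmoid(H @ u + b) > 0.5) != (y > 0.5))
print(f"training 0-1 error: {err:.3f}")
```

Because the hidden layer never moves, only the outer weights are optimized, which is in the spirit of the random-features reduction mentioned at the end of the abstract.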
Related papers
- Implicit Bias in Leaky ReLU Networks Trained on High-Dimensional Data [63.34506218832164]
In this work, we investigate the implicit bias of gradient flow and gradient descent in two-layer fully-connected neural networks with leaky ReLU activations.
For gradient flow, we leverage recent work on the implicit bias for homogeneous neural networks to show that, asymptotically, gradient flow produces a neural network with rank at most two.
For gradient descent, provided the random initialization variance is small enough, we show that a single step of gradient descent suffices to drastically reduce the rank of the network, and that the rank remains small throughout training.
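A minimal numerical illustration of the rank effect described above (not the paper's data model or parameter regime): train a two-layer leaky ReLU network from a small random initialization and track the numerical rank of the first-layer weight matrix. The Gaussian data, one-coordinate labels, step counts, and rank tolerance below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 50, 100, 200
alpha = 0.1                                    # leaky ReLU negative slope
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])                           # +/-1 labels from one coordinate (placeholder)

scale = 1e-3                                   # small random initialization
W = scale * rng.standard_normal((m, d))
v = scale * rng.standard_normal(m)

def leaky(z):
    return np.where(z > 0, z, alpha * z)

def numerical_rank(M, tol=1e-2):
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > tol * s[0]))         # singular values above tol * largest

lr = 0.5
for step in range(1000):
    Z = X @ W.T                                # pre-activations, shape (n, m)
    out = leaky(Z) @ v                         # network outputs
    g = -y / (1.0 + np.exp(y * out)) / n       # logistic-loss gradient w.r.t. outputs
    dZ = np.where(Z > 0, 1.0, alpha) * np.outer(g, v)
    W -= lr * (dZ.T @ X)
    v -= lr * (leaky(Z).T @ g)
    if step in (0, 499, 999):
        print(f"after step {step + 1}: numerical rank of W = {numerical_rank(W)}")
```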
arXiv Detail & Related papers (2022-10-13T15:09:54Z) - On the non-universality of deep learning: quantifying the cost of
symmetry [24.86176236641865]
We prove computational limitations for learning with neural networks trained by noisy gradient descent (GD).
We characterize functions that fully-connected networks can weak-learn on the binary hypercube and unit sphere.
Our techniques extend to stochastic gradient descent (SGD), for which we show nontrivial results for learning with fully-connected networks.
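For concreteness, "noisy GD" in this line of work usually means gradient descent in which each gradient query is perturbed by independent noise; the minimal sketch below shows such an update, with the Gaussian noise model, noise scale, and toy quadratic objective as placeholder assumptions rather than the paper's exact setting.

```python
import numpy as np

rng = np.random.default_rng(4)

def noisy_gd_step(theta, grad_fn, lr=0.1, tau=0.01):
    """One gradient-descent step with independent Gaussian noise of scale `tau`
    added to the gradient query (placeholder noise model)."""
    return theta - lr * (grad_fn(theta) + tau * rng.standard_normal(theta.shape))

# Toy run on f(theta) = 0.5 * ||theta||^2, whose exact gradient is theta.
theta = np.ones(3)
for _ in range(200):
    theta = noisy_gd_step(theta, lambda t: t)
print("final iterate:", theta)
```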
arXiv Detail & Related papers (2022-08-05T11:54:52Z) - A Local Geometric Interpretation of Feature Extraction in Deep
Feedforward Neural Networks [13.159994710917022]
In this paper, we present a local geometric analysis to interpret how deep feedforward neural networks extract low-dimensional features from high-dimensional data.
Our study shows that, in a local geometric region, the optimal weight in one layer of the neural network and the optimal feature generated by the previous layer comprise a low-rank approximation of a matrix that is determined by the Bayes action of this layer.
arXiv Detail & Related papers (2022-02-09T18:50:00Z) - Non-Gradient Manifold Neural Network [79.44066256794187]
A deep neural network (DNN) generally takes thousands of iterations to optimize via gradient descent.
We propose a novel manifold neural network based on non-gradient optimization.
arXiv Detail & Related papers (2021-06-15T06:39:13Z) - Universality and Optimality of Structured Deep Kernel Networks [0.0]
Kernel based methods yield approximation models that are flexible, efficient and powerful.
The recent success of machine learning methods has been driven by deep neural networks (NNs).
In this paper, we show that the use of special types of kernels yields models reminiscent of neural networks.
arXiv Detail & Related papers (2021-05-15T14:10:35Z) - Random Features for the Neural Tangent Kernel [57.132634274795066]
We propose an efficient feature map construction of the Neural Tangent Kernel (NTK) of a fully-connected ReLU network.
We show that the dimension of the resulting features is much smaller than that of other baseline feature map constructions achieving comparable error bounds, both in theory and in practice.
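For context, a generic baseline that such constructions are compared against is the gradient feature map of a randomly initialized finite-width network, whose inner products approximate the NTK. The sketch below shows that baseline for a two-layer ReLU network; it is not the paper's proposed (more compact) feature map, and the width and parameterization are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 10, 2000                                  # input dimension, network width

W = rng.standard_normal((m, d))                  # hidden weights at initialization
a = rng.choice([-1.0, 1.0], size=m)              # output weights at initialization

def ntk_features(x):
    """Gradient of f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x) with respect
    to all parameters, flattened into one feature vector."""
    z = W @ x
    act = np.maximum(z, 0.0)
    ind = (z > 0).astype(float)
    feat_a = act / np.sqrt(m)                                  # d f / d a_j
    feat_W = ((a * ind)[:, None] * x[None, :]) / np.sqrt(m)    # d f / d w_j
    return np.concatenate([feat_a, feat_W.ravel()])

# Inner products of these features approximate the (empirical) NTK.
x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
print("approximate NTK value:", ntk_features(x1) @ ntk_features(x2))
```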
arXiv Detail & Related papers (2021-04-03T09:08:12Z) - Analytically Tractable Inference in Deep Neural Networks [0.0]
The Tractable Approximate Gaussian Inference (TAGI) algorithm was shown to be a viable and scalable alternative to backpropagation for shallow fully-connected neural networks.
We demonstrate how TAGI matches or exceeds the performance of backpropagation for training classic deep neural network architectures.
arXiv Detail & Related papers (2021-03-09T14:51:34Z) - Learning Neural Network Subspaces [74.44457651546728]
Recent observations have advanced our understanding of the neural network optimization landscape.
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.
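A minimal illustration of the subspace idea, shown here for a linear model with a logistic loss rather than the deep networks (and diversity regularizer) used in the paper: keep two endpoint parameter vectors and, at each step, take a gradient step through a randomly chosen point on the segment between them, so that every point on the line is itself a trained model. The data and hyperparameters below are placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 20, 500
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))          # linearly separable +/-1 labels

w1 = 0.01 * rng.standard_normal(d)               # endpoints of the line in weight space
w2 = 0.01 * rng.standard_normal(d)
lr = 0.5
for _ in range(500):
    t = rng.uniform()                            # sample a point on the segment
    w = (1 - t) * w1 + t * w2
    g = -(y / (1.0 + np.exp(y * (X @ w)))) @ X / n   # logistic-loss gradient at w
    w1 -= lr * (1 - t) * g                       # chain rule back to both endpoints
    w2 -= lr * t * g

for t in (0.0, 0.5, 1.0):                        # every point on the line is a usable model
    w = (1 - t) * w1 + t * w2
    acc = np.mean(np.sign(X @ w) == y)
    print(f"t = {t:.1f}: training accuracy = {acc:.3f}")
```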
arXiv Detail & Related papers (2021-02-20T23:26:58Z) - Dual-constrained Deep Semi-Supervised Coupled Factorization Network with
Enriched Prior [80.5637175255349]
We propose a new enriched prior based Dual-constrained Deep Semi-Supervised Coupled Factorization Network, called DS2CF-Net.
To extract hidden deep features, DS2CF-Net is modeled as a deep-structure and geometrical structure-constrained neural network.
Our network can obtain state-of-the-art performance for representation learning and clustering.
arXiv Detail & Related papers (2020-09-08T13:10:21Z) - Approximation smooth and sparse functions by deep neural networks
without saturation [0.6396288020763143]
In this paper, we aim at constructing deep neural networks with three hidden layers to approximate smooth and sparse functions.
We prove that the constructed deep nets can reach the optimal approximation rate in approximating both smooth and sparse functions with controllable magnitude of free parameters.
arXiv Detail & Related papers (2020-01-13T09:28:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.