Learning Deep ReLU Networks Is Fixed-Parameter Tractable
- URL: http://arxiv.org/abs/2009.13512v1
- Date: Mon, 28 Sep 2020 17:58:43 GMT
- Title: Learning Deep ReLU Networks Is Fixed-Parameter Tractable
- Authors: Sitan Chen, Adam R. Klivans, Raghu Meka
- Abstract summary: We consider the problem of learning an unknown ReLU network with respect to Gaussian inputs.
We give an algorithm whose running time is a fixed polynomial in the ambient dimension times a function of only the network's parameters.
Our bounds depend on the number of hidden units, depth, spectral norm of the weight matrices, and Lipschitz constant.
- Score: 21.625005195943707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of learning an unknown ReLU network with respect to
Gaussian inputs and obtain the first nontrivial results for networks of depth
more than two. We give an algorithm whose running time is a fixed polynomial in
the ambient dimension and some (exponentially large) function of only the
network's parameters.
Our bounds depend on the number of hidden units, depth, spectral norm of the
weight matrices, and Lipschitz constant of the overall network (we show that
some dependence on the Lipschitz constant is necessary). We also give a bound
that is doubly exponential in the size of the network but is independent of
spectral norm. These results provably cannot be obtained using gradient-based
methods and give the first example of a class of efficiently learnable neural
networks that gradient descent will fail to learn.
In contrast, prior work for learning networks of depth three or higher
requires exponential time in the ambient dimension, even when the above
parameters are bounded by a constant. Additionally, all prior work for the
depth-two case requires well-conditioned weights and/or positive coefficients
to obtain efficient run-times. Our algorithm does not require these
assumptions.
Our main technical tool is a type of filtered PCA that can be used to
iteratively recover an approximate basis for the subspace spanned by the hidden
units in the first layer. Our analysis leverages new structural results on
lattice polynomials from tropical geometry.
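As a rough illustration of the subspace-recovery step described above: for a one-hidden-layer ReLU network $f(x) = \sum_i a_i \,\mathrm{ReLU}(w_i \cdot x)$ with Gaussian inputs, Stein's identity gives $\mathbb{E}[f(x)(xx^\top - I)] = \sum_i a_i c_i w_i w_i^\top$ for constants $c_i > 0$, so the top eigenvectors of an empirical estimate of this matrix approximate the span of the first-layer weights. The sketch below implements this generic second-moment heuristic, not the paper's filtered PCA; the network, dimensions, and sample size are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-hidden-layer ReLU network f(x) = sum_i a_i * relu(w_i . x).
d, k, n = 20, 3, 200_000            # ambient dimension, hidden units, samples
W = rng.normal(size=(k, d))          # unknown first-layer weight vectors (rows)
a = rng.normal(size=k)               # second-layer coefficients

def f(X):
    return np.maximum(X @ W.T, 0.0) @ a

# Gaussian inputs, as in the learning setting of the abstract.
X = rng.normal(size=(n, d))
y = f(X)

# Empirical version of E[f(x) (x x^T - I)]. Its range lies in span{w_1, ..., w_k},
# so its top-k eigenvectors (by magnitude, since the a_i may be negative)
# approximate the first-layer subspace.
M = (X * y[:, None]).T @ X / n - y.mean() * np.eye(d)
eigvals, eigvecs = np.linalg.eigh(M)
U = eigvecs[:, np.argsort(-np.abs(eigvals))[:k]]   # estimated orthonormal basis

# The true weight vectors should lie (approximately) in the estimated subspace.
residual = W - (W @ U) @ U.T
print("relative residual:", np.linalg.norm(residual) / np.linalg.norm(W))
```

The paper's filtered PCA plays an analogous role but, per the abstract, comes with guarantees for networks of depth greater than two.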
Related papers
- Local Loss Optimization in the Infinite Width: Stable Parameterization of Predictive Coding Networks and Target Propagation [8.35644084613785]
We introduce the maximal update parameterization ($\mu$P) in the infinite-width limit for two representative designs of local targets.
By analyzing deep linear networks, we found that PC's gradients interpolate between first-order and Gauss-Newton-like gradients.
We demonstrate that, in specific standard settings, PC in the infinite-width limit behaves more similarly to the first-order gradient.
arXiv Detail & Related papers (2024-11-04T11:38:27Z) - ECLipsE: Efficient Compositional Lipschitz Constant Estimation for Deep Neural Networks [0.8993153817914281]
Lipschitz constant plays a crucial role in certifying the robustness of neural networks to input perturbations.
Efforts have been made to obtain tight upper bounds on the Lipschitz constant.
We provide a compositional approach to estimate Lipschitz constants for deep feed-forward neural networks (the classical product-of-spectral-norms baseline that such approaches tighten is sketched after this list).
arXiv Detail & Related papers (2024-04-05T19:36:26Z) - Asymptotics of Learning with Deep Structured (Random) Features [9.366617422860543]
For a large class of feature maps we provide a tight characterisation of the test error associated with learning the readout layer.
In some cases our results can capture feature maps learned by deep, finite-width neural networks trained under gradient descent.
arXiv Detail & Related papers (2024-02-21T18:35:27Z) - Depth Separations in Neural Networks: Separating the Dimension from the Accuracy [9.783697404304027]
We prove an exponential size separation between depth 2 and depth 3 neural networks (with real inputs)
We show that the curse of dimensionality manifests itself in depth 2 approximation, even in cases where the target function can be represented efficiently using a depth 3 network.
arXiv Detail & Related papers (2024-02-11T17:27:26Z) - Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - A Unified Algebraic Perspective on Lipschitz Neural Networks [88.14073994459586]
This paper introduces a novel perspective unifying various types of 1-Lipschitz neural networks.
We show that many existing techniques can be derived and generalized via finding analytical solutions of a common semidefinite programming (SDP) condition.
Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalization of convex potential layers.
arXiv Detail & Related papers (2023-03-06T14:31:09Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - Globally Gated Deep Linear Networks [3.04585143845864]
We introduce Globally Gated Deep Linear Networks (GGDLNs) where gating units are shared among all processing units in each layer.
We derive exact equations for the generalization properties in these networks in the finite-width thermodynamic limit.
Our work is the first exact theoretical solution of learning in a family of nonlinear networks with finite width.
arXiv Detail & Related papers (2022-10-31T16:21:56Z) - Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
arXiv Detail & Related papers (2022-08-08T03:13:24Z) - Unified Field Theory for Deep and Recurrent Neural Networks [56.735884560668985]
We present a unified and systematic derivation of the mean-field theory for both recurrent and deep networks.
We find that convergence towards the mean-field theory is typically slower for recurrent networks than for deep networks.
Our method exposes that Gaussian processes are but the lowest order of a systematic expansion in $1/n$.
arXiv Detail & Related papers (2021-12-10T15:06:11Z) - Path Regularization: A Convexity and Sparsity Inducing Regularization for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z)
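Several entries above, and the main result's dependence on spectral norms and the Lipschitz constant, concern Lipschitz bounds for feed-forward networks. As a point of reference for the ECLipsE entry, the sketch below computes the classical compositional upper bound, the product of the layers' spectral norms (ReLU is 1-Lipschitz), alongside a crude empirical lower bound. This is the naive baseline that compositional estimators improve on, not any particular paper's method, and the architecture is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical deep feed-forward ReLU network (random weights for illustration).
layer_dims = [20, 64, 64, 1]
weights = [rng.normal(size=(m, n)) / np.sqrt(n)
           for n, m in zip(layer_dims[:-1], layer_dims[1:])]

# Compositional upper bound: since ReLU is 1-Lipschitz, the network's
# Lipschitz constant is at most the product of the layers' spectral norms.
upper_bound = np.prod([np.linalg.norm(W, 2) for W in weights])

def f(x):
    for W in weights[:-1]:
        x = np.maximum(W @ x, 0.0)
    return weights[-1] @ x

# Crude empirical lower bound from finite differences on random input pairs.
pairs = rng.normal(size=(1000, 2, layer_dims[0]))
ratios = [np.linalg.norm(f(x) - f(y)) / np.linalg.norm(x - y) for x, y in pairs]
print(f"upper bound: {upper_bound:.2f}, empirical lower bound: {max(ratios):.2f}")
```

The gap between the two numbers is exactly what tighter estimators (SDP-based or compositional) aim to close.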
This list is automatically generated from the titles and abstracts of the papers on this site.