Learning Narrow One-Hidden-Layer ReLU Networks
- URL: http://arxiv.org/abs/2304.10524v1
- Date: Thu, 20 Apr 2023 17:53:09 GMT
- Title: Learning Narrow One-Hidden-Layer ReLU Networks
- Authors: Sitan Chen, Zehao Dou, Surbhi Goel, Adam R Klivans, Raghu Meka
- Abstract summary: First polynomial-time algorithm that succeeds whenever $k$ is a constant.
We use a multi-scale analysis to argue that sufficiently close neurons can be collapsed together.
- Score: 30.63086568341548
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We consider the well-studied problem of learning a linear combination of $k$
ReLU activations with respect to a Gaussian distribution on inputs in $d$
dimensions. We give the first polynomial-time algorithm that succeeds whenever
$k$ is a constant. All prior polynomial-time learners require additional
assumptions on the network, such as positive combining coefficients or the
matrix of hidden weight vectors being well-conditioned.
Our approach is based on analyzing random contractions of higher-order moment
tensors. We use a multi-scale analysis to argue that sufficiently close neurons
can be collapsed together, sidestepping the conditioning issues present in
prior work. This allows us to design an iterative procedure to discover
individual neurons.
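As a concrete illustration of the moment-tensor idea, here is a minimal sketch (not the paper's algorithm; the dimensions, sample size, teacher network, and the fourth-order choice are assumptions made for the demo). For Gaussian inputs and unit-norm $w_i$, the fourth Hermite moment $\mathbb{E}[y \cdot \mathrm{He}_4(x)]$ of $y=\sum_i a_i\,\mathrm{ReLU}(\langle w_i,x\rangle)$ equals $\sum_i a_i c_4\, w_i^{\otimes 4}$ for a nonzero constant $c_4$, so contracting an estimate of this tensor with random vectors yields a matrix whose range approximately spans the hidden directions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 4, 2, 500_000             # toy sizes, chosen only for illustration

# Hidden teacher network: y = sum_i a_i * ReLU(<w_i, x>) with unit-norm w_i.
W = rng.standard_normal((k, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
a = np.array([1.0, -1.0])           # mixed-sign coefficients (the harder case)

X = rng.standard_normal((n, d))
y = np.maximum(X @ W.T, 0.0) @ a    # noiseless labels, for simplicity

# Empirical 4th Hermite moment  T4 = E[y * He4(x)] = sum_i a_i * c4 * (w_i tensored 4 times).
I = np.eye(d)
M4 = np.einsum('n,ni,nj,nk,nl->ijkl', y, X, X, X, X) / n    # E[y x⊗x⊗x⊗x]
M2 = np.einsum('n,ni,nj->ij', y, X, X) / n                  # E[y x⊗x]
m0 = y.mean()
pairs = (np.einsum('ij,kl->ijkl', M2, I) + np.einsum('ik,jl->ijkl', M2, I)
         + np.einsum('il,jk->ijkl', M2, I) + np.einsum('kl,ij->ijkl', M2, I)
         + np.einsum('jl,ik->ijkl', M2, I) + np.einsum('jk,il->ijkl', M2, I))
deltas = (np.einsum('ij,kl->ijkl', I, I) + np.einsum('ik,jl->ijkl', I, I)
          + np.einsum('il,jk->ijkl', I, I))
T4 = M4 - pairs + m0 * deltas

# Random contraction: plugging two random vectors into the last two slots gives a
# d x d matrix whose range approximately equals span{w_1, ..., w_k}.
g1, g2 = rng.standard_normal(d), rng.standard_normal(d)
M = np.einsum('ijkl,k,l->ij', T4, g1, g2)
M = (M + M.T) / 2

eigvals, eigvecs = np.linalg.eigh(M)
span = eigvecs[:, np.argsort(np.abs(eigvals))[-k:]]          # top-k eigenspace

for i in range(k):
    print(f"|proj of w_{i} onto recovered span| = {np.linalg.norm(span.T @ W[i]):.3f}")
```

With enough samples the printed projections should be close to 1, provided the random contraction does not nearly annihilate a neuron. The hard regimes the paper actually addresses, nearly parallel neurons and nearly cancelling coefficients, are exactly where such a naive check degrades; that is what the multi-scale collapsing argument is for.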
Related papers
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Robustly Learning a Single Neuron via Sharpness [45.40045240987339]
We study the problem of learning a single neuron with respect to the $L_2$-loss in the presence of adversarial label noise.
We give an efficient algorithm that, for a broad family of activations including ReLUs, approximates the optimal $L_2$-error within a constant factor (a gradient-descent illustration of this setup appears after this list).
arXiv Detail & Related papers (2023-06-13T16:34:02Z) - A Unified Algebraic Perspective on Lipschitz Neural Networks [88.14073994459586]
This paper introduces a novel perspective unifying various types of 1-Lipschitz neural networks.
We show that many existing techniques can be derived and generalized via finding analytical solutions of a common semidefinite programming (SDP) condition.
Our approach, called SDP-based Lipschitz Layers (SLL), allows us to design non-trivial yet efficient generalizations of convex potential layers.
arXiv Detail & Related papers (2023-03-06T14:31:09Z) - Learning Polynomial Transformations [41.30404402331856]
We consider the problem of learning high dimensional quadratic transformations of Gaussians.
Our results extend to any rotation-invariant input distribution, not just Gaussian.
We also give the first polynomial-time algorithms with provable guarantees for tensor ring decomposition.
arXiv Detail & Related papers (2022-04-08T17:59:31Z) - The Separation Capacity of Random Neural Networks [78.25060223808936]
We show that a sufficiently large two-layer ReLU network with standard Gaussian weights and uniformly distributed biases can render two sufficiently separated classes linearly separable with high probability (a random-features illustration appears after this list).
We quantify the relevant structure of the data in terms of a novel notion of mutual complexity.
arXiv Detail & Related papers (2021-07-31T10:25:26Z) - Efficient Algorithms for Learning Depth-2 Neural Networks with General
ReLU Activations [27.244958998196623]
We present time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations.
In particular, we consider learning an unknown network of the form $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$, where $x$ is drawn from the Gaussian distribution, and $\sigma(t) := \max(t,0)$ is the ReLU activation.
arXiv Detail & Related papers (2021-07-21T17:06:03Z) - Recurrently Predicting Hypergraphs [30.092688729343678]
A problem arises from the number of possible multi-way relationships, or hyperedges, scaling in $\mathcal{O}(2^n)$ for a set of $n$ elements.
We propose a recurrent hypergraph neural network that predicts the incidence matrix by iteratively refining an initial guess of the solution.
arXiv Detail & Related papers (2021-06-26T01:12:41Z) - Towards an Understanding of Benign Overfitting in Neural Networks [104.2956323934544]
Modern machine learning models often employ a huge number of parameters and are typically optimized to have zero training loss.
We examine how these benign overfitting phenomena occur in a two-layer neural network setting.
We show that it is possible for the two-layer ReLU network interpolator to achieve a near minimax-optimal learning rate.
arXiv Detail & Related papers (2021-06-06T19:08:53Z) - Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
We show that learning $k\times k$, rank-$r$, matrices to normalized $L_1$ distance requires $\Omega(\frac{kr}{\epsilon^2})$ samples.
We propose an algorithm that uses $\mathcal{O}(\frac{kr}{\epsilon^2}\log^2\frac{r}{\epsilon})$ samples, a number linear in the high dimension and nearly linear in the matrices' typically low rank.
arXiv Detail & Related papers (2020-09-30T19:10:32Z) - Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU
Networks [48.32532049640782]
We study the problem of learning one-hidden-layer ReLU networks with $k$ hidden units on $\mathbb{R}^d$ under Gaussian marginals.
For the case of positive coefficients, we give the first polynomial-time algorithm for this learning problem for $k$ up to $\tilde{O}(\sqrt{\log d})$.
arXiv Detail & Related papers (2020-06-22T17:53:54Z)
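The single-neuron entry above (Robustly Learning a Single Neuron via Sharpness) concerns fitting one ReLU under the $L_2$-loss when some labels are corrupted. The sketch below only illustrates that setup with plain gradient descent on the empirical square loss; it is not the paper's algorithm and carries none of its constant-factor guarantee, and the corruption model, initialization, and step size are assumptions chosen for the demo.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 20, 20_000                       # toy sizes (illustrative)

# Ground-truth neuron and Gaussian inputs.
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((n, d))
y = np.maximum(X @ w_star, 0.0)

# Crude stand-in for adversarial label noise: arbitrary values on 5% of points.
bad = rng.choice(n, size=n // 20, replace=False)
y_noisy = y.copy()
y_noisy[bad] = 5.0 * rng.standard_normal(bad.size)

def l2_error(w):
    """Squared L2 error against the clean teacher, estimated on fresh test data."""
    Xt = rng.standard_normal((50_000, d))
    return np.mean((np.maximum(Xt @ w, 0.0) - np.maximum(Xt @ w_star, 0.0)) ** 2)

# Moment-based initialization: E[y x] = (1/2) w* for a unit-norm ReLU neuron.
w = 2.0 * (X.T @ y_noisy) / n

# Full-batch gradient descent on the empirical square loss (a generic baseline).
lr = 0.5
for _ in range(200):
    pred = np.maximum(X @ w, 0.0)
    grad = X.T @ ((pred - y_noisy) * (X @ w > 0)) / n
    w -= lr * grad

print(f"error of gradient-descent fit : {l2_error(w):.4f}")
print(f"error of the zero predictor   : {l2_error(np.zeros(d)):.4f}")
```

The learned error should come out far below that of the zero predictor despite the corrupted labels; certifying a constant-factor approximation of the optimal $L_2$-error under truly adversarial noise is the paper's contribution, which this demo does not attempt.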
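The separation-capacity entry above states that a wide, untrained two-layer ReLU network with Gaussian weights and uniform biases makes suitably separated classes linearly separable. The sketch below checks this empirically on toy concentric rings; the width, bias range, and data are illustrative assumptions, and the paper's notion of mutual complexity is not computed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two classes that are NOT linearly separable in the input space:
# points in an inner disk vs. points on a surrounding ring (toy data).
n = 500
radii = np.concatenate([rng.uniform(0.0, 1.0, n), rng.uniform(2.0, 3.0, n)])
theta = rng.uniform(0.0, 2.0 * np.pi, 2 * n)
X = np.stack([radii * np.cos(theta), radii * np.sin(theta)], axis=1)   # (2n, 2)
y = np.concatenate([-np.ones(n), np.ones(n)])

# Random two-layer ReLU feature map: standard Gaussian weights, uniform biases
# (the width m is a toy choice; the statement is about sufficiently large width).
m = 2000
W = rng.standard_normal((X.shape[1], m))
b = rng.uniform(-3.0, 3.0, m)
features = np.maximum(X @ W + b, 0.0)           # (2n, m) untrained ReLU features

# Linear separator on top of the random features (min-norm least-squares fit).
v, *_ = np.linalg.lstsq(features, y, rcond=None)
acc_feat = np.mean(np.sign(features @ v) == y)

# For contrast: a linear fit on the raw 2-D inputs cannot separate the rings.
Xb = np.hstack([X, np.ones((2 * n, 1))])
v0, *_ = np.linalg.lstsq(Xb, y, rcond=None)
acc_raw = np.mean(np.sign(Xb @ v0) == y)

print(f"linear accuracy on raw inputs          : {acc_raw:.3f}")
print(f"linear accuracy on random ReLU features: {acc_feat:.3f}")
```

The raw-input accuracy should hover near chance while the random-feature accuracy should be at or near 1.0, which is the qualitative content of the separation statement.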