Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU
Networks
- URL: http://arxiv.org/abs/2006.12476v1
- Date: Mon, 22 Jun 2020 17:53:54 GMT
- Title: Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU
Networks
- Authors: Ilias Diakonikolas and Daniel M. Kane and Vasilis Kontonis and Nikos
Zarifis
- Abstract summary: We study the problem of learning one-hidden-layer ReLU networks with $k$ hidden units on $\mathbb{R}^d$ under Gaussian marginals.
For the case of positive coefficients, we give the first polynomial-time algorithm for this learning problem for $k$ up to $\tilde{O}(\sqrt{\log d})$.
- Score: 48.32532049640782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of PAC learning one-hidden-layer ReLU networks with $k$
hidden units on $\mathbb{R}^d$ under Gaussian marginals in the presence of
additive label noise. For the case of positive coefficients, we give the first
polynomial-time algorithm for this learning problem for $k$ up to
$\tilde{O}(\sqrt{\log d})$. Previously, no polynomial time algorithm was known,
even for $k=3$. This answers an open question posed by~\cite{Kliv17}.
Importantly, our algorithm does not require any assumptions about the rank of
the weight matrix and its complexity is independent of its condition number. On
the negative side, for the more general task of PAC learning one-hidden-layer
ReLU networks with arbitrary real coefficients, we prove a Statistical Query
lower bound of $d^{\Omega(k)}$. Thus, we provide a separation between the two
classes in terms of efficient learnability. Our upper and lower bounds are
general, extending to broader families of activation functions.
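To make the learning setting concrete, here is a minimal NumPy sketch of the data-generating process the abstract describes: labels come from a one-hidden-layer ReLU network with positive output coefficients, inputs are drawn from a standard Gaussian, and additive label noise is applied. The dimension, sample size, coefficient range, and noise level below are illustrative assumptions, not values from the paper.

```python
# Sketch (illustrative, not the paper's code): data generated by a
# one-hidden-layer ReLU network with k positive coefficients under
# Gaussian marginals, plus additive label noise.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 100, 3, 10_000            # dimension, hidden units, sample size (assumed)

W = rng.standard_normal((k, d))     # hidden-unit weight vectors (rows of W)
a = rng.uniform(0.5, 1.5, size=k)   # positive output coefficients

X = rng.standard_normal((n, d))               # Gaussian marginals: x ~ N(0, I_d)
hidden = np.maximum(X @ W.T, 0.0)             # ReLU activations sigma(Wx)
y = hidden @ a + 0.1 * rng.standard_normal(n) # additive label noise on f(x)

# A PAC learner observes (X, y) and must output a hypothesis h with small
# squared error E[(h(x) - f(x))^2] over the Gaussian input distribution.
```

The paper's positive result applies to the positive-coefficient regime sketched above; for arbitrary real coefficients, its $d^{\Omega(k)}$ SQ lower bound shows that this sign restriction is essential for efficient learnability.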
Related papers
- Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z) - Agnostically Learning Multi-index Models with Queries [54.290489524576756]
We study the power of query access for the task of agnostic learning under the Gaussian distribution.
We show that query access gives significant runtime improvements over random examples for agnostically learning MIMs.
arXiv Detail & Related papers (2023-12-27T15:50:47Z) - A Doubly Robust Approach to Sparse Reinforcement Learning [19.68978899041642]
We propose a new regret-minimization algorithm for the episodic sparse linear Markov decision process (SMDP).
The regret of the proposed algorithm is $\tilde{O}(\sigma_{\min}^{-1} s_{\star} H \sqrt{N})$, where $\sigma_{\min}$ denotes the restricted minimum eigenvalue of the average Gram matrix of feature vectors.
arXiv Detail & Related papers (2023-10-23T18:52:17Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Attribute-Efficient PAC Learning of Low-Degree Polynomial Threshold Functions with Nasty Noise [10.550885570889527]
We study PAC learning of $K$-sparse degree-$d$ PTFs on $\mathbb{R}^n$.
Our main contribution is a new algorithm that runs in time $(nd/\epsilon)^{O(d)}$ and
PAC learns the class up to error rate $\epsilon$ with $O(\frac{K^{4d}}{\epsilon^{2d}} \cdot \log^{5d} n)$ samples, even when an $\eta \leq O(\epsilon^d)$ fraction of them is corrupted by nasty noise.
arXiv Detail & Related papers (2023-06-01T13:49:22Z) - Efficient Algorithms for Learning Depth-2 Neural Networks with General
ReLU Activations [27.244958998196623]
We present time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations.
In particular, we consider learning an unknown network of the form $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$, where $x$ is drawn from the Gaussian distribution, and $\sigma(t) := \max(t,0)$ is the ReLU activation.
arXiv Detail & Related papers (2021-07-21T17:06:03Z) - On Agnostic PAC Learning using $\mathcal{L}_2$-polynomial Regression and
Fourier-based Algorithms [10.66048003460524]
We develop a framework using Hilbert spaces as a proxy to analyze PAC learning problems with structural properties.
We demonstrate that PAC learning with 0-1 loss is equivalent to an optimization in the Hilbert space domain.
arXiv Detail & Related papers (2021-02-11T21:28:55Z) - Small Covers for Near-Zero Sets of Polynomials and Learning Latent
Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z)