Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU
Networks
- URL: http://arxiv.org/abs/2006.12476v1
- Date: Mon, 22 Jun 2020 17:53:54 GMT
- Title: Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU
Networks
- Authors: Ilias Diakonikolas and Daniel M. Kane and Vasilis Kontonis and Nikos
Zarifis
- Abstract summary: We study the problem of learning one-hidden-layer ReLU networks with $k$ hidden units on $\mathbb{R}^d$ under Gaussian marginals.
For the case of positive coefficients, we give the first polynomial-time algorithm for this learning problem for $k$ up to $\tilde{O}(\sqrt{\log d})$.
- Score: 48.32532049640782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of PAC learning one-hidden-layer ReLU networks with $k$
hidden units on $\mathbb{R}^d$ under Gaussian marginals in the presence of
additive label noise. For the case of positive coefficients, we give the first
polynomial-time algorithm for this learning problem for $k$ up to
$\tilde{O}(\sqrt{\log d})$. Previously, no polynomial time algorithm was known,
even for $k=3$. This answers an open question posed by~\cite{Kliv17}.
Importantly, our algorithm does not require any assumptions about the rank of
the weight matrix and its complexity is independent of its condition number. On
the negative side, for the more general task of PAC learning one-hidden-layer
ReLU networks with arbitrary real coefficients, we prove a Statistical Query
lower bound of $d^{\Omega(k)}$. Thus, we provide a separation between the two
classes in terms of efficient learnability. Our upper and lower bounds are
general, extending to broader families of activation functions.
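To make the learning setting concrete, here is a minimal NumPy sketch of the data-generating process the abstract describes: labels come from a one-hidden-layer ReLU network with positive output coefficients, inputs are drawn from a standard Gaussian, and additive label noise is applied. The dimension, sample size, coefficient range, and noise level below are illustrative assumptions, not values from the paper.

```python
# Sketch (illustrative, not the paper's code): data generated by a
# one-hidden-layer ReLU network with k positive coefficients under
# Gaussian marginals, plus additive label noise.
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 100, 3, 10_000            # dimension, hidden units, sample size (assumed)

W = rng.standard_normal((k, d))     # hidden-unit weight vectors (rows of W)
a = rng.uniform(0.5, 1.5, size=k)   # positive output coefficients

X = rng.standard_normal((n, d))               # Gaussian marginals: x ~ N(0, I_d)
hidden = np.maximum(X @ W.T, 0.0)             # ReLU activations sigma(Wx)
y = hidden @ a + 0.1 * rng.standard_normal(n) # additive label noise on f(x)

# A PAC learner observes (X, y) and must output a hypothesis h with small
# squared error E[(h(x) - f(x))^2] over the Gaussian input distribution.
```

The paper's positive result applies to the positive-coefficient regime sketched above; for arbitrary real coefficients, its $d^{\Omega(k)}$ SQ lower bound shows that this sign restriction is essential for efficient learnability.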
Related papers
- Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is a novel learning algorithm for the agnostic $(\varepsilon,\delta)$-PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z) - Agnostically Learning Multi-index Models with Queries [54.290489524576756]
We study the power of query access for the task of agnostic learning under the Gaussian distribution.
We show that query access gives significant runtime improvements over random examples for agnostically learning MIMs.
arXiv Detail & Related papers (2023-12-27T15:50:47Z) - A Doubly Robust Approach to Sparse Reinforcement Learning [19.68978899041642]
We propose a new regret-minimization algorithm for the episodic sparse linear Markov decision process (SMDP).
The regret of the proposed algorithm is $\tilde{O}(\sigma_{\min}^{-1} s_{\star} H \sqrt{N})$, where $\sigma_{\min}$ denotes the restricted minimum eigenvalue of the average Gram matrix of feature vectors.
arXiv Detail & Related papers (2023-10-23T18:52:17Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Attribute-Efficient PAC Learning of Low-Degree Polynomial Threshold Functions with Nasty Noise [10.550885570889527]
We study PAC learning of $K$-sparse degree-$d$ PTFs on $\mathbb{R}^n$.
Our main contribution is a new algorithm that runs in time $(nd/\epsilon)^{O(d)}$ and
PAC learns the class up to error rate $\epsilon$ with $O(\frac{K^{4d}}{\epsilon^{2d}} \cdot \log^{5d} n)$ samples, even when an $\eta \leq O(\epsilon^d)$ fraction of them is corrupted by nasty noise.
arXiv Detail & Related papers (2023-06-01T13:49:22Z) - Efficient Algorithms for Learning Depth-2 Neural Networks with General
ReLU Activations [27.244958998196623]
We present time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations.
In particular, we consider learning an unknown network of the form $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$, where $x$ is drawn from the Gaussian distribution, and $\sigma(t) := \max(t,0)$ is the ReLU activation.
arXiv Detail & Related papers (2021-07-21T17:06:03Z) - On Agnostic PAC Learning using $\mathcal{L}_2$-polynomial Regression and
Fourier-based Algorithms [10.66048003460524]
We develop a framework using Hilbert spaces as a proxy to analyze PAC learning problems with structural properties.
We demonstrate that PAC learning with 0-1 loss is equivalent to an optimization in the Hilbert space domain.
arXiv Detail & Related papers (2021-02-11T21:28:55Z) - Small Covers for Near-Zero Sets of Polynomials and Learning Latent
Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z)