Agnostically Learning Multi-index Models with Queries
- URL: http://arxiv.org/abs/2312.16616v1
- Date: Wed, 27 Dec 2023 15:50:47 GMT
- Title: Agnostically Learning Multi-index Models with Queries
- Authors: Ilias Diakonikolas, Daniel M. Kane, Vasilis Kontonis, Christos Tzamos,
Nikos Zarifis
- Abstract summary: We study the power of query access for the task of agnostic learning under the Gaussian distribution.
We show that query access gives significant runtime improvements over random examples for agnostically learning MIMs.
- Score: 54.290489524576756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the power of query access for the task of agnostic learning under
the Gaussian distribution. In the agnostic model, no assumptions are made on
the labels and the goal is to compute a hypothesis that is competitive with the
{\em best-fit} function in a known class, i.e., it achieves error
$\mathrm{opt}+\epsilon$, where $\mathrm{opt}$ is the error of the best function
in the class. We focus on a general family of Multi-Index Models (MIMs), which
are $d$-variate functions that depend only on few relevant directions, i.e.,
have the form $g(\mathbf{W} \mathbf{x})$ for an unknown link function $g$ and a
$k \times d$ matrix $\mathbf{W}$. Multi-index models cover a wide range of
commonly studied function classes, including constant-depth neural networks
with ReLU activations, and intersections of halfspaces.
Our main result shows that query access gives significant runtime
improvements over random examples for agnostically learning MIMs. Under
standard regularity assumptions for the link function (namely, bounded
variation or surface area), we give an agnostic query learner for MIMs with
complexity $O(k)^{\mathrm{poly}(1/\epsilon)} \; \mathrm{poly}(d) $. In
contrast, algorithms that rely only on random examples inherently require
$d^{\mathrm{poly}(1/\epsilon)}$ samples and runtime, even for the basic problem
of agnostically learning a single ReLU or a halfspace.
Our algorithmic result establishes a strong computational separation between
the agnostic PAC and the agnostic PAC+Query models under the Gaussian
distribution. Prior to our work, no such separation was known -- even for the
special case of agnostically learning a single halfspace, for which it was an
open problem first posed by Feldman. Our results are enabled by a general
dimension-reduction technique that leverages query access to estimate gradients
of (a smoothed version of) the underlying label function.
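To make the dimension-reduction idea concrete, below is a minimal, hedged sketch in Python (not the paper's actual algorithm and without its guarantees): it uses only point queries to estimate gradients of a Gaussian-smoothed label function, and takes the top eigenvectors of the averaged gradient outer products as an estimate of the relevant subspace of a MIM $g(\mathbf{W}\mathbf{x})$. The smoothing parameter, query budgets, and helper names are illustrative assumptions.

```python
# A minimal, hypothetical sketch (NOT the paper's algorithm) of the high-level idea:
# use query access to estimate gradients of a Gaussian-smoothed label function, then
# recover the k relevant directions of a multi-index model g(Wx) from the principal
# eigenvectors of the averaged gradient outer products. Parameter choices are illustrative.
import numpy as np

def smoothed_gradient(query, x, sigma=0.5, n_queries=200, rng=None):
    """Estimate grad f_sigma(x), where f_sigma(x) = E_z[f(x + sigma*z)], z ~ N(0, I).

    Uses the query-only identity  grad f_sigma(x) = E_z[(f(x + sigma*z) - f(x)) z] / sigma;
    subtracting f(x) is a baseline that leaves the mean unchanged but lowers variance.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal((n_queries, x.shape[0]))
    vals = np.array([query(x + sigma * zi) for zi in z]) - query(x)
    return (vals[:, None] * z).mean(axis=0) / sigma

def estimate_relevant_subspace(query, d, k, n_points=300, sigma=0.5, seed=0):
    """Average gradient outer products over Gaussian points; the top-k eigenvectors
    estimate the row space of W in a multi-index model g(Wx)."""
    rng = np.random.default_rng(seed)
    M = np.zeros((d, d))
    for _ in range(n_points):
        x = rng.standard_normal(d)
        g = smoothed_gradient(query, x, sigma=sigma, rng=rng)
        M += np.outer(g, g)
    _, eigvecs = np.linalg.eigh(M / n_points)  # eigenvalues in ascending order
    return eigvecs[:, -k:].T                   # k x d, rows span the estimated subspace

if __name__ == "__main__":
    # Toy example: a 2-index model (sum of two ReLUs) in 20 dimensions.
    d, k = 20, 2
    rng = np.random.default_rng(1)
    W_true = np.linalg.qr(rng.standard_normal((d, k)))[0].T  # k x d, orthonormal rows
    label = lambda x: np.maximum(W_true @ x, 0.0).sum()      # g(Wx) with ReLU link
    W_hat = estimate_relevant_subspace(label, d, k)
    # Squared Frobenius overlap of the two subspaces; close to k when recovery succeeds.
    print("subspace overlap (max %d): %.3f" % (k, np.linalg.norm(W_hat @ W_true.T) ** 2))
```

A useful observation behind this sketch: because the labels depend on $\mathbf{x}$ only through $\mathbf{W}\mathbf{x}$, the smoothed function does too, so its gradients lie in the row span of $\mathbf{W}$ and their outer products concentrate on the relevant subspace.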
Related papers
- Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations [40.77319247558742]
We study the computational complexity of learning a target function $f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure.
We prove that a large subset of $f_*$ can be efficiently learned by gradient training of a two-layer neural network.
arXiv Detail & Related papers (2024-06-17T17:59:17Z) - Agnostic Active Learning of Single Index Models with Linear Sample Complexity [27.065175036001246]
We study active learning methods for single index models of the form $F(\mathbf{x}) = f(\langle \mathbf{w}, \mathbf{x}\rangle)$.
arXiv Detail & Related papers (2024-05-15T13:11:28Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C \cdot \mathrm{opt} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Testing distributional assumptions of learning algorithms [5.204779946147061]
We study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$.
We show that if the distribution on examples in the data passes the tester $\mathcal{T}$, then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data.
arXiv Detail & Related papers (2022-04-14T19:10:53Z) - Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z) - Small Covers for Near-Zero Sets of Polynomials and Learning Latent
Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z) - Statistical-Query Lower Bounds via Functional Gradients [19.5924910463796]
We show that any statistical-query algorithm with tolerance $n^{-(1/\epsilon)^b}$ must use at least $2^{n^c} \epsilon$ queries for some constant $b$.
Our results rule out general (as opposed to correlational) SQ learning algorithms, which is unusual for real-valued learning problems.
arXiv Detail & Related papers (2020-06-29T05:15:32Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.