Learning Polynomials of Few Relevant Dimensions
- URL: http://arxiv.org/abs/2004.13748v1
- Date: Tue, 28 Apr 2020 18:00:18 GMT
- Title: Learning Polynomials of Few Relevant Dimensions
- Authors: Sitan Chen, Raghu Meka
- Abstract summary: Polynomial regression is a basic primitive in learning and statistics.
We give an algorithm that learns the polynomial within accuracy $\epsilon$ with sample complexity roughly $N = O_{r,d}(n \log^2(1/\epsilon) (\log n)^d)$ and runtime $O_{r,d}(N n^2)$.
- Score: 12.122488725065388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Polynomial regression is a basic primitive in learning and statistics. In its
most basic form the goal is to fit a degree $d$ polynomial to a response
variable $y$ in terms of an $n$-dimensional input vector $x$. This is extremely
well-studied with many applications and has sample and runtime complexity
$\Theta(n^d)$. Can one achieve better runtime if the intrinsic dimension of the
data is much smaller than the ambient dimension $n$? Concretely, we are given
samples $(x,y)$ where $y$ is a degree at most $d$ polynomial in an unknown
$r$-dimensional projection (the relevant dimensions) of $x$. This can be seen
both as a generalization of phase retrieval and as a special case of learning
multi-index models where the link function is an unknown low-degree polynomial.
Note that without distributional assumptions, this is at least as hard as junta
learning.
In this work we consider the important case where the covariates are
Gaussian. We give an algorithm that learns the polynomial within accuracy
$\epsilon$ with sample complexity that is roughly $N = O_{r,d}(n
\log^2(1/\epsilon) (\log n)^d)$ and runtime $O_{r,d}(N n^2)$. Prior to our
work, no such results were known even for the case of $r=1$. We introduce a new
filtered PCA approach to get a warm start for the true subspace and use
geodesic SGD to boost to arbitrary accuracy; our techniques may be of
independent interest, especially for problems dealing with subspace recovery or
analyzing SGD on manifolds.
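To make the two stages concrete, below is a minimal, self-contained sketch of how they might look in the simplest case $r = 1$, i.e., a single relevant direction. It is not the authors' implementation: the filtering threshold, the alternating least-squares fit of the degree-$d$ coefficients, the batch size, the step size, and the step-angle cap are all illustrative assumptions.

```python
"""
Illustrative sketch only (not the authors' code).  It mimics the two stages
described in the abstract for the simplest case r = 1, a single hidden
direction u*:
  1. Filtered PCA warm start: keep samples with large |y| and take the
     eigenvector of their second-moment matrix whose eigenvalue deviates
     most from 1 (for standard Gaussian x, deviations concentrate in the
     relevant subspace).
  2. Geodesic SGD on the unit sphere: alternately fit degree-d polynomial
     coefficients in z = <u, x> by least squares, then take a great-circle
     step along the negative Riemannian gradient of the squared error.
Thresholds, step sizes, and iteration counts are arbitrary choices.
"""
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 50, 3, 20000

# Ground truth: y = p(<u*, x>) for a hidden unit direction u* and degree-d p.
u_star = rng.standard_normal(n)
u_star /= np.linalg.norm(u_star)
coeffs_star = rng.standard_normal(d + 1)
X = rng.standard_normal((m, n))
z_star = X @ u_star
y = sum(c * z_star**k for k, c in enumerate(coeffs_star))

def monomials(z, deg):
    """Feature map [1, z, z^2, ..., z^deg] for scalar projections z."""
    return np.stack([z**k for k in range(deg + 1)], axis=1)

# --- Stage 1: filtered PCA warm start (the 0.9 quantile threshold is a guess) ---
tau = np.quantile(np.abs(y), 0.9)
Xf = X[np.abs(y) > tau]
M = Xf.T @ Xf / len(Xf)
evals, evecs = np.linalg.eigh(M)
u = evecs[:, np.argmax(np.abs(evals - 1.0))]  # direction where M deviates from I

# --- Stage 2: geodesic SGD on the sphere, alternating with least-squares fits ---
eta = 0.01
for _ in range(200):
    idx = rng.choice(m, size=512, replace=False)
    Xb, yb = X[idx], y[idx]
    z = Xb @ u
    Phi = monomials(z, d)
    c, *_ = np.linalg.lstsq(Phi, yb, rcond=None)       # best degree-d fit in z
    resid = Phi @ c - yb
    dPhi = np.stack([k * z**(k - 1) if k else np.zeros_like(z)
                     for k in range(d + 1)], axis=1)   # d/dz of the features
    g = 2.0 / len(idx) * (Xb.T @ (resid * (dPhi @ c))) # Euclidean gradient in u
    g -= (g @ u) * u                                   # project onto tangent space
    gnorm = np.linalg.norm(g)
    if gnorm > 1e-12:
        t = min(eta * gnorm, 0.3)                      # cap the geodesic step angle
        u = np.cos(t) * u - np.sin(t) * g / gnorm      # great-circle (geodesic) step

print("alignment |<u, u*>| =", abs(u @ u_star))
```

The great-circle update keeps the candidate direction exactly on the unit sphere; it is the $r = 1$ analogue of taking geodesic steps over $r$-dimensional subspaces.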
Related papers
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ for an arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d\,\mathrm{polylog}(d)$; a toy sketch of this setup appears below.
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
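A toy illustration of the single-index setting just summarized, sketched under assumed details: a cubic link $\sigma_*(t) = t^3 - 3t$, a plain two-layer ReLU network, and vanilla mini-batch SGD with hand-derived gradients. It is not the paper's procedure; the width, step size, and link function are arbitrary choices.

```python
# Toy illustration (not the paper's algorithm): fit a two-layer ReLU network to
# y = sigma_*(<x, theta>) with plain mini-batch SGD under isotropic Gaussian
# inputs.  The link sigma_*, width, step size, and batch size are assumptions.
import numpy as np

rng = np.random.default_rng(1)
dim, width, m = 20, 64, 10000

theta = rng.standard_normal(dim)
theta /= np.linalg.norm(theta)

def sigma_star(t):
    return t**3 - 3 * t          # an example degree-3 link (Hermite He_3)

X = rng.standard_normal((m, dim))
y = sigma_star(X @ theta)

W = rng.standard_normal((width, dim)) / np.sqrt(dim)  # first-layer weights
b = np.zeros(width)
a = rng.standard_normal(width) / np.sqrt(width)       # second-layer weights

eta, batch = 1e-2, 64
for _ in range(5000):
    idx = rng.integers(m, size=batch)
    Xb, yb = X[idx], y[idx]
    pre = Xb @ W.T + b                                 # (batch, width) pre-activations
    h = np.maximum(pre, 0.0)                           # ReLU features
    resid = h @ a - yb                                 # prediction error
    # Hand-derived gradients of the mean squared error for this tiny model.
    grad_a = (2 / batch) * (h.T @ resid)
    back = (resid[:, None] * a[None, :]) * (pre > 0)
    grad_b = (2 / batch) * back.sum(axis=0)
    grad_W = (2 / batch) * (back.T @ Xb)
    a -= eta * grad_a
    b -= eta * grad_b
    W -= eta * grad_W

mse = np.mean((np.maximum(X @ W.T + b, 0.0) @ a - y) ** 2)
print("training MSE after SGD:", mse)
```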
- Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z)
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than that of its closest $t$-piece degree-$d$ approximation.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z)
- Private and polynomial time algorithms for learning Gaussians and beyond [13.947461378608525]
We present a framework for reducing $(\varepsilon, \delta)$ differentially private (DP) statistical estimation to its non-private counterpart.
We provide the first polynomial-time $(\varepsilon, \delta)$-DP algorithm for robust learning of (unrestricted) Gaussians.
arXiv Detail & Related papers (2021-11-22T16:25:51Z)
- Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z)
- Average-case Complexity of Teaching Convex Polytopes via Halfspace Queries [55.28642461328172]
We show that the average-case teaching complexity is $\Theta(d)$, which is in sharp contrast to the worst-case teaching complexity of $\Theta(n)$.
Our insights allow us to establish a tight bound on the average-case complexity for $\phi$-separable dichotomies.
arXiv Detail & Related papers (2020-06-25T19:59:24Z)
- Learning sums of powers of low-degree polynomials in the non-degenerate case [2.6109033135086777]
We give a learning algorithm for an arithmetic circuit model, provided certain non-degeneracy conditions hold. Our algorithm is based on a general scheme for turning a lower bound for a circuit model into a learning algorithm for the same model.
arXiv Detail & Related papers (2020-04-15T06:18:41Z)