Learning Polynomials of Few Relevant Dimensions
- URL: http://arxiv.org/abs/2004.13748v1
- Date: Tue, 28 Apr 2020 18:00:18 GMT
- Title: Learning Polynomials of Few Relevant Dimensions
- Authors: Sitan Chen, Raghu Meka
- Abstract summary: Polynomial regression is a basic primitive in learning and statistics.
We give an algorithm that learns the polynomial within accuracy $\epsilon$ with sample complexity roughly $N = O_{r,d}(n \log^2(1/\epsilon) (\log n)^d)$ and runtime $O_{r,d}(N n^2)$.
- Score: 12.122488725065388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Polynomial regression is a basic primitive in learning and statistics. In its
most basic form the goal is to fit a degree $d$ polynomial to a response
variable $y$ in terms of an $n$-dimensional input vector $x$. This is extremely
well-studied with many applications and has sample and runtime complexity
$\Theta(n^d)$. Can one achieve better runtime if the intrinsic dimension of the
data is much smaller than the ambient dimension $n$? Concretely, we are given
samples $(x,y)$ where $y$ is a degree at most $d$ polynomial in an unknown
$r$-dimensional projection (the relevant dimensions) of $x$. This can be seen
both as a generalization of phase retrieval and as a special case of learning
multi-index models where the link function is an unknown low-degree polynomial.
Note that without distributional assumptions, this is at least as hard as junta
learning.
In this work we consider the important case where the covariates are
Gaussian. We give an algorithm that learns the polynomial within accuracy
$\epsilon$ with sample complexity that is roughly $N = O_{r,d}(n
\log^2(1/\epsilon) (\log n)^d)$ and runtime $O_{r,d}(N n^2)$. Prior to our
work, no such results were known even for the case of $r=1$. We introduce a new
filtered PCA approach to get a warm start for the true subspace and use
geodesic SGD to boost to arbitrary accuracy; our techniques may be of
independent interest, especially for problems dealing with subspace recovery or
analyzing SGD on manifolds.
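To make the two stages concrete, below is a minimal, self-contained sketch of how they might look in the simplest case $r = 1$, i.e., a single relevant direction. It is not the authors' implementation: the filtering threshold, the alternating least-squares fit of the degree-$d$ coefficients, the batch size, the step size, and the step-angle cap are all illustrative assumptions.

```python
"""
Illustrative sketch only (not the authors' code).  It mimics the two stages
described in the abstract for the simplest case r = 1, a single hidden
direction u*:
  1. Filtered PCA warm start: keep samples with large |y| and take the
     eigenvector of their second-moment matrix whose eigenvalue deviates
     most from 1 (for standard Gaussian x, deviations concentrate in the
     relevant subspace).
  2. Geodesic SGD on the unit sphere: alternately fit degree-d polynomial
     coefficients in z = <u, x> by least squares, then take a great-circle
     step along the negative Riemannian gradient of the squared error.
Thresholds, step sizes, and iteration counts are arbitrary choices.
"""
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 50, 3, 20000

# Ground truth: y = p(<u*, x>) for a hidden unit direction u* and degree-d p.
u_star = rng.standard_normal(n)
u_star /= np.linalg.norm(u_star)
coeffs_star = rng.standard_normal(d + 1)
X = rng.standard_normal((m, n))
z_star = X @ u_star
y = sum(c * z_star**k for k, c in enumerate(coeffs_star))

def monomials(z, deg):
    """Feature map [1, z, z^2, ..., z^deg] for scalar projections z."""
    return np.stack([z**k for k in range(deg + 1)], axis=1)

# --- Stage 1: filtered PCA warm start (the 0.9 quantile threshold is a guess) ---
tau = np.quantile(np.abs(y), 0.9)
Xf = X[np.abs(y) > tau]
M = Xf.T @ Xf / len(Xf)
evals, evecs = np.linalg.eigh(M)
u = evecs[:, np.argmax(np.abs(evals - 1.0))]  # direction where M deviates from I

# --- Stage 2: geodesic SGD on the sphere, alternating with least-squares fits ---
eta = 0.01
for _ in range(200):
    idx = rng.choice(m, size=512, replace=False)
    Xb, yb = X[idx], y[idx]
    z = Xb @ u
    Phi = monomials(z, d)
    c, *_ = np.linalg.lstsq(Phi, yb, rcond=None)       # best degree-d fit in z
    resid = Phi @ c - yb
    dPhi = np.stack([k * z**(k - 1) if k else np.zeros_like(z)
                     for k in range(d + 1)], axis=1)   # d/dz of the features
    g = 2.0 / len(idx) * (Xb.T @ (resid * (dPhi @ c))) # Euclidean gradient in u
    g -= (g @ u) * u                                   # project onto tangent space
    gnorm = np.linalg.norm(g)
    if gnorm > 1e-12:
        t = min(eta * gnorm, 0.3)                      # cap the geodesic step angle
        u = np.cos(t) * u - np.sin(t) * g / gnorm      # great-circle (geodesic) step

print("alignment |<u, u*>| =", abs(u @ u_star))
```

The great-circle update keeps the candidate direction exactly on the unit sphere; it is the $r = 1$ analogue of taking geodesic steps over $r$-dimensional subspaces.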
Related papers
- Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \sigma_*(\langle \boldsymbol{x}, \boldsymbol{\theta} \rangle)$ under isotropic Gaussian data.
We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ for an arbitrary link function with a sample and runtime complexity of $n \asymp T \asymp C(q) \cdot d\,\mathrm{polylog}(d)$; a toy sketch of this setup appears below.
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
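A toy illustration of the single-index setting just summarized, sketched under assumed details: a cubic link $\sigma_*(t) = t^3 - 3t$, a plain two-layer ReLU network, and vanilla mini-batch SGD with hand-derived gradients. It is not the paper's procedure; the width, step size, and link function are arbitrary choices.

```python
# Toy illustration (not the paper's algorithm): fit a two-layer ReLU network to
# y = sigma_*(<x, theta>) with plain mini-batch SGD under isotropic Gaussian
# inputs.  The link sigma_*, width, step size, and batch size are assumptions.
import numpy as np

rng = np.random.default_rng(1)
dim, width, m = 20, 64, 10000

theta = rng.standard_normal(dim)
theta /= np.linalg.norm(theta)

def sigma_star(t):
    return t**3 - 3 * t          # an example degree-3 link (Hermite He_3)

X = rng.standard_normal((m, dim))
y = sigma_star(X @ theta)

W = rng.standard_normal((width, dim)) / np.sqrt(dim)  # first-layer weights
b = np.zeros(width)
a = rng.standard_normal(width) / np.sqrt(width)       # second-layer weights

eta, batch = 1e-2, 64
for _ in range(5000):
    idx = rng.integers(m, size=batch)
    Xb, yb = X[idx], y[idx]
    pre = Xb @ W.T + b                                 # (batch, width) pre-activations
    h = np.maximum(pre, 0.0)                           # ReLU features
    resid = h @ a - yb                                 # prediction error
    # Hand-derived gradients of the mean squared error for this tiny model.
    grad_a = (2 / batch) * (h.T @ resid)
    back = (resid[:, None] * a[None, :]) * (pre > 0)
    grad_b = (2 / batch) * back.sum(axis=0)
    grad_W = (2 / batch) * (back.T @ Xb)
    a -= eta * grad_a
    b -= eta * grad_b
    W -= eta * grad_W

mse = np.mean((np.maximum(X @ W.T + b, 0.0) @ a - y) ** 2)
print("training MSE after SGD:", mse)
```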
- Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree-$k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z)
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $\ell_1$ distance essentially at most a constant times larger than that of its closest $t$-piece degree-$d$ approximation.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z)
- Private and polynomial time algorithms for learning Gaussians and beyond [13.947461378608525]
We present a framework for reducing $(\varepsilon, \delta)$ differentially private (DP) statistical estimation to its non-private counterpart.
We provide the first polynomial-time $(\varepsilon, \delta)$-DP algorithm for robust learning of (unrestricted) Gaussians.
arXiv Detail & Related papers (2021-11-22T16:25:51Z)
- Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z)
- Average-case Complexity of Teaching Convex Polytopes via Halfspace Queries [55.28642461328172]
We show that the average-case teaching complexity is $\Theta(d)$, which is in sharp contrast to the worst-case teaching complexity of $\Theta(n)$.
Our insights allow us to establish a tight bound on the average-case complexity for $\phi$-separable dichotomies.
arXiv Detail & Related papers (2020-06-25T19:59:24Z)
- Learning sums of powers of low-degree polynomials in the non-degenerate case [2.6109033135086777]
We give a learning algorithm for an arithmetic circuit model, provided certain non-degeneracy conditions hold. Our algorithm is based on a general scheme for turning a lower bound for a circuit model into a learning algorithm for the same model.
arXiv Detail & Related papers (2020-04-15T06:18:41Z)