Related papers: Learning sums of powers of low-degree polynomials in the non-degenerate case

Learning sums of powers of low-degree polynomials in the non-degenerate case

URL: http://arxiv.org/abs/2004.06898v2
Date: Tue, 16 Jun 2020 08:57:22 GMT
Title: Learning sums of powers of low-degree polynomials in the non-degenerate case
Authors: Ankit Garg, Neeraj Kayal, and Chandan Saha
Abstract summary: We give a learning algorithm for an arithmetic circuit model from a lower bound for the same model, provided certain non-degeneracy conditions hold. Our algorithm is based on a scheme for obtaining a learning algorithm for an arithmetic circuit model from a lower bound for the same model.
Score: 2.6109033135086777
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We develop algorithms for writing a polynomial as sums of powers of low degree polynomials. Consider an $n$-variate degree-$d$ polynomial $f$ which can be written as $$f = c_1Q_1^{m} + \ldots + c_s Q_s^{m},$$ where each $c_i\in \mathbb{F}^{\times}$, $Q_i$ is a homogeneous polynomial of degree $t$, and $t m = d$. In this paper, we give a $\text{poly}((ns)^t)$-time learning algorithm for finding the $Q_i$'s given (black-box access to) $f$, if the $Q_i's$ satisfy certain non-degeneracy conditions and $n$ is larger than $d^2$. The set of degenerate $Q_i$'s (i.e., inputs for which the algorithm does not work) form a non-trivial variety and hence if the $Q_i$'s are chosen according to any reasonable (full-dimensional) distribution, then they are non-degenerate with high probability (if $s$ is not too large). Our algorithm is based on a scheme for obtaining a learning algorithm for an arithmetic circuit model from a lower bound for the same model, provided certain non-degeneracy conditions hold. The scheme reduces the learning problem to the problem of decomposing two vector spaces under the action of a set of linear operators, where the spaces and the operators are derived from the input circuit and the complexity measure used in a typical lower bound proof. The non-degeneracy conditions are certain restrictions on how the spaces decompose.

Related papers

Information-Computation Tradeoffs for Noiseless Linear Regression with Oblivious Contamination [65.37519531362157]
We show that any efficient Statistical Query algorithm for this task requires VSTAT complexity at least $tildeOmega(d1/2/alpha2)$.
arXiv Detail & Related papers (2025-10-12T15:42:44Z)
Robust Learning of Multi-index Models via Iterative Subspace Approximation [36.138661719725626]
We study the task of learning Multi-Index Models (MIMs) with label noise under the Gaussian distribution. We focus on well-behaved MIMs with finite ranges that satisfy certain regularity properties. We show that in the presence of random classification noise, the complexity of our algorithm scales agnosticly with $1/epsilon$.
arXiv Detail & Related papers (2025-02-13T17:37:42Z)
SoS Certificates for Sparse Singular Values and Their Applications: Robust Statistics, Subspace Distortion, and More [37.208622097149714]
We give a new family of upper-time algorithms which can certify bounds on the maximum of $|M u|$.<n>Our certification algorithm makes essential use of the Sum-of-Squares hierarchy.
arXiv Detail & Related papers (2024-12-30T18:59:46Z)
The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $Omega(frac1kcdot n2log|mathbbF|)$. As an application, we obtain an $Omega(frac1kcdot n2log|mathbbF|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z)
Unconditionally separating noisy $\mathsf{QNC}^0$ from bounded polynomial threshold circuits of constant depth [8.66267734067296]
We study classes of constant-depth circuits with bounds that compute restricted threshold functions. For large enough values of $mathsfbPTFC0[k]$, $mathsfbPTFC0[k] contains $mathsfTC0[k].
arXiv Detail & Related papers (2024-08-29T09:40:55Z)
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit [75.4661041626338]
We study the problem of gradient descent learning of a single-index target function $f_*(boldsymbolx) = textstylesigma_*left(langleboldsymbolx,boldsymbolthetarangleright)$ under isotropic Gaussian data. We prove that a two-layer neural network optimized by an SGD-based algorithm learns $f_*$ of arbitrary link function with a sample and runtime complexity of $n asymp T asymp C(q) cdot d
arXiv Detail & Related papers (2024-06-03T17:56:58Z)
Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $mathbbRd$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/epsilon)O(k)$, whereepsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
The Approximate Degree of DNF and CNF Formulas [95.94432031144716]
For every $delta>0,$ we construct CNF and formulas of size with approximate degree $Omega(n1-delta),$ essentially matching the trivial upper bound of $n. We show that for every $delta>0$, these models require $Omega(n1-delta)$, $Omega(n/4kk2)1-delta$, and $Omega(n/4kk2)1-delta$, respectively.
arXiv Detail & Related papers (2022-09-04T10:01:39Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations. The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
Pseudo Polynomial-Time Top-k Algorithms for d-DNNF Circuits [0.0]
We present an algorithm that computes $k$ models of $C$ among those having the largest values w.r. $$. We also present a pseudo-time algorithm that transforms $C$ into a d-DNNF circuit $C'$ satisfied exactly by the models of $C$ having a value among the top-$k$ ones.
arXiv Detail & Related papers (2022-02-11T23:53:43Z)
The Fine-Grained Hardness of Sparse Linear Regression [12.83354999540079]
We show that there are no better-than-brute-force algorithms for the problem. We also show the impossibility of better-than-brute-force algorithms when the prediction error is measured.
arXiv Detail & Related papers (2021-06-06T14:19:43Z)
Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models [56.98280399449707]
We show that there exists an $epsilon$-cover for $S$ of cardinality $M = (k/epsilon)O_d(k1/d)$. Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z)
Learning Polynomials of Few Relevant Dimensions [12.122488725065388]
Polynomial regression is a basic primitive in learning and statistics. We give an algorithm that learns the runtime within accuracy $epsilon$ with sample complexity that is roughly $N = O_r,d(n log2(1/epsilon) (log n)d)$ and $O_r,d(N n2)$.
arXiv Detail & Related papers (2020-04-28T18:00:18Z)
Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity [94.37110094442136]
We study the problem of agnostic $Q$-learning with function approximation in deterministic systems. We show that if $delta = Oleft(rho/sqrtdim_Eright)$, then one can find the optimal policy using $Oleft(dim_Eright)$.
arXiv Detail & Related papers (2020-02-17T18:41:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.