Learning Polynomial Transformations
- URL: http://arxiv.org/abs/2204.04209v1
- Date: Fri, 8 Apr 2022 17:59:31 GMT
- Title: Learning Polynomial Transformations
- Authors: Sitan Chen, Jerry Li, Yuanzhi Li, Anru R. Zhang
- Abstract summary: We consider the problem of learning high dimensional quadratic transformations of Gaussians.
Our results extend to any rotation-invariant input distribution, not just Gaussian.
We also give the first polynomial-time algorithms with provable guarantees for tensor ring decomposition.
- Score: 41.30404402331856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of learning high dimensional polynomial
transformations of Gaussians. Given samples of the form $p(x)$, where $x\sim
N(0, \mathrm{Id}_r)$ is hidden and $p: \mathbb{R}^r \to \mathbb{R}^d$ is a
function where every output coordinate is a low-degree polynomial, the goal is
to learn the distribution over $p(x)$. This problem is natural in its own
right, but is also an important special case of learning deep generative
models, namely pushforwards of Gaussians under two-layer neural networks with
polynomial activations. Understanding the learnability of such generative
models is crucial to understanding why they perform so well in practice.
Our first main result is a polynomial-time algorithm for learning quadratic
transformations of Gaussians in a smoothed setting. Our second main result is a
polynomial-time algorithm for learning constant-degree polynomial
transformations of Gaussians in a smoothed setting, when the rank of the
associated tensors is small. In fact our results extend to any
rotation-invariant input distribution, not just Gaussian. These are the first
end-to-end guarantees for learning a pushforward under a neural network with
more than one layer.
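To make the sampling model concrete, here is a minimal sketch (Python/NumPy, with illustrative dimensions and randomly drawn coefficients, not the paper's algorithm) of drawing samples from a quadratic transformation of a Gaussian; the learner observes only the outputs $y = p(x)$, never the latent $x$.

```python
import numpy as np

rng = np.random.default_rng(0)
r, d = 5, 20          # latent and observed dimensions (illustrative values)

# A quadratic transformation p: R^r -> R^d. Each output coordinate is
# x^T A_i x + b_i^T x + c_i, i.e. a degree-2 polynomial of the latent x.
A = rng.normal(size=(d, r, r))          # quadratic coefficients
A = (A + A.transpose(0, 2, 1)) / 2      # symmetrize each A_i
b = rng.normal(size=(d, r))             # linear coefficients
c = rng.normal(size=d)                  # constant terms

def sample_pushforward(n):
    """Draw n samples of p(x) with hidden x ~ N(0, Id_r)."""
    x = rng.normal(size=(n, r))                   # latent Gaussians (never observed)
    quad = np.einsum('ni,dij,nj->nd', x, A, x)    # x^T A_i x for each output coordinate i
    lin = x @ b.T                                 # b_i^T x
    return quad + lin + c                         # observed samples y = p(x)

y = sample_pushforward(1000)   # the learner's input: samples of p(x) only
```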
Along the way, we also give the first polynomial-time algorithms with
provable guarantees for tensor ring decomposition, a popular generalization of
tensor decomposition that is used in practice to implicitly store large
tensors.
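For context, a tensor ring decomposition stores an order-$N$ tensor as a cyclic chain of third-order cores, with each entry given by the trace of a product of core slices: $T[i_1,\dots,i_N] = \mathrm{Tr}(G_1[:,i_1,:]\,G_2[:,i_2,:]\cdots G_N[:,i_N,:])$. The sketch below (illustrative shapes and random cores, not the paper's decomposition algorithm) reconstructs a small tensor from ring cores to show how the format implicitly stores a large tensor with few parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
dims = (4, 5, 6)        # mode sizes of an order-3 tensor (illustrative)
ranks = (3, 3, 3)       # TR-ranks r_1, ..., r_N (cyclic: core k is r_k x n_k x r_{k+1})

# Random TR cores G_k of shape (r_k, n_k, r_{k+1 mod N}).
cores = [rng.normal(size=(ranks[k], dims[k], ranks[(k + 1) % len(dims)]))
         for k in range(len(dims))]

def tr_reconstruct(cores):
    """Rebuild the full tensor: T[i1,...,iN] = Tr(G_1[:,i1,:] @ ... @ G_N[:,iN,:])."""
    shape = tuple(g.shape[1] for g in cores)
    T = np.empty(shape)
    for idx in np.ndindex(*shape):
        M = np.eye(cores[0].shape[0])
        for k, i in enumerate(idx):
            M = M @ cores[k][:, i, :]
        T[idx] = np.trace(M)
    return T

T = tr_reconstruct(cores)   # a (4, 5, 6) tensor stored implicitly by small cores
```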
Related papers
- Learning Mixtures of Gaussians Using Diffusion Models [9.118706387430883]
We give a new algorithm for learning mixtures of $k$ Gaussians to TV error.
Our approach is analytic and relies on the framework of diffusion models.
arXiv Detail & Related papers (2024-04-29T17:00:20Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Learning Narrow One-Hidden-Layer ReLU Networks [30.63086568341548]
We give the first polynomial-time algorithm for this problem, which succeeds whenever $k$ is a constant.
We use a multi-scale analysis to argue that sufficiently close neurons can be collapsed together.
arXiv Detail & Related papers (2023-04-20T17:53:09Z) - Average-Case Complexity of Tensor Decomposition for Low-Degree
Polynomials [93.59919600451487]
"Statistical-computational gaps" occur in many statistical inference tasks.
We consider a model for random order-3 tensor decomposition where one component is slightly larger in norm than the rest.
We show that low-degree polynomials of the tensor entries can accurately estimate the largest component when $r \ll n^{3/2}$, but fail to do so when $r \gg n^{3/2}$.
arXiv Detail & Related papers (2022-11-10T00:40:37Z) - Efficient Algorithms for Learning Depth-2 Neural Networks with General
ReLU Activations [27.244958998196623]
We present time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations.
In particular, we consider learning an unknown network of the form $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$, where $x$ is drawn from the Gaussian distribution and $\sigma(t) := \max(t,0)$ is the ReLU activation (see the sketch after this list).
arXiv Detail & Related papers (2021-07-21T17:06:03Z) - Reconstruction Algorithms for Low-Rank Tensors and Depth-3 Multilinear
Circuits [4.129484350382891]
We give new and efficient black-box reconstruction algorithms for some classes of depth-$3$ arithmetic circuits.
Our algorithm works over all fields of characteristic $0$ or large enough characteristic.
arXiv Detail & Related papers (2021-05-04T20:45:07Z) - Small Covers for Near-Zero Sets of Polynomials and Learning Latent
Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z) - Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network.
We show that an over-parametrized two-layer neural network trained with gradient descent can provably achieve loss close to that of the ground truth, going beyond the Neural Tangent Kernel regime.
arXiv Detail & Related papers (2020-07-09T07:09:28Z) - Deep Polynomial Neural Networks [77.70761658507507]
$\Pi$-Nets are a new class of function approximators based on polynomial expansions.
$\Pi$-Nets produce state-of-the-art results in three challenging tasks, i.e. image generation, face verification and 3D mesh representation learning.
arXiv Detail & Related papers (2020-06-20T16:23:32Z)
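As referenced in the depth-2 ReLU entry above, here is a minimal sketch of the network model being learned there, $f(x) = a^{\mathsf{T}}\sigma(W^{\mathsf{T}}x+b)$ with Gaussian inputs. The dimensions and parameter values below are illustrative assumptions, and this is only the forward model, not a learning algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
d, k = 10, 4                      # input dimension and number of hidden units (illustrative)

W = rng.normal(size=(d, k))       # hidden-layer weights
b = rng.normal(size=k)            # hidden-layer biases
a = rng.normal(size=k)            # output-layer weights

def f(x):
    """Depth-2 ReLU network f(x) = a^T sigma(W^T x + b), with sigma(t) = max(t, 0)."""
    return a @ np.maximum(W.T @ x + b, 0.0)

x = rng.normal(size=d)            # inputs are drawn from the standard Gaussian
y = f(x)
```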