Non-Gaussian Component Analysis via Lattice Basis Reduction
- URL: http://arxiv.org/abs/2112.09104v1
- Date: Thu, 16 Dec 2021 18:38:02 GMT
- Title: Non-Gaussian Component Analysis via Lattice Basis Reduction
- Authors: Ilias Diakonikolas and Daniel M. Kane
- Abstract summary: Non-Gaussian Component Analysis (NGCA) is a distribution learning problem.
We provide an efficient algorithm for NGCA in the regime that $A$ is discrete or nearly discrete.
- Score: 56.98280399449707
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-Gaussian Component Analysis (NGCA) is the following distribution learning
problem: Given i.i.d. samples from a distribution on $\mathbb{R}^d$ that is
non-Gaussian in a hidden direction $v$ and an independent standard Gaussian in
the orthogonal directions, the goal is to approximate the hidden direction $v$.
Prior work \cite{DKS17-sq} provided formal evidence for the existence of an
information-computation tradeoff for NGCA under appropriate moment-matching
conditions on the univariate non-Gaussian distribution $A$. The latter result
does not apply when the distribution $A$ is discrete. A natural question is
whether information-computation tradeoffs persist in this setting. In this
paper, we answer this question in the negative by obtaining a sample and
computationally efficient algorithm for NGCA in the regime that $A$ is discrete
or nearly discrete, in a well-defined technical sense. The key tool leveraged
in our algorithm is the LLL method \cite{LLL82} for lattice basis reduction.
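A minimal sketch may help fix ideas. The snippet below (illustrative only, not the paper's algorithm) samples from an NGCA instance whose hidden-direction marginal $A$ is a discrete three-point distribution, and implements a textbook LLL basis reduction with the standard parameter $\delta = 0.75$, the generic subroutine the paper leverages. The function names, the choice of $A$'s support and probabilities, and the example lattice are assumptions made for illustration; the paper's actual lattice construction from the samples is not reproduced here.

```python
# Minimal illustrative sketch, NOT the paper's algorithm. It shows (1) the NGCA
# generative model with a discrete non-Gaussian marginal A along a hidden unit
# direction v, and (2) a textbook LLL basis reduction (delta = 0.75), the tool
# the paper leverages. A's support/probabilities and all function names are
# assumptions for illustration; the paper's lattice construction from the
# samples is not reproduced here.
import numpy as np


def sample_ngca(n, v, support=(-2.0, 0.0, 2.0), probs=(0.25, 0.5, 0.25), seed=0):
    """n samples from R^d: discrete A along unit vector v, standard Gaussian orthogonally."""
    rng = np.random.default_rng(seed)
    v = np.asarray(v, dtype=float)
    v /= np.linalg.norm(v)
    a = rng.choice(support, size=n, p=probs)      # hidden non-Gaussian coordinate
    g = rng.standard_normal((n, v.size))          # ambient Gaussian noise
    g -= np.outer(g @ v, v)                       # keep only the part orthogonal to v
    return np.outer(a, v) + g


def lll_reduce(basis, delta=0.75):
    """Textbook (floating-point) LLL reduction of the rows of `basis`."""
    b = np.array(basis, dtype=float)
    n = len(b)

    def gram_schmidt():
        bstar = np.zeros_like(b)
        mu = np.zeros((n, n))
        for i in range(n):
            bstar[i] = b[i]
            for j in range(i):
                mu[i, j] = (b[i] @ bstar[j]) / (bstar[j] @ bstar[j])
                bstar[i] -= mu[i, j] * bstar[j]
        return bstar, mu

    bstar, mu = gram_schmidt()
    k = 1
    while k < n:
        for j in range(k - 1, -1, -1):            # size-reduce b_k against b_j
            q = round(mu[k, j])
            if q != 0:
                b[k] -= q * b[j]
                bstar, mu = gram_schmidt()
        # Lovasz condition: keep b_k in place or swap it with b_{k-1}
        if bstar[k] @ bstar[k] >= (delta - mu[k, k - 1] ** 2) * (bstar[k - 1] @ bstar[k - 1]):
            k += 1
        else:
            b[[k - 1, k]] = b[[k, k - 1]]
            bstar, mu = gram_schmidt()
            k = max(k - 1, 1)
    return b


if __name__ == "__main__":
    X = sample_ngca(1000, v=[1, 0, 0, 0, 0])
    print(X.shape)                                # (1000, 5)
    print(lll_reduce([[1.0, 1.0, 1.0], [-1.0, 0.0, 2.0], [3.0, 5.0, 6.0]]))
```

The point of the discrete (or nearly discrete) regime is that the hidden coordinate $\langle v, x \rangle$ is distributed exactly as $A$, i.e. it lives on a structured set, which is what makes a lattice-reduction subroutine such as the one above a natural tool; how the samples are turned into a lattice basis is the paper's contribution and is omitted from the sketch.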
Related papers
- Variable Selection in Convex Piecewise Linear Regression [5.366354612549172]
This paper presents Sparse Gradient Descent (SpGD) as a solution for variable selection in convex piecewise linear regression.
A non-asymptotic local convergence analysis is provided for SpGD under sub-Gaussian noise.
arXiv Detail & Related papers (2024-11-04T16:19:09Z) - Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - Sum-of-squares lower bounds for Non-Gaussian Component Analysis [33.80749804695003]
Non-Gaussian Component Analysis (NGCA) is the statistical task of finding a non-Gaussian direction in a high-dimensional dataset.
Here we study the complexity of NGCA in the Sum-of-Squares framework.
arXiv Detail & Related papers (2024-10-28T18:19:13Z) - SoS Certifiability of Subgaussian Distributions and its Algorithmic Applications [37.208622097149714]
We prove that there is a universal constant $C>0$ so that for every $d \in \mathbb{N}$, every centered subgaussian distribution $\mathcal{D}$ on $\mathbb{R}^d$, and every even $p \in \mathbb{N}$, the degree-$p$ sum-of-squares algorithm certifies the subgaussian moment bound for $\mathcal{D}$ in time $d^{O(p)}$.
This establishes that every subgaussian distribution is \emph{SoS-certifiably subgaussian} -- a condition that yields efficient learning algorithms for a wide variety of
arXiv Detail & Related papers (2024-10-28T16:36:58Z) - Which exceptional low-dimensional projections of a Gaussian point cloud can be found in polynomial time? [8.74634652691576]
We study the subset $\mathscr{F}_{m,\alpha}$ of distributions that can be realized by a class of iterative algorithms.
Non-rigorous methods from statistical physics yield an indirect characterization of $\mathscr{F}_{m,\alpha}$ in terms of a generalized Parisi formula.
arXiv Detail & Related papers (2024-06-05T05:54:56Z) - Statistical Learning under Heterogeneous Distribution Shift [71.8393170225794]
The ground-truth predictor is assumed to be additive: $\mathbb{E}[\mathbf{z} \mid \mathbf{x}, \mathbf{y}] = f_\star(\mathbf{x}) + g_\star(\mathbf{y})$.
arXiv Detail & Related papers (2023-02-27T16:34:21Z) - Robust Mean Estimation Without Moments for Symmetric Distributions [7.105512316884493]
We show that for a large class of symmetric distributions, the same error as in the Gaussian setting can be achieved efficiently.
We propose a sequence of efficient algorithms that approaches this optimal error.
Our algorithms are based on a generalization of the well-known filtering technique.
arXiv Detail & Related papers (2023-02-21T17:52:23Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance $\Sigma$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number, and $\epsilon$ has bounded variance, and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)