On the Role of Channel Capacity in Learning Gaussian Mixture Models
- URL: http://arxiv.org/abs/2202.07707v1
- Date: Tue, 15 Feb 2022 20:15:00 GMT
- Title: On the Role of Channel Capacity in Learning Gaussian Mixture Models
- Authors: Elad Romanov, Tamir Bendory, Or Ordentlich
- Abstract summary: We study the sample complexity of learning the $k$ unknown centers of a balanced Gaussian mixture model (GMM).
Our main results characterize the exact noise threshold $\sigma^2$ below which the GMM learning problem, in the large system limit $d,k\to\infty$, is as easy as learning from labeled observations.
While our results are only proved for GMMs whose centers are uniformly distributed over the sphere, they hint that perhaps it is the decoding error probability associated with the center constellation as a channel code that determines the statistical difficulty of learning the corresponding GMM.
- Score: 31.841289319809814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the sample complexity of learning the $k$ unknown centers
of a balanced Gaussian mixture model (GMM) in $\mathbb{R}^d$ with spherical
covariance matrix $\sigma^2\mathbf{I}$. In particular, we are interested in the
following question: what is the maximal noise level $\sigma^2$, for which the
sample complexity is essentially the same as when estimating the centers from
labeled measurements? To that end, we restrict attention to a Bayesian
formulation of the problem, where the centers are uniformly distributed on the
sphere $\sqrt{d}\mathcal{S}^{d-1}$. Our main results characterize the exact
noise threshold $\sigma^2$ below which the GMM learning problem, in the large
system limit $d,k\to\infty$, is as easy as learning from labeled observations,
and above which it is substantially harder. The threshold occurs at $\frac{\log
k}{d} = \frac12\log\left( 1+\frac{1}{\sigma^2} \right)$, which is the capacity
of the additive white Gaussian noise (AWGN) channel. Thinking of the set of $k$
centers as a code, this noise threshold can be interpreted as the largest noise
level for which the error probability of the code over the AWGN channel is
small. Previous works on the GMM learning problem have identified the minimum
distance between the centers as a key parameter in determining the statistical
difficulty of learning the corresponding GMM. While our results are only proved
for GMMs whose centers are uniformly distributed over the sphere, they hint
that perhaps it is the decoding error probability associated with the center
constellation as a channel code that determines the statistical difficulty of
learning the corresponding GMM, rather than just the minimum distance.
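To make the threshold concrete, here is a minimal Python sketch (not from the paper; the function names and the example values of $d$ and $k$ are illustrative) that inverts the capacity equation $\frac{\log k}{d} = \frac12\log\left(1+\frac{1}{\sigma^2}\right)$ to obtain the critical noise level for a given dimension $d$ and number of centers $k$:

```python
import math

def awgn_capacity(sigma2: float) -> float:
    """Capacity of the AWGN channel (in nats per dimension) at noise level sigma^2."""
    return 0.5 * math.log(1.0 + 1.0 / sigma2)

def critical_noise_level(d: int, k: int) -> float:
    """Solve (log k)/d = (1/2) * log(1 + 1/sigma^2) for sigma^2.

    Below this noise level (in the large d, k limit), the paper's result says
    learning the GMM centers is as easy as learning from labeled observations.
    """
    rate = math.log(k) / d  # rate of the center "constellation", log(k)/d in nats
    return 1.0 / (math.exp(2.0 * rate) - 1.0)

if __name__ == "__main__":
    d, k = 100, 1000  # illustrative values, not taken from the paper
    s2 = critical_noise_level(d, k)
    print(f"critical sigma^2 for d={d}, k={k}: {s2:.4f}")
    # Sanity check: at the critical noise level, channel capacity equals log(k)/d.
    print(f"capacity at threshold: {awgn_capacity(s2):.4f}  vs  log(k)/d: {math.log(k) / d:.4f}")
```

In the paper's convention the centers lie on $\sqrt{d}\mathcal{S}^{d-1}$, so the signal power per coordinate is $1$ and $1/\sigma^2$ plays the role of the SNR in the capacity formula above.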
Related papers
- Sum-of-squares lower bounds for Non-Gaussian Component Analysis [33.80749804695003]
Non-Gaussian Component Analysis (NGCA) is the statistical task of finding a non-Gaussian direction in a high-dimensional dataset.
Here we study the complexity of NGCA in the Sum-of-Squares framework.
arXiv Detail & Related papers (2024-10-28T18:19:13Z) - Better Gaussian Mechanism using Correlated Noise [1.450405446885067]
We show that adding a random variable distributed as a Gaussian with variance scaled by $(\sqrt{d} + 1)/4$ to all counts allows us to reduce the variance of the independent noise samples to scale only with $(d + \sqrt{d})/4$.
The central idea of our mechanism is simple and the technique is flexible.
arXiv Detail & Related papers (2024-08-13T12:31:03Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^2)$.
Our lower bound suggests that this quadratic dependence on $1/epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Order-Optimal Error Bounds for Noisy Kernel-Based Bayesian Quadrature [42.129843613950285]
We consider functions in a Reproducing Kernel Hilbert Space (RKHS) with the Matérn-$\nu$ kernel.
We find that when the black-box queries are subject to Gaussian noise having variance $\sigma^2$, any algorithm making at most $T$ queries must incur a mean absolute error of $\Omega(T^{-\frac{\nu}{d}-1} + \sigma T^{-\frac{1}{2}})$.
arXiv Detail & Related papers (2022-02-22T01:49:41Z) - Multimeasurement Generative Models [7.502947376736449]
We map the problem of sampling from an unknown distribution with density $p_X$ in $\mathbb{R}^d$ to the problem of learning and sampling $p_{\mathbf{Y}}$ in $\mathbb{R}^{Md}$ obtained by convolving $p_X$ with a fixed factorial kernel.
arXiv Detail & Related papers (2021-12-18T02:11:36Z) - Robust Model Selection and Nearly-Proper Learning for GMMs [26.388358539260473]
In learning theory, a standard assumption is that the data is generated from a finite mixture model. But what happens when the number of components is not known in advance?
We are able to approximately determine the minimum number of components needed to fit the distribution within a logarithmic factor.
arXiv Detail & Related papers (2021-06-05T01:58:40Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean Gaussian $\mathcal{N}(0, \Sigma)$ with unknown covariance.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$...
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Hardness of Learning Halfspaces with Massart Noise [56.98280399449707]
We study the complexity of PAC learning halfspaces in the presence of Massart (bounded) noise.
We show that there is an exponential gap between the information-theoretically optimal error and the best error that can be achieved by an SQ algorithm.
arXiv Detail & Related papers (2020-12-17T16:43:11Z) - Learning Halfspaces with Tsybakov Noise [50.659479930171585]
We study the learnability of halfspaces in the presence of Tsybakov noise.
We give an algorithm that achieves misclassification error $\epsilon$ with respect to the true halfspace.
arXiv Detail & Related papers (2020-06-11T14:25:02Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z)