Tensor Moments of Gaussian Mixture Models: Theory and Applications
- URL: http://arxiv.org/abs/2202.06930v1
- Date: Mon, 14 Feb 2022 18:42:11 GMT
- Title: Tensor Moments of Gaussian Mixture Models: Theory and Applications
- Authors: João M. Pereira and Joe Kileel and Tamara G. Kolda
- Abstract summary: We develop theory and numerical methods for implicit computations with moment tensors of GMMs.
We show it is possible to debias the data observations, in which case the problem of estimating the unknown means reduces to symmetric tensor decomposition.
- Score: 1.6114012813668934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaussian mixture models (GMM) are fundamental tools in statistical and data
sciences. We study the moments of multivariate Gaussians and GMMs. The $d$-th
moment of an $n$-dimensional random variable is a symmetric $d$-way tensor of
size $n^d$, so working with moments naively is assumed to be prohibitively
expensive for $d>2$ and larger values of $n$. In this work, we develop theory
and numerical methods for implicit computations with moment tensors of GMMs,
reducing the computational and storage costs to $\mathcal{O}(n^2)$ and
$\mathcal{O}(n^3)$, respectively, for general covariance matrices, and to
$\mathcal{O}(n)$ and $\mathcal{O}(n)$, respectively, for diagonal ones. We
derive concise analytic expressions for the moments in terms of symmetrized
tensor products, relying on the correspondence between symmetric tensors and
homogeneous polynomials, and combinatorial identities involving Bell
polynomials. The primary application of this theory is to estimating GMM
parameters from a set of observations, when formulated as a moment-matching
optimization problem. If there is a known and common covariance matrix, we also
show it is possible to debias the data observations, in which case the problem
of estimating the unknown means reduces to symmetric CP tensor decomposition.
Numerical results validate and illustrate the efficiency of our approaches. This
work potentially opens the door to making the method of moments competitive with
expectation-maximization methods for parameter estimation of GMMs.
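As an illustrative aside (not the authors' code), the core trick behind implicit moment computations can be sketched in a few lines: the $d$-th empirical moment tensor $M_d = \frac{1}{N}\sum_i x_i^{\otimes d}$ never needs to be materialized, because its action on a direction $v$, namely $\langle M_d, v^{\otimes d}\rangle = \frac{1}{N}\sum_i \langle x_i, v\rangle^d$, costs $\mathcal{O}(Nn)$ rather than $\mathcal{O}(n^d)$; moment-matching objectives can be assembled from such contractions.

```python
import numpy as np

def implicit_moment_form(X, v, d):
    """Evaluate <M_d, v^(x d)> for the d-th empirical moment tensor
    M_d = (1/N) * sum_i x_i^(x d), without materializing the n^d tensor.
    Cost is O(N*n) per evaluation instead of O(n^d) storage/compute."""
    return np.mean((X @ v) ** d)

# Sanity check against the explicitly formed order-3 moment tensor.
rng = np.random.default_rng(0)
N, n = 500, 4
X = rng.normal(size=(N, n))
v = rng.normal(size=n)

M3 = np.einsum('ia,ib,ic->abc', X, X, X) / N        # explicit n^3 tensor
explicit = np.einsum('abc,a,b,c->', M3, v, v, v)    # <M_3, v (x) v (x) v>
assert np.isclose(explicit, implicit_moment_form(X, v, 3))
```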
Related papers
- Sublinear Variational Optimization of Gaussian Mixture Models with Millions to Billions of Parameters [5.429282997550318]
Our proposed algorithm significantly reduces runtime complexity per iteration from $\mathcal{O}(NCD^2)$ to a complexity scaling linearly with $D$ and remaining constant w.r.t.
As a proof of concept, we train GMMs with over 10 billion parameters on about 100 million images, and observe training times of approximately nine hours on a single state-of-the-art CPU.
arXiv Detail & Related papers (2025-01-21T17:11:25Z)
- Sparse Gaussian Graphical Models with Discrete Optimization: Computational and Statistical Perspectives [8.403841349300103]
We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model.
We propose GraphL0BnB, a new estimator based on an $\ell_0$-penalized version of the pseudolikelihood function.
Our numerical experiments on real/synthetic datasets suggest that our method can solve, to near-optimality, problem instances with $p = 10^4$.
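As a toy sketch of the $\ell_0$ flavor only (GraphL0BnB itself optimizes an $\ell_0$-penalized pseudolikelihood with a branch-and-bound solver; the brute-force nodewise best-subset regression below is not that algorithm and is feasible only for tiny $p$):

```python
import numpy as np
from itertools import combinations

def best_subset_neighbors(X, j, lam):
    """l0-penalized nodewise regression: regress column j on subsets of the
    other columns, scoring RSS + lam * |subset|. Brute force, tiny p only."""
    n, p = X.shape
    others = [k for k in range(p) if k != j]
    best_score, best_set = np.sum(X[:, j] ** 2), ()   # empty-set baseline
    for size in range(1, len(others) + 1):
        for S in combinations(others, size):
            A = X[:, S]
            beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
            score = np.sum((X[:, j] - A @ beta) ** 2) + lam * size
            if score < best_score:
                best_score, best_set = score, S
    return best_set

rng = np.random.default_rng(1)
Theta = np.array([[2., .8, 0.], [.8, 2., .8], [0., .8, 2.]])   # chain graph 0-1-2
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(Theta), size=2000)
print([best_subset_neighbors(X, j, lam=20.0) for j in range(3)])
# expected neighborhoods for the chain: (1,), (0, 2), (1,)
```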
arXiv Detail & Related papers (2023-07-18T15:49:02Z)
- Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion -- the effective Minkowski dimension.
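For intuition (a numerical aside, not from the paper, which is theoretical), the Minkowski dimension admits the classical box-counting estimate: count the grid cells of side $\epsilon$ occupied by the data and fit the slope of $\log N(\epsilon)$ against $\log(1/\epsilon)$.

```python
import numpy as np

def box_counting_dimension(points, scales):
    """Estimate the Minkowski (box-counting) dimension of a point cloud:
    the slope of log N(eps) vs. log(1/eps), where N(eps) is the number of
    grid cells of side eps containing at least one point."""
    counts = [len(np.unique(np.floor(points / eps), axis=0)) for eps in scales]
    slope, _ = np.polyfit(np.log(1.0 / scales), np.log(counts), 1)
    return slope

# A circle embedded in R^3: intrinsic dimension 1 despite ambient dimension 3.
rng = np.random.default_rng(2)
t = rng.uniform(0, 2 * np.pi, 20000)
pts = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
print(box_counting_dimension(pts, np.array([0.2, 0.1, 0.05, 0.025])))  # ~ 1
```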
arXiv Detail & Related papers (2023-06-26T17:13:31Z)
- Deep Learning Symmetries and Their Lie Groups, Algebras, and Subalgebras from First Principles [55.41644538483948]
We design a deep-learning algorithm for the discovery and identification of the continuous group of symmetries present in a labeled dataset.
We use fully connected neural networks to model the symmetry transformations and the corresponding generators.
Our study also opens the door for using a machine learning approach in the mathematical study of Lie groups and their properties.
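A minimal sketch of the underlying idea (assuming a toy quadratic invariant; the paper's actual architecture and losses differ): look for a generator $G$ whose flow $x \mapsto (I + \epsilon G)x$ preserves $f(x) = \|x\|^2$ to first order. Gradient descent with a norm constraint drives $G$ to an antisymmetric matrix, i.e., a rotation generator.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N, lr = 2, 1000, 0.02
X = rng.normal(size=(N, n))
G = rng.normal(size=(n, n))
G /= np.linalg.norm(G)

# First-order change of f(x) = ||x||^2 under x -> (I + eps*G)x is
# eps * x^T (G + G^T) x, so minimize the loss mean((x^T S x)^2), S = G + G^T.
for _ in range(500):
    S = G + G.T
    q = np.einsum('ij,jk,ik->i', X, S, X)              # x^T S x per sample
    grad = 4 * np.einsum('i,ij,ik->jk', q, X, X) / N   # gradient of the loss in G
    G -= lr * grad
    G /= np.linalg.norm(G)                             # keep G away from 0

print(np.round(G + G.T, 3))  # ~ 0: G is antisymmetric, i.e. a rotation generator
```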
arXiv Detail & Related papers (2023-01-13T16:25:25Z)
- Average-Case Complexity of Tensor Decomposition for Low-Degree Polynomials [93.59919600451487]
"Statistical-computational gaps" occur in many statistical inference tasks.
We consider a model for random order-3 tensor decomposition where one component is slightly larger in norm than the rest.
We show that low-degree polynomials of the tensor entries can accurately estimate the largest component when $r \ll n^{3/2}$ but fail to do so when $r \gg n^{3/2}$.
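To make the model concrete (a hedged sketch: the paper analyzes low-degree polynomial algorithms, not the power iteration used here, and the noise term is omitted for simplicity), below is a rank-$r$ order-3 tensor with one slightly heavier planted component, recovered by tensor power iteration with random restarts.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 30, 5
lams = np.array([1.5] + [1.0] * (r - 1))          # one slightly heavier component
A = rng.normal(size=(r, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)     # unit-norm components a_i
T = np.einsum('i,ia,ib,ic->abc', lams, A, A, A)   # T = sum_i lam_i a_i (x)3

best_v, best_val = None, -np.inf
for _ in range(20):                               # random restarts, keep best score
    v = rng.normal(size=n)
    v /= np.linalg.norm(v)
    for _ in range(100):                          # power step: v <- T(I, v, v)
        v = np.einsum('abc,b,c->a', T, v, v)
        v /= np.linalg.norm(v)
    val = np.einsum('abc,a,b,c->', T, v, v, v)    # T(v, v, v) is ~ lam_i near a_i
    if val > best_val:
        best_val, best_v = val, v
print(abs(A[0] @ best_v))                         # ~ 1: planted component found
```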
arXiv Detail & Related papers (2022-11-10T00:40:37Z)
- When Random Tensors meet Random Matrices [50.568841545067144]
This paper studies asymmetric order-$d$ spiked tensor models with Gaussian noise.
We show that the analysis of the considered model boils down to the analysis of an equivalent spiked symmetric block-wise random matrix.
arXiv Detail & Related papers (2021-12-23T04:05:01Z)
- Multimeasurement Generative Models [7.502947376736449]
We map the problem of sampling from an unknown distribution with density $p_X$ in $\mathbb{R}^d$ to the problem of learning and sampling $p_{\mathbf{Y}}$ in $\mathbb{R}^{Md}$ obtained by convolving $p_X$ with a fixed factorial kernel.
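In code, the forward map is simple (a sketch assuming a Gaussian factorial kernel; the names `multimeasure`, `M`, and `sigma` are illustrative): each sample $x \in \mathbb{R}^d$ becomes $M$ independent noisy measurements stacked into $\mathbb{R}^{Md}$. Sampling from $p_X$ then reduces to learning $p_{\mathbf{Y}}$ and denoising, which this snippet does not attempt.

```python
import numpy as np

def multimeasure(X, M, sigma, rng):
    """Map samples of p_X in R^d to samples of p_Y in R^(M*d) by convolving
    p_X with a factorial Gaussian kernel: y = (x + e_1, ..., x + e_M) with
    independent e_m ~ N(0, sigma^2 I)."""
    N, d = X.shape
    Y = X[:, None, :] + sigma * rng.normal(size=(N, M, d))
    return Y.reshape(N, M * d)

rng = np.random.default_rng(5)
X = rng.uniform(size=(1000, 2))        # stand-in for samples of an unknown p_X
Y = multimeasure(X, M=4, sigma=0.5, rng=rng)
print(Y.shape)                         # (1000, 8): samples of p_Y in R^(M*d)
```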
arXiv Detail & Related papers (2021-12-18T02:11:36Z)
- Convergence of Sparse Variational Inference in Gaussian Processes Regression [29.636483122130027]
We show that a method with an overall computational cost of $\mathcal{O}(N(\log N)^{2D}(\log\log N)^2)$ can be used to perform inference.
arXiv Detail & Related papers (2020-08-01T19:23:34Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
- Variational Orthogonal Features [29.636483122130027]
We show that for certain priors, features can be defined that remove the $\mathcal{O}(M^3)$ cost of computing a minibatch estimate of an evidence lower bound (ELBO).
We present a construction of features for any stationary prior kernel that allows for computation of an unbiased estimator of the ELBO using $T$ Monte Carlo samples in $\mathcal{O}(\tilde{N}T+M^2T)$ and in $\mathcal{O}(\tilde{N}T+MT)$ with an additional approximation.
arXiv Detail & Related papers (2020-06-23T17:18:07Z)
- Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture to information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
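The input model, as a hedged sketch (the parameters below are illustrative, not from the paper): draw samples from a well-separated ("clusterable") $k$-mixture of identity-covariance Gaussians, then let an adversary replace an $\epsilon$-fraction with arbitrary outliers; the robust learner sees only the corrupted sample.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, k, eps = 5000, 10, 3, 0.05
means = 10.0 * rng.normal(size=(k, d))           # well-separated component means
labels = rng.integers(k, size=n)
X = means[labels] + rng.normal(size=(n, d))      # inliers: identity-covariance mixture

# An adversary replaces an eps-fraction of the sample with arbitrary outliers;
# a robust learner must recover the k means to error ~O(eps) from X alone.
bad = rng.choice(n, size=int(eps * n), replace=False)
X[bad] = 100.0 * rng.normal(size=(len(bad), d))
```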
arXiv Detail & Related papers (2020-05-13T16:44:12Z)