Tensor Moments of Gaussian Mixture Models: Theory and Applications
- URL: http://arxiv.org/abs/2202.06930v1
- Date: Mon, 14 Feb 2022 18:42:11 GMT
- Title: Tensor Moments of Gaussian Mixture Models: Theory and Applications
- Authors: João M. Pereira and Joe Kileel and Tamara G. Kolda
- Abstract summary: We develop theory and numerical methods for implicit computations with moment tensors of GMMs.
We show it is possible to debias the data observations, in which case the problem of estimating the unknown means reduces to symmetric tensor decomposition.
- Score: 1.6114012813668934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaussian mixture models (GMM) are fundamental tools in statistical and data
sciences. We study the moments of multivariate Gaussians and GMMs. The $d$-th
moment of an $n$-dimensional random variable is a symmetric $d$-way tensor of
size $n^d$, so working with moments naively is assumed to be prohibitively
expensive for $d>2$ and larger values of $n$. In this work, we develop theory
and numerical methods for implicit computations with moment tensors of GMMs,
reducing the computational and storage costs to $\mathcal{O}(n^2)$ and
$\mathcal{O}(n^3)$, respectively, for general covariance matrices, and to
$\mathcal{O}(n)$ and $\mathcal{O}(n)$, respectively, for diagonal ones. We
derive concise analytic expressions for the moments in terms of symmetrized
tensor products, relying on the correspondence between symmetric tensors and
homogeneous polynomials, and combinatorial identities involving Bell
polynomials. The primary application of this theory is to estimating GMM
parameters from a set of observations, when formulated as a moment-matching
optimization problem. If there is a known and common covariance matrix, we also
show it is possible to debias the data observations, in which case the problem
of estimating the unknown means reduces to symmetric CP tensor decomposition.
Numerical results validate and illustrate the efficiency of our approaches. This
work potentially opens the door to making the method of moments competitive with
expectation-maximization methods for parameter estimation of GMMs.
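As an illustrative aside (not the authors' code), the core trick behind implicit moment computations can be sketched in a few lines: the $d$-th empirical moment tensor $M_d = \frac{1}{N}\sum_i x_i^{\otimes d}$ never needs to be materialized, because its action on a direction $v$, namely $\langle M_d, v^{\otimes d}\rangle = \frac{1}{N}\sum_i \langle x_i, v\rangle^d$, costs $\mathcal{O}(Nn)$ rather than $\mathcal{O}(n^d)$; moment-matching objectives can be assembled from such contractions.

```python
import numpy as np

def implicit_moment_form(X, v, d):
    """Evaluate <M_d, v^(x d)> for the d-th empirical moment tensor
    M_d = (1/N) * sum_i x_i^(x d), without materializing the n^d tensor.
    Cost is O(N*n) per evaluation instead of O(n^d) storage/compute."""
    return np.mean((X @ v) ** d)

# Sanity check against the explicitly formed order-3 moment tensor.
rng = np.random.default_rng(0)
N, n = 500, 4
X = rng.normal(size=(N, n))
v = rng.normal(size=n)

M3 = np.einsum('ia,ib,ic->abc', X, X, X) / N        # explicit n^3 tensor
explicit = np.einsum('abc,a,b,c->', M3, v, v, v)    # <M_3, v (x) v (x) v>
assert np.isclose(explicit, implicit_moment_form(X, v, 3))
```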
Related papers
- Sublinear Variational Optimization of Gaussian Mixture Models with Millions to Billions of Parameters [5.429282997550318]
Our proposed algorithm significantly reduces runtime complexity per iteration from $\mathcal{O}(NCD^2)$ to a complexity scaling linearly with $D$ and remaining constant w.r.t.
As a proof of concept, we train GMMs with over 10 billion parameters on about 100 million images, and observe training times of approximately nine hours on a single state-of-the-art CPU.
arXiv Detail & Related papers (2025-01-21T17:11:25Z)
- Sparse Gaussian Graphical Models with Discrete Optimization: Computational and Statistical Perspectives [8.403841349300103]
We consider the problem of learning a sparse graph underlying an undirected Gaussian graphical model.
We propose GraphL0BnB, a new estimator based on an $\ell_0$-penalized version of the pseudolikelihood function.
Our numerical experiments on real/synthetic datasets suggest that our method can solve, to near-optimality, problem instances with $p = 10^4$.
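As a toy sketch of the $\ell_0$ flavor only (GraphL0BnB itself optimizes an $\ell_0$-penalized pseudolikelihood with a branch-and-bound solver; the brute-force nodewise best-subset regression below is not that algorithm and is feasible only for tiny $p$):

```python
import numpy as np
from itertools import combinations

def best_subset_neighbors(X, j, lam):
    """l0-penalized nodewise regression: regress column j on subsets of the
    other columns, scoring RSS + lam * |subset|. Brute force, tiny p only."""
    n, p = X.shape
    others = [k for k in range(p) if k != j]
    best_score, best_set = np.sum(X[:, j] ** 2), ()   # empty-set baseline
    for size in range(1, len(others) + 1):
        for S in combinations(others, size):
            A = X[:, S]
            beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
            score = np.sum((X[:, j] - A @ beta) ** 2) + lam * size
            if score < best_score:
                best_score, best_set = score, S
    return best_set

rng = np.random.default_rng(1)
Theta = np.array([[2., .8, 0.], [.8, 2., .8], [0., .8, 2.]])   # chain graph 0-1-2
X = rng.multivariate_normal(np.zeros(3), np.linalg.inv(Theta), size=2000)
print([best_subset_neighbors(X, j, lam=20.0) for j in range(3)])
# expected neighborhoods for the chain: (1,), (0, 2), (1,)
```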
arXiv Detail & Related papers (2023-07-18T15:49:02Z)
- Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories [70.90012822736988]
Existing theories on deep nonparametric regression have shown that when the input data lie on a low-dimensional manifold, deep neural networks can adapt to intrinsic data structures.
This paper introduces a relaxed assumption that the input data are concentrated around a subset of $\mathbb{R}^d$ denoted by $\mathcal{S}$, and the intrinsic dimension of $\mathcal{S}$ can be characterized by a new complexity notion -- the effective Minkowski dimension.
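For intuition (a numerical aside, not from the paper, which is theoretical), the Minkowski dimension admits the classical box-counting estimate: count the grid cells of side $\epsilon$ occupied by the data and fit the slope of $\log N(\epsilon)$ against $\log(1/\epsilon)$.

```python
import numpy as np

def box_counting_dimension(points, scales):
    """Estimate the Minkowski (box-counting) dimension of a point cloud:
    the slope of log N(eps) vs. log(1/eps), where N(eps) is the number of
    grid cells of side eps containing at least one point."""
    counts = [len(np.unique(np.floor(points / eps), axis=0)) for eps in scales]
    slope, _ = np.polyfit(np.log(1.0 / scales), np.log(counts), 1)
    return slope

# A circle embedded in R^3: intrinsic dimension 1 despite ambient dimension 3.
rng = np.random.default_rng(2)
t = rng.uniform(0, 2 * np.pi, 20000)
pts = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
print(box_counting_dimension(pts, np.array([0.2, 0.1, 0.05, 0.025])))  # ~ 1
```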
arXiv Detail & Related papers (2023-06-26T17:13:31Z)
- Deep Learning Symmetries and Their Lie Groups, Algebras, and Subalgebras from First Principles [55.41644538483948]
We design a deep-learning algorithm for the discovery and identification of the continuous group of symmetries present in a labeled dataset.
We use fully connected neural networks to model the symmetry transformations and the corresponding generators.
Our study also opens the door for using a machine learning approach in the mathematical study of Lie groups and their properties.
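A minimal sketch of the underlying idea (assuming a toy quadratic invariant; the paper's actual architecture and losses differ): look for a generator $G$ whose flow $x \mapsto (I + \epsilon G)x$ preserves $f(x) = \|x\|^2$ to first order. Gradient descent with a norm constraint drives $G$ to an antisymmetric matrix, i.e., a rotation generator.

```python
import numpy as np

rng = np.random.default_rng(3)
n, N, lr = 2, 1000, 0.02
X = rng.normal(size=(N, n))
G = rng.normal(size=(n, n))
G /= np.linalg.norm(G)

# First-order change of f(x) = ||x||^2 under x -> (I + eps*G)x is
# eps * x^T (G + G^T) x, so minimize the loss mean((x^T S x)^2), S = G + G^T.
for _ in range(500):
    S = G + G.T
    q = np.einsum('ij,jk,ik->i', X, S, X)              # x^T S x per sample
    grad = 4 * np.einsum('i,ij,ik->jk', q, X, X) / N   # gradient of the loss in G
    G -= lr * grad
    G /= np.linalg.norm(G)                             # keep G away from 0

print(np.round(G + G.T, 3))  # ~ 0: G is antisymmetric, i.e. a rotation generator
```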
arXiv Detail & Related papers (2023-01-13T16:25:25Z)
- Average-Case Complexity of Tensor Decomposition for Low-Degree Polynomials [93.59919600451487]
"Statistical-computational gaps" occur in many statistical inference tasks.
We consider a model for random order-3 tensor decomposition where one component is slightly larger in norm than the rest.
We show that low-degree polynomials of the tensor entries can accurately estimate the largest component when $r \ll n^{3/2}$ but fail to do so when $r \gg n^{3/2}$.
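To make the model concrete (a hedged sketch: the paper analyzes low-degree polynomial algorithms, not the power iteration used here, and the noise term is omitted for simplicity), below is a rank-$r$ order-3 tensor with one slightly heavier planted component, recovered by tensor power iteration with random restarts.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r = 30, 5
lams = np.array([1.5] + [1.0] * (r - 1))          # one slightly heavier component
A = rng.normal(size=(r, n))
A /= np.linalg.norm(A, axis=1, keepdims=True)     # unit-norm components a_i
T = np.einsum('i,ia,ib,ic->abc', lams, A, A, A)   # T = sum_i lam_i a_i (x)3

best_v, best_val = None, -np.inf
for _ in range(20):                               # random restarts, keep best score
    v = rng.normal(size=n)
    v /= np.linalg.norm(v)
    for _ in range(100):                          # power step: v <- T(I, v, v)
        v = np.einsum('abc,b,c->a', T, v, v)
        v /= np.linalg.norm(v)
    val = np.einsum('abc,a,b,c->', T, v, v, v)    # T(v, v, v) is ~ lam_i near a_i
    if val > best_val:
        best_val, best_v = val, v
print(abs(A[0] @ best_v))                         # ~ 1: planted component found
```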
arXiv Detail & Related papers (2022-11-10T00:40:37Z)
- When Random Tensors meet Random Matrices [50.568841545067144]
This paper studies asymmetric order-$d$ spiked tensor models with Gaussian noise.
We show that the analysis of the considered model boils down to the analysis of an equivalent spiked symmetric block-wise random matrix.
arXiv Detail & Related papers (2021-12-23T04:05:01Z)
- Multimeasurement Generative Models [7.502947376736449]
We map the problem of sampling from an unknown distribution with density $p_X$ in $\mathbb{R}^d$ to the problem of learning and sampling $p_{\mathbf{Y}}$ in $\mathbb{R}^{Md}$ obtained by convolving $p_X$ with a fixed factorial kernel.
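In code, the forward map is simple (a sketch assuming a Gaussian factorial kernel; the names `multimeasure`, `M`, and `sigma` are illustrative): each sample $x \in \mathbb{R}^d$ becomes $M$ independent noisy measurements stacked into $\mathbb{R}^{Md}$. Sampling from $p_X$ then reduces to learning $p_{\mathbf{Y}}$ and denoising, which this snippet does not attempt.

```python
import numpy as np

def multimeasure(X, M, sigma, rng):
    """Map samples of p_X in R^d to samples of p_Y in R^(M*d) by convolving
    p_X with a factorial Gaussian kernel: y = (x + e_1, ..., x + e_M) with
    independent e_m ~ N(0, sigma^2 I)."""
    N, d = X.shape
    Y = X[:, None, :] + sigma * rng.normal(size=(N, M, d))
    return Y.reshape(N, M * d)

rng = np.random.default_rng(5)
X = rng.uniform(size=(1000, 2))        # stand-in for samples of an unknown p_X
Y = multimeasure(X, M=4, sigma=0.5, rng=rng)
print(Y.shape)                         # (1000, 8): samples of p_Y in R^(M*d)
```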
arXiv Detail & Related papers (2021-12-18T02:11:36Z)
- Convergence of Sparse Variational Inference in Gaussian Processes Regression [29.636483122130027]
We show that a method with an overall computational cost of $\mathcal{O}(N(\log N)^{2D}(\log\log N)^2)$ can be used to perform inference.
arXiv Detail & Related papers (2020-08-01T19:23:34Z)
- Optimal Robust Linear Regression in Nearly Linear Time [97.11565882347772]
We study the problem of high-dimensional robust linear regression where a learner is given access to $n$ samples from the generative model $Y = \langle X, w^* \rangle + \epsilon$.
We propose estimators for this problem under two settings: (i) $X$ is $L_4$-$L_2$ hypercontractive, $\mathbb{E}[XX^\top]$ has bounded condition number and $\epsilon$ has bounded variance and (ii) $X$ is sub-Gaussian with identity second moment and $\epsilon$ is
arXiv Detail & Related papers (2020-07-16T06:44:44Z)
- Variational Orthogonal Features [29.636483122130027]
We show that for certain priors, features can be defined that remove the $\mathcal{O}(M^3)$ cost of computing a minibatch estimate of an evidence lower bound (ELBO).
We present a construction of features for any stationary prior kernel that allows for computation of an unbiased estimator of the ELBO using $T$ Monte Carlo samples in $\mathcal{O}(\tilde{N}T+M^2T)$ and in $\mathcal{O}(\tilde{N}T+MT)$ with an additional approximation.
arXiv Detail & Related papers (2020-06-23T17:18:07Z)
- Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture to information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
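The input model, as a hedged sketch (the parameters below are illustrative, not from the paper): draw samples from a well-separated ("clusterable") $k$-mixture of identity-covariance Gaussians, then let an adversary replace an $\epsilon$-fraction with arbitrary outliers; the robust learner sees only the corrupted sample.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d, k, eps = 5000, 10, 3, 0.05
means = 10.0 * rng.normal(size=(k, d))           # well-separated component means
labels = rng.integers(k, size=n)
X = means[labels] + rng.normal(size=(n, d))      # inliers: identity-covariance mixture

# An adversary replaces an eps-fraction of the sample with arbitrary outliers;
# a robust learner must recover the k means to error ~O(eps) from X alone.
bad = rng.choice(n, size=int(eps * n), replace=False)
X[bad] = 100.0 * rng.normal(size=(len(bad), d))
```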
arXiv Detail & Related papers (2020-05-13T16:44:12Z)