Learning Gaussian Mixtures Using the Wasserstein-Fisher-Rao Gradient Flow
- URL: http://arxiv.org/abs/2301.01766v1
- Date: Wed, 4 Jan 2023 18:59:35 GMT
- Title: Learning Gaussian Mixtures Using the Wasserstein-Fisher-Rao Gradient Flow
- Authors: Yuling Yan, Kaizheng Wang, Philippe Rigollet
- Abstract summary: We propose a new algorithm to compute the nonparametric maximum likelihood estimator (NPMLE) in a Gaussian mixture model.
Our method is based on gradient descent over the space of probability measures equipped with the Wasserstein-Fisher-Rao geometry.
We conduct extensive numerical experiments to confirm the effectiveness of the proposed algorithm.
- Score: 12.455057637445174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gaussian mixture models form a flexible and expressive parametric family of
distributions that has found use in a wide variety of applications.
Unfortunately, fitting these models to data is a notoriously hard problem from
a computational perspective. Currently, only moment-based methods enjoy
theoretical guarantees while likelihood-based methods are dominated by
heuristics such as Expectation-Maximization that are known to fail in simple
examples. In this work, we propose a new algorithm to compute the nonparametric
maximum likelihood estimator (NPMLE) in a Gaussian mixture model. Our method is
based on gradient descent over the space of probability measures equipped with
the Wasserstein-Fisher-Rao geometry for which we establish convergence
guarantees. In practice, it can be approximated using an interacting particle
system where the weight and location of particles are updated alternately. We
conduct extensive numerical experiments to confirm the effectiveness of the
proposed algorithm compared not only to classical benchmarks but also to
similar gradient descent algorithms with respect to simpler geometries. In
particular, these simulations illustrate the benefit of updating both weight
and location of the interacting particles.
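As a rough illustration of the alternating updates described above, the following sketch implements a small interacting-particle scheme for a one-dimensional location mixture with unit-variance components: particle locations are moved by a Wasserstein-type gradient step on the average log-likelihood, and particle weights by a Fisher-Rao-type multiplicative step. The unit-variance assumption, step sizes, and particle count are illustrative choices, not the authors' reference implementation.

```python
import numpy as np

def component_density(x, mu):
    # N(x; mu, 1) evaluated for all data/particle pairs, shape (n, m)
    return np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2) / np.sqrt(2 * np.pi)

def wfr_npmle(x, m=50, n_iter=500, step_loc=0.5, step_w=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    mu = rng.choice(x, size=m, replace=True).astype(float)  # particle locations
    w = np.full(m, 1.0 / m)                                  # particle weights
    for _ in range(n_iter):
        K = component_density(x, mu)          # (n, m) component densities
        mix = K @ w                           # mixture density at each data point
        resp = K * w[None, :] / mix[:, None]  # responsibilities, shape (n, m)
        # Wasserstein step: move each particle along the gradient of the
        # first variation of the average log-likelihood at its location.
        vel = (resp * (x[:, None] - mu[None, :])).sum(axis=0) / (n * np.maximum(w, 1e-12))
        mu = mu + step_loc * vel
        # Fisher-Rao step: multiplicative weight update, then renormalize.
        G = (K / mix[:, None]).mean(axis=0)   # average likelihood ratio per particle
        w = w * np.exp(step_w * (G - 1.0))
        w = w / w.sum()
    return w, mu

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(2.0, 1.0, 500)])
    w, mu = wfr_npmle(data)
    print("particles with non-negligible weight:", np.sort(mu[w > 1e-3]))
```

On well-separated two-component data such as the example above, most of the weight typically concentrates on a few particles near the true component means, which is the qualitative behavior the alternating weight/location updates are meant to produce.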
Related papers
- Gaussian Processes Sampling with Sparse Grids under Additive Schwarz Preconditioner [6.408773096179187]
We propose a scalable algorithm for sampling random realizations of the prior and posterior of GP models.
The proposed algorithm leverages an inducing-point approximation with sparse grids, as well as additive Schwarz preconditioners.
arXiv Detail & Related papers (2024-08-01T00:19:36Z)
- A Fourier Approach to the Parameter Estimation Problem for One-dimensional Gaussian Mixture Models [21.436254507839738]
We propose a novel algorithm for estimating parameters in one-dimensional Gaussian mixture models.
We show that our algorithm achieves better scores in likelihood, AIC, and BIC when compared to the EM algorithm.
arXiv Detail & Related papers (2024-04-19T03:53:50Z)
- Closed-form Filtering for Non-linear Systems [83.91296397912218]
We propose a new class of filters based on Gaussian PSD Models, which offer several advantages in terms of density approximation and computational efficiency.
We show that filtering can be efficiently performed in closed form when transitions and observations are Gaussian PSD Models.
Our proposed estimator enjoys strong theoretical guarantees, with estimation error that depends on the quality of the approximation and is adaptive to the regularity of the transition probabilities.
arXiv Detail & Related papers (2024-02-15T08:51:49Z)
- Learning Mixtures of Gaussians Using the DDPM Objective [11.086440815804226]
We prove that gradient descent on the denoising diffusion probabilistic model (DDPM) objective can efficiently recover the ground truth parameters of the mixture model.
A key ingredient in our proofs is a new connection between score-based methods and two other approaches to distribution learning.
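As a toy version of the claim summarized above, the sketch below runs gradient descent on an empirical denoising score-matching (DDPM-style) loss at a single noise level to recover the location parameter of a symmetric two-component 1D Gaussian mixture; the parametrization, noise level, and step size are assumptions made for this demo, not the exact setting analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star, sigma = 2.0, 0.5          # true location and noise level (demo choices)
v = 1.0 + sigma ** 2                  # variance of the noised mixture components

# Data from the symmetric mixture 0.5*N(theta*, 1) + 0.5*N(-theta*, 1).
signs = rng.choice([-1.0, 1.0], size=20000)
x0 = signs * theta_star + rng.normal(size=20000)

theta, lr = 0.5, 0.2                  # initialization and step size (demo choices)
for _ in range(500):
    eps = rng.normal(size=x0.shape)
    x = x0 + sigma * eps              # noised samples
    t = np.tanh(theta * x / v)
    s = (-x + theta * t) / v          # model score of the noised mixture
    ds = (t + (theta * x / v) * (1.0 - t ** 2)) / v   # derivative of s w.r.t. theta
    resid = s + eps / sigma           # denoising score-matching residual
    grad = np.mean(2.0 * resid * ds)
    theta -= lr * grad

print(f"estimated theta = {theta:.3f} (true value {theta_star})")
```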
arXiv Detail & Related papers (2023-07-03T17:44:22Z)
- Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models [69.22568644711113]
We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversions.
Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation.
In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
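To make the "inverse-free" idea concrete, here is a generic sketch (not the paper's probabilistic-unrolling estimator) that estimates the gradient of a Gaussian marginal log-likelihood using conjugate-gradient solves and Hutchinson-style Monte Carlo probes, so no covariance matrix is ever explicitly inverted; the kernel, parameter values, and probe count are arbitrary demo choices.

```python
import numpy as np

def cg_solve(A, b, tol=1e-8, max_iter=200):
    # Plain conjugate gradient for a symmetric positive-definite matrix A.
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def grad_loglik_scale(K, y, s, noise_var, n_probes=32, rng=None):
    # d/ds of the Gaussian marginal log-likelihood with covariance
    # C = s*K + noise_var*I, using CG solves and Hutchinson trace probes.
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    C = s * K + noise_var * np.eye(n)
    alpha = cg_solve(C, y)                    # C^{-1} y without forming C^{-1}
    quad_term = 0.5 * alpha @ (K @ alpha)     # 0.5 * y^T C^{-1} K C^{-1} y
    trace_est = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=n)   # Rademacher probe
        trace_est += cg_solve(C, z) @ (K @ z) # z^T C^{-1} K z
    trace_est /= n_probes
    return quad_term - 0.5 * trace_est

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.sort(rng.uniform(0, 5, 200))
    K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2)   # RBF kernel matrix
    y = rng.multivariate_normal(np.zeros(200), 2.0 * K + 0.1 * np.eye(200))
    print("gradient at s=1.0:", grad_loglik_scale(K, y, s=1.0, noise_var=0.1, rng=rng))
```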
arXiv Detail & Related papers (2023-06-05T21:08:34Z)
- Monte Carlo Neural PDE Solver for Learning PDEs via Probabilistic Representation [59.45669299295436]
We propose a Monte Carlo PDE solver for training unsupervised neural solvers.
We use the PDEs' probabilistic representation, which regards macroscopic phenomena as ensembles of random particles.
Our experiments on convection-diffusion, Allen-Cahn, and Navier-Stokes equations demonstrate significant improvements in accuracy and efficiency.
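The probabilistic representation underlying this approach can be seen in a self-contained toy example: a plain Monte Carlo estimate of a 1D convection-diffusion solution via its Feynman-Kac formula (this illustrates the representation itself, not the neural solver trained in the paper). The coefficients and sample sizes below are demo choices.

```python
import numpy as np

# Probabilistic representation of the 1D convection-diffusion equation
#   u_t + a*u_x = kappa*u_xx,  u(x, 0) = sin(x):
#   u(x, t) = E[ u0(x - a*t + sqrt(2*kappa*t) * Z) ],  Z ~ N(0, 1),
# estimated by averaging the initial condition over random particle endpoints.
a, kappa, t = 1.0, 0.1, 2.0
u0 = np.sin

def mc_solution(xs, n_particles=100_000, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n_particles)
    endpoints = xs[:, None] - a * t + np.sqrt(2.0 * kappa * t) * z[None, :]
    return u0(endpoints).mean(axis=1)

xs = np.linspace(0.0, 2.0 * np.pi, 9)
u_mc = mc_solution(xs)
u_exact = np.exp(-kappa * t) * np.sin(xs - a * t)   # closed form for sin initial data
print("max abs error:", np.max(np.abs(u_mc - u_exact)))
```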
arXiv Detail & Related papers (2023-02-10T08:05:19Z)
- Gaussian process regression and conditional Karhunen-Loève models for data assimilation in inverse problems [68.8204255655161]
We present a model inversion algorithm, CKLEMAP, for data assimilation and parameter estimation in partial differential equation models.
The CKLEMAP method provides better scalability compared to the standard MAP method.
arXiv Detail & Related papers (2023-01-26T18:14:12Z)
- Sampling with Mollified Interaction Energy Descent [57.00583139477843]
We present a new optimization-based method for sampling called mollified interaction energy descent (MIED).
MIED minimizes a new class of energies on probability measures called mollified interaction energies (MIEs).
We show experimentally that for unconstrained sampling problems our algorithm performs on par with existing particle-based algorithms like SVGD.
arXiv Detail & Related papers (2022-10-24T16:54:18Z)
- Extracting Stochastic Governing Laws by Nonlocal Kramers-Moyal Formulas [3.8325907381729496]
We propose a data-driven approach to extract governing laws with both (Gaussian) Brownian motion and (non-Gaussian) Lévy motion.
We demonstrate that this approach can learn the stochastic differential equation with Lévy motion.
arXiv Detail & Related papers (2021-08-28T04:56:51Z)
- Large-Scale Wasserstein Gradient Flows [84.73670288608025]
We introduce a scalable scheme to approximate Wasserstein gradient flows.
Our approach relies on input convex neural networks (ICNNs) to discretize the JKO steps.
As a result, we can sample from the measure at each step of the gradient flow and compute its density.
arXiv Detail & Related papers (2021-06-01T19:21:48Z)
- Mean-Field Approximation to Gaussian-Softmax Integral with Application to Uncertainty Estimation [23.38076756988258]
We propose a new single-model based approach to quantify uncertainty in deep neural networks.
We use a mean-field approximation formula to compute an analytically intractable integral.
Empirically, the proposed approach performs competitively when compared to state-of-the-art methods.
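For intuition about this kind of analytic shortcut in the simplest (binary) case, the snippet below compares a brute-force Monte Carlo estimate of E[sigmoid(a)], with a ~ N(mu, var), against the classical logistic-Gaussian (probit-style) approximation sigmoid(mu / sqrt(1 + pi*var/8)); this is a standard textbook formula, not necessarily the exact mean-field expression derived in the paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mc_expected_sigmoid(mu, var, n=200_000, seed=0):
    # Brute-force Monte Carlo estimate of E[sigmoid(a)], a ~ N(mu, var).
    rng = np.random.default_rng(seed)
    return sigmoid(rng.normal(mu, np.sqrt(var), size=n)).mean()

def meanfield_expected_sigmoid(mu, var):
    # Probit-style approximation: squash the mean by a variance-dependent
    # factor instead of integrating numerically.
    return sigmoid(mu / np.sqrt(1.0 + np.pi * var / 8.0))

for mu, var in [(0.5, 1.0), (2.0, 4.0), (-1.0, 0.25)]:
    print(mu, var, mc_expected_sigmoid(mu, var), meanfield_expected_sigmoid(mu, var))
```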
arXiv Detail & Related papers (2020-06-13T07:32:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.