Minimum Wasserstein Distance Estimator under Finite Location-scale
Mixtures
- URL: http://arxiv.org/abs/2107.01323v1
- Date: Sat, 3 Jul 2021 02:06:49 GMT
- Title: Minimum Wasserstein Distance Estimator under Finite Location-scale
Mixtures
- Authors: Qiong Zhang, Jiahua Chen
- Abstract summary: We show that the minimum Wasserstein distance estimator (MWDE) is consistent and derive a numerical solution under finite location-scale mixtures.
Our study shows the MWDE suffers some efficiency loss against a penalized version of MLE in general.
We reaffirm the general superiority of likelihood-based learning strategies even for non-regular finite location-scale mixtures.
- Score: 17.662433196563473
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When a population exhibits heterogeneity, we often model it via a finite
mixture: decompose it into several different but homogeneous subpopulations.
Contemporary practice favors learning the mixtures by maximizing the likelihood
for statistical efficiency and the convenient EM-algorithm for numerical
computation. Yet the maximum likelihood estimate (MLE) is not well defined for
the most widely used finite normal mixture in particular and for finite
location-scale mixtures in general. We hence investigate feasible alternatives
to MLE such as minimum distance estimators. Recently, the Wasserstein distance
has drawn increased attention in the machine learning community. It has
intuitive geometric interpretation and is successfully employed in many new
applications. Do we gain anything by learning finite location-scale mixtures
via a minimum Wasserstein distance estimator (MWDE)? This paper investigates
this possibility in several respects. We find that the MWDE is consistent and
derive a numerical solution under finite location-scale mixtures. We study its
robustness against outliers and mild model mis-specifications. Our moderate-scale
simulation study shows that the MWDE suffers some efficiency loss against a
penalized version of MLE in general without noticeable gain in robustness. We
reaffirm the general superiority of likelihood-based learning strategies
even for the non-regular finite location-scale mixtures.
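A minimal sketch of the MWDE idea in one dimension is given below. It is not the paper's numerical solution: the two-component normal mixture, the reparametrization of the weight and scales, the CDF grid, and the Nelder-Mead optimizer are illustrative assumptions. It relies on the fact that on the real line $W_1(P, Q) = \int |F_P(t) - F_Q(t)| \, dt$, so the estimator minimizes the area between the empirical CDF and the mixture CDF.

```python
# A minimal sketch (not the paper's algorithm) of a minimum Wasserstein
# distance estimator for a two-component 1-D normal mixture.
import numpy as np
from scipy import optimize, stats

def w1_to_mixture(theta, x_sorted, grid):
    """W1 distance between the empirical CDF and the mixture CDF on a grid,
    using W1(P, Q) = integral of |F_P(t) - F_Q(t)| dt on the real line."""
    a, mu1, mu2, ls1, ls2 = theta
    w = 1.0 / (1.0 + np.exp(-a))            # mixing weight constrained to (0, 1)
    s1, s2 = np.exp(ls1), np.exp(ls2)       # scales constrained to be positive
    F_mix = w * stats.norm.cdf(grid, mu1, s1) + (1 - w) * stats.norm.cdf(grid, mu2, s2)
    F_emp = np.searchsorted(x_sorted, grid, side="right") / x_sorted.size
    return np.sum(np.abs(F_emp - F_mix)) * (grid[1] - grid[0])

rng = np.random.default_rng(0)
x = np.sort(np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)]))
grid = np.linspace(x[0] - 5.0, x[-1] + 5.0, 2000)

# Derivative-free minimization of the W1 objective over the reparametrized space.
res = optimize.minimize(w1_to_mixture, x0=np.array([0.0, -1.0, 1.0, 0.0, 0.0]),
                        args=(x, grid), method="Nelder-Mead", options={"maxiter": 5000})
a, mu1, mu2, ls1, ls2 = res.x
print("weight:", 1.0 / (1.0 + np.exp(-a)),
      "means:", (mu1, mu2), "scales:", (np.exp(ls1), np.exp(ls2)))
```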
Related papers
- Learning large softmax mixtures with warm start EM [17.081578976570437]
Mixed multinomial logits are discrete mixtures introduced several decades ago to model the probability of choosing an attribute from $p$ possible candidates.
Softmax mixtures are routinely used in the final layer of a neural network to map a large number $p$ of vectors in $\mathbb{R}^L$ to a probability vector.
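As a reminder of the object involved (an illustrative sketch, not taken from the paper), a softmax mixture with $K$ components maps an embedding $x \in \mathbb{R}^L$ to $\sum_k \pi_k \, \mathrm{softmax}(A_k x)$ over $p$ candidates; all dimensions and parameter names below are made up.

```python
# Illustrative softmax mixture: p(y | x) = sum_k pi_k * softmax(A_k @ x)_y,
# mapping an embedding x in R^L to a distribution over p candidates.
import numpy as np

rng = np.random.default_rng(1)
L, p, K = 8, 1000, 3                      # embedding dim, vocabulary size, components

A = rng.normal(size=(K, p, L))            # one softmax parameter matrix per component
pi = np.full(K, 1.0 / K)                  # mixing weights

def softmax(z):
    z = z - z.max()                       # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def mixture_probs(x):
    """Probability vector over the p candidates under the softmax mixture."""
    return sum(pi[k] * softmax(A[k] @ x) for k in range(K))

x = rng.normal(size=L)
probs = mixture_probs(x)
print(probs.shape, probs.sum())           # (1000,) and ~1.0
```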
arXiv Detail & Related papers (2024-09-16T00:14:48Z)
- Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that policies under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z)
- Statistical Estimation Under Distribution Shift: Wasserstein Perturbations and Minimax Theory [24.540342159350015]
We focus on Wasserstein distribution shifts, where every data point may undergo a slight perturbation.
We consider perturbations that are either independent or coordinated joint shifts across data points.
We analyze several important statistical problems, including location estimation, linear regression, and non-parametric density estimation.
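A toy version of such a shift (a hedged sketch, not the paper's minimax framework) perturbs a small fraction of the sample by a coordinated offset and compares how the mean and the median respond; the 5% fraction and the offset of 8 are arbitrary choices.

```python
# Toy Wasserstein-type perturbation: move a small fraction of the data points,
# then compare location estimators on the clean vs. shifted samples.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(2)
n = 2000
clean = rng.normal(loc=0.0, scale=1.0, size=n)

shifted = clean.copy()
idx = rng.choice(n, size=n // 20, replace=False)    # perturb 5% of the points
shifted[idx] += 8.0                                 # a coordinated joint shift

print("W1 between clean and shifted samples:", wasserstein_distance(clean, shifted))
print("mean  :", clean.mean(), "->", shifted.mean())
print("median:", np.median(clean), "->", np.median(shifted))
```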
arXiv Detail & Related papers (2023-08-03T16:19:40Z)
- Learning Gaussian Mixtures Using the Wasserstein-Fisher-Rao Gradient Flow [12.455057637445174]
We propose a new algorithm to compute the nonparametric maximum likelihood estimator (NPMLE) in a Gaussian mixture model.
Our method is based on gradient descent over the space of probability measures equipped with the Wasserstein-Fisher-Rao geometry.
We conduct extensive numerical experiments to confirm the effectiveness of the proposed algorithm.
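The Wasserstein-Fisher-Rao flow itself is beyond a short snippet; as a simple point of reference, below is a classical fixed-grid baseline for the NPMLE, running EM on the mixing weights over a fixed grid of atoms. This is a swapped-in baseline for context, not the paper's algorithm, and the grid size and iteration count are arbitrary.

```python
# Fixed-grid NPMLE sketch for a Gaussian location mixture with unit variance:
# keep the atom locations fixed on a grid and run EM on the mixing weights only.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x = np.concatenate([rng.normal(-1.5, 1.0, 400), rng.normal(2.0, 1.0, 600)])

atoms = np.linspace(x.min(), x.max(), 100)     # fixed support for the mixing measure
w = np.full(atoms.size, 1.0 / atoms.size)      # initial mixing weights

# Likelihood of each observation under each atom (unit-variance Gaussian kernel).
lik = norm.pdf(x[:, None], loc=atoms[None, :], scale=1.0)

for _ in range(200):                           # EM iterations on the weights only
    resp = lik * w                             # unnormalized responsibilities
    resp /= resp.sum(axis=1, keepdims=True)
    w = resp.mean(axis=0)

print("atoms with non-negligible weight:", atoms[w > 1e-3])
```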
arXiv Detail & Related papers (2023-01-04T18:59:35Z)
- Beyond EM Algorithm on Over-specified Two-Component Location-Scale Gaussian Mixtures [29.26015093627193]
We develop the Exponential Location Update (ELU) algorithm to efficiently explore the curvature of the negative log-likelihood functions.
We demonstrate that the ELU algorithm converges to the final statistical radius of the models after a logarithmic number of iterations.
arXiv Detail & Related papers (2022-05-23T06:49:55Z)
- Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is a linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
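A small worked example of the weighted least squares estimator referenced in the first sentence follows; the dimensions, noise levels, and the side-by-side comparison with ordinary least squares are illustrative.

```python
# Weighted least squares in a linear model y = X @ beta + noise with known,
# unequal noise variances: beta_hat = (X' W X)^{-1} X' W y with W = diag(1/var).
import numpy as np

rng = np.random.default_rng(4)
n, d = 500, 3
X = rng.normal(size=(n, d))
beta = np.array([1.0, -2.0, 0.5])
var = rng.uniform(0.1, 4.0, size=n)            # heteroscedastic noise variances
y = X @ beta + rng.normal(scale=np.sqrt(var))

W = np.diag(1.0 / var)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("true :", beta)
print("WLS  :", beta_wls)    # unbiased, minimum variance among linear unbiased estimators
print("OLS  :", beta_ols)    # still unbiased, but higher variance under this noise
```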
arXiv Detail & Related papers (2021-10-24T10:23:51Z)
- Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
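For context on the object being analyzed (a hedged sketch unrelated to the paper's proofs), the snippet below runs the unadjusted Langevin algorithm on a standard normal target and estimates its 2-Wasserstein error via the one-dimensional quantile coupling; the step size and iteration counts are arbitrary.

```python
# Unadjusted Langevin algorithm (ULA) targeting N(0, 1):
#   x_{k+1} = x_k - eta * grad U(x_k) + sqrt(2 * eta) * xi_k,  with U(x) = x^2 / 2.
# In 1-D, W2 between an empirical sample and the target can be estimated by
# matching sorted samples to the target quantiles (the quantile coupling).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
n_chains, n_steps, eta = 5000, 500, 0.05

x = rng.normal(scale=5.0, size=n_chains)        # deliberately poor initialization
for _ in range(n_steps):
    x = x - eta * x + np.sqrt(2 * eta) * rng.normal(size=n_chains)

q = norm.ppf((np.arange(1, n_chains + 1) - 0.5) / n_chains)   # target quantiles
w2 = np.sqrt(np.mean((np.sort(x) - q) ** 2))
print("estimated W2 to the N(0, 1) target:", w2)
```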
arXiv Detail & Related papers (2021-09-08T18:00:05Z)
- Maximum Entropy Reinforcement Learning with Mixture Policies [54.291331971813364]
We construct a tractable approximation of the mixture entropy using MaxEnt algorithms.
We show that it is closely related to the sum of marginal entropies.
We derive an algorithmic variant of Soft Actor-Critic (SAC) to the mixture policy case and evaluate it on a series of continuous control tasks.
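To make the relation to the sum of marginal entropies concrete, the following check verifies the textbook sandwich $\sum_k w_k H(p_k) \le H(\text{mixture}) \le \sum_k w_k H(p_k) + H(w)$ on a two-component Gaussian mixture by Monte Carlo. This is a standard bound used here for illustration, not the paper's specific approximation.

```python
# Monte Carlo check of the standard mixture-entropy sandwich for a 1-D
# two-component Gaussian mixture p(x) = w1 N(mu1, s1^2) + w2 N(mu2, s2^2):
#   sum_k w_k H(p_k)  <=  H(p)  <=  sum_k w_k H(p_k) + H(w).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(6)
w = np.array([0.3, 0.7])
mu = np.array([-2.0, 1.5])
s = np.array([1.0, 0.5])

# Differential entropy of N(mu, s^2) is 0.5 * log(2 * pi * e * s^2).
h_components = 0.5 * np.log(2 * np.pi * np.e * s ** 2)
h_weights = -np.sum(w * np.log(w))

# Monte Carlo estimate of the mixture entropy H(p) = -E_p[log p(X)].
z = rng.choice(2, size=200_000, p=w)
x = rng.normal(mu[z], s[z])
p_x = w[0] * norm.pdf(x, mu[0], s[0]) + w[1] * norm.pdf(x, mu[1], s[1])
h_mix = -np.mean(np.log(p_x))

print("lower bound:", np.sum(w * h_components))
print("mixture    :", h_mix)
print("upper bound:", np.sum(w * h_components) + h_weights)
```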
arXiv Detail & Related papers (2021-03-18T11:23:39Z)
- Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization [94.18714844247766]
Wasserstein barycenters provide a geometric notion of the weighted average of probability measures based on optimal transport.
We present a scalable algorithm to compute Wasserstein-2 barycenters given sample access to the input measures.
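In one dimension the Wasserstein-2 barycenter has a closed form, the weighted average of the quantile functions, which the sketch below approximates from samples; the inputs and weights are illustrative, and the paper's algorithm targets the general continuous setting rather than this special case.

```python
# 1-D special case: the Wasserstein-2 barycenter of measures is obtained by
# averaging their quantile functions. Here we approximate it from samples by
# averaging sorted samples of equal size (the empirical quantile functions).
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
samples = [rng.normal(-3.0, 1.0, n), rng.normal(2.0, 0.5, n)]   # input measures
weights = np.array([0.5, 0.5])                                  # barycenter weights

quantiles = np.stack([np.sort(s) for s in samples])             # empirical quantile functions
barycenter = weights @ quantiles                                 # weighted quantile average

# For Gaussians, the 1-D W2 barycenter is again Gaussian; check mean and std.
print("barycenter mean:", barycenter.mean())   # ~ 0.5 * (-3) + 0.5 * 2 = -0.5
print("barycenter std :", barycenter.std())    # ~ 0.5 * 1 + 0.5 * 0.5 = 0.75
```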
arXiv Detail & Related papers (2021-02-02T21:01:13Z)
- Self-regularizing Property of Nonparametric Maximum Likelihood Estimator in Mixture Models [39.27013036481509]
We introduce the nonparametric maximum likelihood estimator (NPMLE) for general Gaussian mixtures.
We show that, with high probability, the NPMLE based on a sample of size $n$ has $O(\log n)$ atoms (mass points).
Notably, any mixture is statistically indistinguishable from a finite one with $O(\log n)$ components.
arXiv Detail & Related papers (2020-08-19T03:39:13Z)
- Learning Minimax Estimators via Online Learning [55.92459567732491]
We consider the problem of designing minimax estimators for estimating parameters of a probability distribution.
We construct an algorithm for finding a mixed-strategy Nash equilibrium.
arXiv Detail & Related papers (2020-06-19T22:49:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.