Refined Convergence Rates for Maximum Likelihood Estimation under Finite
Mixture Models
- URL: http://arxiv.org/abs/2202.08786v1
- Date: Thu, 17 Feb 2022 17:46:40 GMT
- Title: Refined Convergence Rates for Maximum Likelihood Estimation under Finite
Mixture Models
- Authors: Tudor Manole, Nhat Ho
- Abstract summary: We revisit convergence rates for maximum likelihood estimation (MLE) under finite mixture models.
We show that a subset of the components of the penalized MLE typically converge significantly faster than could have been anticipated from past work.
- Score: 13.769786711365104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We revisit convergence rates for maximum likelihood estimation (MLE) under
finite mixture models. The Wasserstein distance has become a standard loss
function for the analysis of parameter estimation in these models, due in part
to its ability to circumvent label switching and to accurately characterize the
behaviour of fitted mixture components with vanishing weights. However, the
Wasserstein metric is only able to capture the worst-case convergence rate
among the remaining fitted mixture components. We demonstrate that when the
log-likelihood function is penalized to discourage vanishing mixing weights,
stronger loss functions can be derived to resolve this shortcoming of the
Wasserstein distance. These new loss functions accurately capture the
heterogeneity in convergence rates of fitted mixture components, and we use
them to sharpen existing pointwise and uniform convergence rates in various
classes of mixture models. In particular, these results imply that a subset of
the components of the penalized MLE typically converge significantly faster
than could have been anticipated from past work. We further show that some of
these conclusions extend to the traditional MLE. Our theoretical findings are
supported by a simulation study to illustrate these improved convergence rates.
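To make the two central objects concrete, the following is a minimal sketch for a univariate two-component Gaussian mixture: the first-order Wasserstein distance between a fitted and a true mixing measure, and a log-likelihood penalized to discourage vanishing mixing weights. The data, the logarithmic penalty, and the parameter values are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.stats import norm, wasserstein_distance

# Mixing measures are discrete measures on the parameter space:
# atoms = component means, weights = mixing proportions.
true_atoms, true_weights = np.array([-1.0, 1.0]), np.array([0.5, 0.5])
fit_atoms,  fit_weights  = np.array([-0.9, 1.2]), np.array([0.45, 0.55])

# First-order Wasserstein distance between the two mixing measures,
# the loss commonly used to assess parameter estimation in mixtures.
w1 = wasserstein_distance(true_atoms, fit_atoms,
                          u_weights=true_weights, v_weights=fit_weights)

def penalized_log_lik(x, atoms, weights, sigma=1.0, lam=0.1):
    """Gaussian-mixture log-likelihood plus a penalty discouraging
    vanishing mixing weights (illustrative choice: lam * sum(log w))."""
    dens = sum(w * norm.pdf(x, loc=a, scale=sigma)
               for w, a in zip(weights, atoms))
    return np.sum(np.log(dens)) + lam * np.sum(np.log(weights))

rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=500)                  # latent component labels
x = rng.normal(loc=true_atoms[z], scale=1.0)      # observations

print("W1 between mixing measures:", w1)
print("Penalized log-likelihood at the fitted parameters:",
      penalized_log_lik(x, fit_atoms, fit_weights))
```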
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size-dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
- Non-negative Tensor Mixture Learning for Discrete Density Estimation [3.9633191508712398]
We present an expectation-maximization based framework for non-negative tensor decomposition.
We exploit the fact that the closed-form solution of the many-body approximation can be used to update all parameters simultaneously in the M-step (an EM update of this kind is sketched below).
arXiv Detail & Related papers (2024-05-28T14:28:28Z)
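For the tensor-mixture entry above, here is a generic sketch of EM with closed-form M-step updates for a rank-R nonnegative CP (latent class) model of a 3-way count tensor. The paper's many-body approximation is a different, more general parameterization, so treat this only as an illustration of the simultaneous closed-form update idea.

```python
import numpy as np

def em_nncp(counts, rank=2, iters=100, seed=0):
    """EM for a rank-R nonnegative CP model of a 3-way count tensor:
    P(i,j,k) = sum_r pi[r] * A[i,r] * B[j,r] * C[k,r] (latent class model).
    Generic sketch; not the paper's many-body formulation."""
    rng = np.random.default_rng(seed)
    I, J, K = counts.shape
    pi = np.full(rank, 1.0 / rank)
    A = rng.dirichlet(np.ones(I), size=rank).T   # columns sum to 1
    B = rng.dirichlet(np.ones(J), size=rank).T
    C = rng.dirichlet(np.ones(K), size=rank).T
    for _ in range(iters):
        # E-step: posterior responsibility of each latent class r per cell.
        p = np.einsum('r,ir,jr,kr->ijkr', pi, A, B, C)
        p /= p.sum(axis=-1, keepdims=True) + 1e-12
        resp = counts[..., None] * p             # expected counts per class
        # M-step: closed-form renormalization of expected counts,
        # updating all factors and the class weights simultaneously.
        Nr = resp.sum(axis=(0, 1, 2)) + 1e-12
        pi = Nr / Nr.sum()
        A = resp.sum(axis=(1, 2)) / Nr
        B = resp.sum(axis=(0, 2)) / Nr
        C = resp.sum(axis=(0, 1)) / Nr
    return pi, A, B, C

counts = np.random.default_rng(1).integers(0, 20, size=(4, 3, 5)).astype(float)
pi, A, B, C = em_nncp(counts)
print("class weights:", pi)
```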
- A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts [28.13187489224953]
We propose a novel class of modified softmax gating functions which transform the input before delivering it to the gating functions.
As a result, the interaction between the gating and expert functions identified in prior work disappears and the parameter estimation rates are significantly improved (a softmax-gated forward pass is sketched below).
arXiv Detail & Related papers (2023-10-22T05:32:19Z)
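A minimal numpy sketch of the softmax-gating idea from the mixture-of-experts entry above: the gating network scores a transformed copy of the input (an arbitrary tanh transform stands in for the paper's modified gating functions), while the experts, simplified here to linear regressors rather than multinomial logistic models, see the raw input.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def moe_predict(x, W_gate, experts, transform=np.tanh):
    """Softmax-gated mixture of experts. The gating weights are computed
    from transform(x) (illustrative stand-in for the modified gating),
    while each expert operates on the raw input x."""
    gate = softmax(transform(x) @ W_gate)               # (n, num_experts)
    outs = np.stack([f(x) for f in experts], axis=-1)   # (n, num_experts)
    return (gate * outs).sum(axis=-1)                   # gated prediction

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
W_gate = rng.normal(size=(3, 2))
experts = [lambda x: x @ np.array([1.0, -1.0, 0.5]),
           lambda x: x @ np.array([0.2, 0.3, -0.7])]
print(moe_predict(x, W_gate, experts))
```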
- Mutual Wasserstein Discrepancy Minimization for Sequential Recommendation [82.0801585843835]
We propose a novel self-supervised learning framework based on mutual Wasserstein discrepancy minimization (MStein) for sequential recommendation.
We also propose a novel contrastive learning loss based on Wasserstein Discrepancy Measurement.
arXiv Detail & Related papers (2023-01-28T13:38:48Z) - Sampling with Mollified Interaction Energy Descent [57.00583139477843]
We present a new optimization-based method for sampling called mollified interaction energy descent (MIED).
MIED minimizes a new class of energies on probability measures called mollified interaction energies (MIEs).
We show experimentally that for unconstrained sampling problems our algorithm performs on par with existing particle-based algorithms like SVGD.
arXiv Detail & Related papers (2022-10-24T16:54:18Z)
- Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes [52.92110730286403]
It is commonly believed that the marginal likelihood should be reminiscent of cross-validation metrics and that both should deteriorate with larger input dimensions.
We prove that by tuning hyperparameters, the performance, as measured by the marginal likelihood, improves monotonically with the input dimension.
We also prove that cross-validation metrics exhibit qualitatively different behavior that is characteristic of double descent (both quantities are sketched below).
arXiv Detail & Related papers (2022-10-14T08:09:33Z)
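For the Gaussian-process entry above, a small scikit-learn sketch of the two quantities being contrasted, the log marginal likelihood and a cross-validation score; the data, dimension, and kernel are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
d = 10                                               # input dimension
X = rng.normal(size=(80, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=80)      # one relevant coordinate

# fit() tunes the kernel hyperparameters (length scales, noise level)
# by maximizing the log marginal likelihood.
gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=np.ones(d)) + WhiteKernel(), normalize_y=True
).fit(X, y)

print("log marginal likelihood:", gp.log_marginal_likelihood_value_)
print("5-fold CV R^2:", cross_val_score(gp, X, y, cv=5).mean())
```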
- Uniform Consistency in Nonparametric Mixture Models [12.382836502781258]
We study uniform consistency in nonparametric mixture models and mixed regression models.
In the case of mixed regression, we prove $L^1$ convergence of the regression functions while allowing for the component regression functions to intersect arbitrarily often.
arXiv Detail & Related papers (2021-08-31T17:53:52Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation (the standard AIS recursion is sketched below).
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
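To ground the annealed importance sampling entry above, here is a sketch of vanilla AIS for the log marginal likelihood of a toy Gaussian model, including the Metropolis-Hastings transitions that the paper proposes to abandon; the model, annealing schedule, and step size are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=20)            # toy data, unit-variance Gaussian model

log_prior = lambda th: norm.logpdf(th, 0.0, 1.0)
log_lik = lambda th: norm.logpdf(data[:, None], th, 1.0).sum(axis=0)

def ais_log_evidence(n_particles=500, n_steps=50, step=0.3):
    """Vanilla AIS along the geometric path prior * likelihood**beta,
    with random-walk Metropolis-Hastings transitions at each step."""
    betas = np.linspace(0.0, 1.0, n_steps + 1)
    th = rng.normal(0.0, 1.0, size=n_particles)          # draw from the prior
    logw = np.zeros(n_particles)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        logw += (b - b_prev) * log_lik(th)                # importance-weight update
        # One MH move per particle targeting the current annealed density.
        prop = th + step * rng.normal(size=n_particles)
        log_acc = (log_prior(prop) + b * log_lik(prop)
                   - log_prior(th) - b * log_lik(th))
        accept = np.log(rng.uniform(size=n_particles)) < log_acc
        th = np.where(accept, prop, th)
    # log of the average importance weight = log marginal likelihood estimate
    return np.logaddexp.reduce(logw) - np.log(n_particles)

print("AIS log-evidence estimate:", ais_log_evidence())
```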
- Deconfounding Scores: Feature Representations for Causal Effect Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation.
We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data.
In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z)
- Uniform Convergence Rates for Maximum Likelihood Estimation under Two-Component Gaussian Mixture Models [13.769786711365104]
We derive uniform convergence rates for the maximum likelihood estimator and minimax lower bounds for parameter estimation.
We assume the mixing proportions of the mixture are known and fixed, but make no separation assumption on the underlying mixture components (this setting is sketched below).
arXiv Detail & Related papers (2020-06-01T04:13:48Z)
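A small sketch of the setting in the two-component Gaussian mixture entry above: the mixing proportions are fixed and known, and the MLE is taken over the component means only. The specific proportions, means, and optimizer are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
pi = np.array([0.3, 0.7])                 # known, fixed mixing proportions
true_means = np.array([0.0, 0.5])         # possibly poorly separated components
z = rng.choice(2, p=pi, size=1000)
x = rng.normal(true_means[z], 1.0)

def neg_log_lik(means):
    # Two-component Gaussian mixture with unit variances and fixed weights.
    dens = pi[0] * norm.pdf(x, means[0], 1.0) + pi[1] * norm.pdf(x, means[1], 1.0)
    return -np.sum(np.log(dens))

# MLE over the component means only, matching the known-proportions setting.
mle = minimize(neg_log_lik, x0=np.array([-1.0, 1.0]), method="Nelder-Mead")
print("Estimated means:", mle.x)
```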
- Robust Density Estimation under Besov IPM Losses [10.079698681921672]
We study minimax convergence rates of nonparametric density estimation in the Huber contamination model.
We show that a re-scaled thresholding wavelet series estimator achieves minimax optimal convergence rates under a wide variety of losses.
arXiv Detail & Related papers (2020-04-18T11:30:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.