Beyond Black Box Densities: Parameter Learning for the Deviated
Components
- URL: http://arxiv.org/abs/2202.02651v1
- Date: Sat, 5 Feb 2022 22:44:20 GMT
- Title: Beyond Black Box Densities: Parameter Learning for the Deviated
Components
- Authors: Dat Do and Nhat Ho and XuanLong Nguyen
- Abstract summary: A known density function estimate may have been previously obtained by a black box method.
The increased complexity of the data set may result in the true density being deviated from the known estimate by a mixture distribution.
We establish rates of convergence for the maximum likelihood estimates of $\lambda^{*}$ and $G_{*}$ under the Wasserstein metric.
- Score: 15.501680326749515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As we collect additional samples from a data population for which a known
density function estimate may have been previously obtained by a black box
method, the increased complexity of the data set may result in the true density
being deviated from the known estimate by a mixture distribution. To model this
phenomenon, we consider the \emph{deviating mixture model} $(1-\lambda^{*})h_0
+ \lambda^{*} (\sum_{i = 1}^{k} p_{i}^{*} f(x|\theta_{i}^{*}))$, where $h_0$ is
a known density function, while the deviated proportion $\lambda^{*}$ and
latent mixing measure $G_{*} = \sum_{i = 1}^{k} p_{i}^{*}
\delta_{\theta_i^{*}}$ associated with the mixture distribution are unknown.
Via a novel notion of distinguishability between the known density $h_{0}$ and
the deviated mixture distribution, we establish rates of convergence for the
maximum likelihood estimates of $\lambda^{*}$ and $G_{*}$ under the Wasserstein
metric. Simulation studies are carried out to illustrate the theory.
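To make the model concrete, here is a minimal Python sketch (not code from the paper) that simulates data from the deviating mixture $(1-\lambda^{*})h_0 + \lambda^{*}\sum_{i} p_{i}^{*} f(x|\theta_{i}^{*})$ with $h_0 = \mathcal{N}(0,1)$ and Gaussian deviated components, fits $\lambda$ and $G$ by numerically maximizing the likelihood, and compares the fitted mixing measure to $G_{*}$ in Wasserstein-1 distance. The choice of $k=2$, the Gaussian kernel $f$, and the generic Nelder-Mead optimizer are illustrative assumptions, not the estimator analyzed in the paper.
```python
# Minimal illustrative sketch (assumptions: h0 = N(0,1), Gaussian kernel f, k = 2,
# generic numerical MLE) -- not the procedure analyzed in the paper.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, wasserstein_distance

rng = np.random.default_rng(0)

# Ground truth: lambda* = 0.3, G* = 0.5*delta_{-2} + 0.5*delta_{3}
lam_true = 0.3
p_true = np.array([0.5, 0.5])
theta_true = np.array([-2.0, 3.0])

def sample(n):
    """Draw n points from (1 - lambda*) N(0,1) + lambda* sum_i p_i* N(theta_i*, 1)."""
    deviated = rng.random(n) < lam_true                  # indicator: from the deviated part?
    comp = rng.choice(len(p_true), size=n, p=p_true)     # which deviated component
    x = rng.normal(0.0, 1.0, size=n)                     # draws from h0
    x[deviated] = rng.normal(theta_true[comp[deviated]], 1.0)
    return x

def neg_loglik(params, x, k=2):
    """Negative log-likelihood in unconstrained parameters (sigmoid/softmax transforms)."""
    lam = 1.0 / (1.0 + np.exp(-params[0]))               # lambda in (0, 1)
    w = np.exp(params[1:1 + k]); w /= w.sum()            # mixing weights on the simplex
    theta = params[1 + k:]
    dens = (1.0 - lam) * norm.pdf(x) \
        + lam * (w[None, :] * norm.pdf(x[:, None] - theta[None, :])).sum(axis=1)
    return -np.log(dens + 1e-300).sum()

x = sample(5000)
init = np.array([0.0, 0.0, 0.0, -1.0, 1.0])              # lambda = 0.5, equal weights, spread locations
res = minimize(neg_loglik, init, args=(x,), method="Nelder-Mead",
               options={"maxiter": 20000, "xatol": 1e-8, "fatol": 1e-8})

lam_hat = 1.0 / (1.0 + np.exp(-res.x[0]))
p_hat = np.exp(res.x[1:3]); p_hat /= p_hat.sum()
theta_hat = res.x[3:]

# Wasserstein-1 distance between the fitted mixing measure and G*
w1 = wasserstein_distance(theta_hat, theta_true, p_hat, p_true)
print(f"lambda_hat = {lam_hat:.3f}, p_hat = {p_hat.round(3)}, theta_hat = {theta_hat.round(3)}")
print(f"W1(G_hat, G*) = {w1:.4f}")
```
The sketch only illustrates the objects involved; how fast $\hat{\lambda}$ and the Wasserstein error shrink with the sample size is exactly what the paper's distinguishability condition and convergence rates characterize.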
Related papers
- Dimension-free Private Mean Estimation for Anisotropic Distributions [55.86374912608193]
Previous private estimators on distributions over $\mathbb{R}^d$ suffer from a curse of dimensionality.
We present an algorithm whose sample complexity has improved dependence on dimension.
arXiv Detail & Related papers (2024-11-01T17:59:53Z) - On Parameter Estimation in Deviated Gaussian Mixture of Experts [37.439768024583955]
We consider the parameter estimation problem in the deviated Gaussian mixture of experts.
Data are generated either from $g_0(Y|X)$ (the null hypothesis) or from the whole mixture.
We construct novel Voronoi-based loss functions to capture the convergence rates of maximum likelihood estimation.
arXiv Detail & Related papers (2024-02-07T19:52:35Z) - Universality laws for Gaussian mixtures in generalized linear models [22.154969876570238]
We investigate the joint statistics of the family of generalized linear estimators $(\Theta_1, \dots, \Theta_M)$.
This allows us to prove the universality of different quantities of interest, such as the training and generalization errors.
We discuss the applications of our results to different machine learning tasks of interest, such as ensembling and uncertainty.
arXiv Detail & Related papers (2023-02-17T15:16:06Z) - Consistent Density Estimation Under Discrete Mixture Models [20.935152220339056]
This work considers a problem of estimating a mixing probability density $f$ in the setting of discrete mixture models.
In particular, it is shown that there exists an estimator $f_n$ such that for every density $f$, $\lim_{n \to \infty} \mathbb{E}\left[ \int |f_n - f| \right] = 0$.
arXiv Detail & Related papers (2021-05-03T18:30:02Z) - Rates of convergence for density estimation with generative adversarial
networks [19.71040653379663]
We prove an oracle inequality for the Jensen-Shannon (JS) divergence between the underlying density $\mathsf{p}^*$ and the GAN estimate.
We show that the JS-divergence between the GAN estimate and $\mathsf{p}^*$ decays as fast as $(\log n / n)^{2\beta/(2\beta + d)}$.
arXiv Detail & Related papers (2021-01-30T09:59:14Z) - The Sample Complexity of Robust Covariance Testing [56.98280399449707]
We are given i.i.d. samples from a distribution of the form $Z = (1-\epsilon) X + \epsilon B$, where $X$ is a zero-mean and unknown covariance Gaussian $\mathcal{N}(0, \Sigma)$.
In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples.
We prove a sample complexity lower bound of $\Omega(d^2)$ for $\epsilon$ an arbitrarily small constant and $\gamma$
arXiv Detail & Related papers (2020-12-31T18:24:41Z) - Analysis of KNN Density Estimation [56.29748742084386]
kNN density estimation is minimax optimal under both $\ell_1$ and $\ell_\infty$ criteria, if the support set is known.
The $\ell_\infty$ error does not reach the minimax lower bound, but is better than that of kernel density estimation.
arXiv Detail & Related papers (2020-09-30T03:33:17Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP)
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5 \varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2}) + \epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z) - Robustly Learning any Clusterable Mixture of Gaussians [55.41573600814391]
We study the efficient learnability of high-dimensional Gaussian mixtures in the adversarial-robust setting.
We provide an algorithm that learns the components of an $\epsilon$-corrupted $k$-mixture within information-theoretically near-optimal error of $\tilde{O}(\epsilon)$.
Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system.
arXiv Detail & Related papers (2020-05-13T16:44:12Z) - Optimal estimation of high-dimensional location Gaussian mixtures [6.947889260024788]
We show that the minimax rate of estimating the mixing distribution in Wasserstein distance is $\Theta((d/n)^{1/4} + n^{-1/(4k-2)})$.
We also show that the mixture density can be estimated at the optimal parametric rate $\Theta(\sqrt{d/n})$ in Hellinger distance.
arXiv Detail & Related papers (2020-02-14T00:11:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.