Robust High Dimensional Expectation Maximization Algorithm via Trimmed
Hard Thresholding
- URL: http://arxiv.org/abs/2010.09576v1
- Date: Mon, 19 Oct 2020 15:00:35 GMT
- Title: Robust High Dimensional Expectation Maximization Algorithm via Trimmed
Hard Thresholding
- Authors: Di Wang and Xiangyu Guo and Shi Li and Jinhui Xu
- Abstract summary: We study the problem of estimating latent variable models with arbitrarily corrupted samples in high dimensional space.
We propose a method called Trimmed (Gradient) Expectation Maximization which adds a trimming gradient step.
We show that the algorithm is corruption-proofing and converges to the (near) optimal statistical rate geometrically.
- Score: 24.184520829631587
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the problem of estimating latent variable models with
arbitrarily corrupted samples in high dimensional space (i.e., $d\gg n$)
where the underlying parameter is assumed to be sparse. Specifically, we
propose a method called Trimmed (Gradient) Expectation Maximization which adds
a trimming gradients step and a hard thresholding step to the Expectation step
(E-step) and the Maximization step (M-step), respectively. We show that under
some mild assumptions and with an appropriate initialization, the algorithm is
corruption-proofing and converges to the (near) optimal statistical rate
geometrically when the fraction of the corrupted samples $\epsilon$ is bounded
by $ \tilde{O}(\frac{1}{\sqrt{n}})$. Moreover, we apply our general framework
to three canonical models: mixture of Gaussians, mixture of regressions and
linear regression with missing covariates. Our theory is supported by thorough
numerical results.
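To make the structure described in the abstract concrete, the following is a minimal sketch of a trimmed gradient EM loop with hard thresholding, written for a symmetric two-component mixture of Gaussians. It is an illustration under stated assumptions, not the authors' reference implementation: the function names, the coordinate-wise trimmed mean used as the trimming rule, the step size, and the fixed iteration count are all hypothetical choices made for this example.

```python
import numpy as np


def hard_threshold(v, s):
    """Keep the s largest-magnitude entries of v and set the rest to zero."""
    out = np.zeros_like(v)
    if s > 0:
        keep = np.argpartition(np.abs(v), -s)[-s:]
        out[keep] = v[keep]
    return out


def trimmed_mean(grads, trim_frac):
    """Coordinate-wise trimmed mean over the rows of grads.

    In every coordinate, the trim_frac fraction of largest and of smallest
    values is discarded before averaging (an assumed trimming rule).
    """
    n = grads.shape[0]
    k = min(int(trim_frac * n), (n - 1) // 2)  # always keep at least one row
    sorted_g = np.sort(grads, axis=0)
    return sorted_g[k:n - k].mean(axis=0)


def trimmed_gradient_em(y, beta0, s, trim_frac, sigma=1.0, step=0.5, n_iter=50):
    """Sketch of trimmed gradient EM for y_i = z_i * beta + noise, z_i in {-1, +1}.

    y: (n, d) samples, beta0: initial guess, s: assumed sparsity level.
    """
    beta = hard_threshold(np.asarray(beta0, dtype=float), s)
    for _ in range(n_iter):
        # E-step: 2 * P(z_i = +1 | y_i, beta) - 1 equals tanh(<y_i, beta> / sigma^2).
        resp = np.tanh((y @ beta) / sigma**2)
        # Per-sample gradient of the Q-function evaluated at the current beta.
        grads = resp[:, None] * y - beta
        # Trimming step: robust aggregate of the per-sample gradients.
        g = trimmed_mean(grads, trim_frac)
        # M-step: gradient ascent followed by hard thresholding.
        beta = hard_threshold(beta + step * g, s)
    return beta
```

In this sketch `trim_frac` would be set at or above the assumed corruption fraction $\epsilon$, and `beta0` stands in for the "appropriate initialization" the abstract requires; how to obtain such an initialization is outside the scope of the example.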
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and in particular do not rely on sample-size dependent smoothing parameters.
arXiv Detail & Related papers (2024-07-11T13:28:34Z) - Convergence Analysis of Adaptive Gradient Methods under Refined Smoothness and Noise Assumptions [18.47705532817026]
Adaptive gradient methods are arguably the most successful optimization algorithms for training neural networks.
We show that adaptive gradient methods can potentially shave a factor Adad-ell/ell$ geometry.
arXiv Detail & Related papers (2024-06-07T02:55:57Z) - Robust Stochastic Optimization via Gradient Quantile Clipping [1.90365714903665]
We introduce a quantile clipping strategy for Stochastic Gradient Descent (SGD).
We use quantiles of the gradient norm as clipping thresholds, which limits the influence of outliers.
We propose an implementation of the algorithm using Huberiles.
arXiv Detail & Related papers (2023-09-29T15:24:48Z) - Restricted Strong Convexity of Deep Learning Models with Smooth
Activations [31.003601717265006]
We study the problem of optimization of deep learning models with smooth activation functions.
We introduce a new analysis of optimization based on Restricted Strong Convexity (RSC)
Ours is the first result on establishing geometric convergence of GD based on RSC for deep learning models.
arXiv Detail & Related papers (2022-09-29T21:24:26Z) - Optimal Extragradient-Based Bilinearly-Coupled Saddle-Point Optimization [116.89941263390769]
We consider the smooth convex-concave bilinearly-coupled saddle-point problem, $\min_{\mathbf{x}}\max_{\mathbf{y}} F(\mathbf{x}) + H(\mathbf{x},\mathbf{y}) - G(\mathbf{y})$, where one has access to first-order oracles for $F$, $G$ as well as the bilinear coupling function $H$.
We present an accelerated gradient-extragradient (AG-EG) descent-ascent algorithm that combines extragrad
arXiv Detail & Related papers (2022-06-17T06:10:20Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence
of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples.
We design a clipped gradient descent and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z) - On Stochastic Moving-Average Estimators for Non-Convex Optimization [105.22760323075008]
In this paper, we demonstrate the power of a widely used stochastic estimator based on moving average (SEMA) for non-convex optimization problems.
For all these-the-art results, we also present the results for all these-the-art problems.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Max-Linear Regression by Convex Programming [5.366354612549172]
We formulate and analyze a scalable convex program given by anchored regression (AR) as the estimator for the max-linear regression problem.
Our result shows that a sufficient number of noise-free observations for exact recovery scales as $k^{4}p$ up to a logarithmic factor.
arXiv Detail & Related papers (2021-03-12T00:55:54Z) - Last iterate convergence of SGD for Least-Squares in the Interpolation
regime [19.05750582096579]
We study the noiseless model in the fundamental least-squares setup.
We assume that an optimum predictor perfectly fits inputs and outputs, $\langle \theta_*, \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite dimensional non-linear feature map.
arXiv Detail & Related papers (2021-02-05T14:02:20Z)