On the Minimax Optimality of the EM Algorithm for Learning Two-Component
Mixed Linear Regression
- URL: http://arxiv.org/abs/2006.02601v2
- Date: Fri, 5 Feb 2021 17:17:34 GMT
- Title: On the Minimax Optimality of the EM Algorithm for Learning Two-Component
Mixed Linear Regression
- Authors: Jeongyeol Kwon, Nhat Ho, Constantine Caramanis
- Abstract summary: We study the convergence rates of the EM algorithm for learning two-component mixed linear regression under all regimes of signal-to-noise ratio (SNR).
We show that the EM algorithm achieves minimax optimal sample complexity under all SNR regimes.
- Score: 30.09238179576628
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the convergence rates of the EM algorithm for learning two-component
mixed linear regression under all regimes of signal-to-noise ratio (SNR). We
resolve a long-standing question that many recent results have attempted to
tackle: we completely characterize the convergence behavior of EM, and show
that the EM algorithm achieves minimax optimal sample complexity under all SNR
regimes. In particular, when the SNR is sufficiently large, the EM updates
converge to the true parameter $\theta^{*}$ at the standard parametric
convergence rate $\mathcal{O}((d/n)^{1/2})$ after $\mathcal{O}(\log(n/d))$
iterations. In the regime where the SNR is above $\mathcal{O}((d/n)^{1/4})$ and
below some constant, the EM iterates converge to a $\mathcal{O}({\rm SNR}^{-1}
(d/n)^{1/2})$ neighborhood of the true parameter, when the number of iterations
is of the order $\mathcal{O}({\rm SNR}^{-2} \log(n/d))$. In the low SNR regime
where the SNR is below $\mathcal{O}((d/n)^{1/4})$, we show that EM converges to
a $\mathcal{O}((d/n)^{1/4})$ neighborhood of the true parameters, after
$\mathcal{O}((n/d)^{1/2})$ iterations. Notably, these results are achieved
under mild conditions of either random initialization or an efficiently
computable local initialization. By providing tight convergence guarantees of
the EM algorithm in middle-to-low SNR regimes, we fill the remaining gap in the
literature, and significantly, reveal that in low SNR, EM changes rate,
matching the $n^{-1/4}$ rate of the MLE, a behavior that previous work had been
unable to show.
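As a concrete illustration of the iteration analyzed in the paper, the following is a minimal sketch (not the authors' code) of finite-sample EM for the symmetric two-component model $y_i = r_i \langle x_i, \theta^{*}\rangle + \varepsilon_i$ with $r_i$ uniform on $\{\pm 1\}$ and known noise level $\sigma$: the E-step computes posterior label means and the M-step solves a least-squares problem with reweighted responses. Function names and the synthetic-data setup are illustrative.

```python
# Minimal sketch of finite-sample EM for symmetric two-component mixed linear
# regression: y_i = r_i * <x_i, theta*> + eps_i, with r_i uniform on {-1, +1}
# and eps_i ~ N(0, sigma^2). Assumes sigma is known; names are illustrative.
import numpy as np


def em_update(theta, X, y, sigma):
    """One EM iteration.

    E-step: the posterior mean of the hidden label r_i is tanh(y_i <x_i, theta> / sigma^2).
    M-step: least squares of the reweighted responses w_i * y_i on x_i.
    """
    w = np.tanh(y * (X @ theta) / sigma**2)          # E-step: E[r_i | x_i, y_i, theta]
    return np.linalg.solve(X.T @ X, X.T @ (w * y))   # M-step: (X^T X)^{-1} X^T (w * y)


def run_em(X, y, sigma, n_iter=50, seed=0):
    """Run EM from a random initialization (one of the initializations the paper allows)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=X.shape[1])
    for _ in range(n_iter):
        theta = em_update(theta, X, y, sigma)
    return theta


if __name__ == "__main__":
    # Synthetic instance with d = 10, n = 10_000; the SNR here is ||theta*|| / sigma.
    rng = np.random.default_rng(1)
    d, n, sigma = 10, 10_000, 1.0
    theta_star = rng.normal(size=d)
    X = rng.normal(size=(n, d))
    r = rng.choice([-1.0, 1.0], size=n)
    y = r * (X @ theta_star) + sigma * rng.normal(size=n)
    theta_hat = run_em(X, y, sigma)
    # The mixture is identifiable only up to sign, so report the smaller of the two distances.
    err = min(np.linalg.norm(theta_hat - theta_star), np.linalg.norm(theta_hat + theta_star))
    print(f"estimation error (up to sign): {err:.4f}")
```

The fixed iteration count above is only for the demonstration; the abstract's guarantees tie the number of iterations to the SNR regime (for example, $\mathcal{O}(\log(n/d))$ iterations suffice in the high-SNR regime).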
Related papers
- Obtaining Lower Query Complexities through Lightweight Zeroth-Order Proximal Gradient Algorithms [65.42376001308064]
We propose two variance reduced ZO estimators for complex gradient problems.
We improve the state-of-the-art function complexities from $\mathcal{O}\left(\min\left\{\frac{dn^{1/2}}{\epsilon^{2}}, \frac{d}{\epsilon^{3}}\right\}\right)$ to $\tilde{\mathcal{O}}\left(\frac{d}{\epsilon^{2}}\right)$.
arXiv Detail & Related papers (2024-10-03T15:04:01Z) - A Proximal Modified Quasi-Newton Method for Nonsmooth Regularized Optimization [0.7373617024876725]
The analysis involves the Lipschitz continuity of $\nabla f$ and the sets $\mathcal{S}_k$.
arXiv Detail & Related papers (2024-09-28T18:16:32Z) - Accelerated Variance-Reduced Forward-Reflected Methods for Root-Finding Problems [8.0153031008486]
We propose a novel class of Nesterov's accelerated forward-reflected-based methods with variance reduction to solve root-finding problems.
Our algorithm is single-loop and leverages a new family of unbiased variance-reduced estimators specifically designed for root-finding problems.
arXiv Detail & Related papers (2024-06-04T15:23:29Z) - Efficient Sign-Based Optimization: Accelerating Convergence via Variance Reduction [16.82220229840038]
We introduce two novel algorithms that attain improved convergence rates of $\mathcal{O}(d^{1/2}T^{-1/2} + dn^{-1/2})$ and $\mathcal{O}(d^{1/4}T^{-1/4})$ respectively.
Numerical experiments across different tasks validate the effectiveness of our proposed methods.
arXiv Detail & Related papers (2024-06-01T16:38:43Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex
Optimization [14.69450310695133]
Existing analyses of signSGD rely on assuming that data are sampled with replacement in each iteration.
We show that SignRR attains the same $O(\log(nT)/\sqrt{nT} + \log(nT)\sqrt{n}/\sqrt{T})$ convergence rate.
We back up the theoretical findings through experiments on simulated and real-world problems.
arXiv Detail & Related papers (2023-10-24T16:25:13Z) - Near-Optimal Bounds for Learning Gaussian Halfspaces with Random
Classification Noise [50.64137465792738]
We show that any efficient SQ algorithm for the problem requires sample complexity at least $\Omega(d^{1/2}/(\max\{p, \epsilon\})^{2})$.
Our lower bound suggests that this quadratic dependence on $1/\epsilon$ is inherent for efficient algorithms.
arXiv Detail & Related papers (2023-07-13T18:59:28Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We study asynchronous SGD, a training algorithm for distributed computation workers with varying communication frequencies.
In this work, we obtain a tighter convergence rate of $\mathcal{O}(\sigma^{2}\epsilon^{-2} + \tau_{\rm avg}\epsilon^{-1})$.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Random quantum circuits transform local noise into global white noise [118.18170052022323]
We study the distribution over measurement outcomes of noisy random quantum circuits in the low-fidelity regime.
For local noise that is sufficiently weak and unital, correlations (measured by the linear cross-entropy benchmark) between the output distribution $p_{\rm noisy}$ of a generic noisy circuit instance and the corresponding ideal output distribution shrink exponentially.
If the noise is incoherent, the output distribution approaches the uniform distribution $p_{\rm unif}$ at precisely the same rate.
arXiv Detail & Related papers (2021-11-29T19:26:28Z) - Escaping Saddle-Points Faster under Interpolation-like Conditions [19.9471360853892]
We show that under over-parametrization several standard optimization algorithms escape saddle-points and converge to local-minimizers much faster.
We discuss the first-order oracle complexity of the Perturbed Stochastic Gradient Descent (PSGD) algorithm to reach an $\epsilon$-local-minimizer.
We next analyze the Stochastic Cubic-Regularized Newton (SCRN) algorithm under interpolation-like conditions, and show that the oracle complexity to reach an $\epsilon$-local-minimizer is $\tilde{\mathcal{O}}(1/\epsilon^{2.5})$.
arXiv Detail & Related papers (2020-09-28T02:15:18Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^{5}\varepsilon^{2}} + \frac{t_{\rm mix}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.