Robust estimation via generalized quasi-gradients
- URL: http://arxiv.org/abs/2005.14073v1
- Date: Thu, 28 May 2020 15:14:33 GMT
- Title: Robust estimation via generalized quasi-gradients
- Authors: Banghua Zhu, Jiantao Jiao and Jacob Steinhardt
- Abstract summary: We show why many recently proposed robust estimation problems are efficiently solvable.
We identify the existence of "generalized quasi-gradients".
We show that generalized quasi-gradients exist and construct efficient algorithms.
- Score: 28.292300073453877
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore why many recently proposed robust estimation problems are
efficiently solvable, even though the underlying optimization problems are
non-convex. We study the loss landscape of these robust estimation problems,
and identify the existence of "generalized quasi-gradients". Whenever these
quasi-gradients exist, a large family of low-regret algorithms are guaranteed
to approximate the global minimum; this includes the commonly-used filtering
algorithm.
For robust mean estimation of distributions under bounded covariance, we show
that any first-order stationary point of the associated optimization problem is
an approximate global minimum if and only if the corruption level $\epsilon <
1/3$. Consequently, any optimization algorithm that approaches a stationary
point yields an efficient robust estimator with breakdown point $1/3$. With
careful initialization and step size, we improve this to $1/2$, which is
optimal.
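For concreteness, one standard way to write the associated optimization problem (following the robust mean estimation literature; the paper's exact normalization may differ) is as a search over downweightings of the sample:
$$
\min_{w \in \Delta_{n,\epsilon}} \lambda_{\max}\big(\Sigma_w\big),
\qquad
\Sigma_w = \sum_{i=1}^n w_i (x_i - \mu_w)(x_i - \mu_w)^\top,
\qquad
\mu_w = \sum_{i=1}^n w_i x_i,
$$
where $\Delta_{n,\epsilon} = \{w : \sum_i w_i = 1,\ 0 \le w_i \le \frac{1}{(1-\epsilon)n}\}$ is the set of weight vectors that discard at most an $\epsilon$-fraction of the sample. The objective is non-convex in $w$; the landscape result above says that its first-order stationary points are nonetheless near-optimal whenever $\epsilon < 1/3$.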
For other tasks, including linear regression and joint mean and covariance
estimation, the loss landscape is more rugged: there are stationary points
arbitrarily far from the global minimum. Nevertheless, we show that generalized
quasi-gradients exist and construct efficient algorithms. These algorithms are
simpler than previous ones in the literature, and for linear regression we
improve the estimation error from $O(\sqrt{\epsilon})$ to the optimal rate of
$O(\epsilon)$ for small $\epsilon$ assuming certified hypercontractivity. For
mean estimation with near-identity covariance, we show that a simple gradient
descent algorithm achieves breakdown point $1/3$ and iteration complexity
$\tilde{O}(d/\epsilon^2)$.
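To make the filtering idea concrete, below is a minimal Python sketch of one common variant of the filtering algorithm for robust mean estimation under bounded covariance: repeatedly form the weighted covariance, and if its top eigenvalue is too large, downweight points in proportion to their squared projection onto the top eigendirection (these scores play the role of the generalized quasi-gradient coordinates). The function name, stopping threshold, and multiplicative update rule are illustrative assumptions, not taken verbatim from the paper.

```python
import numpy as np

def filter_mean(X, sigma2=1.0, max_iter=50):
    """Illustrative filtering sketch for robust mean estimation.

    X: (n, d) sample, an eps-fraction of which may be adversarial.
    sigma2: assumed operator-norm bound on the inlier covariance.
    """
    n, d = X.shape
    w = np.ones(n) / n                      # start from uniform weights
    for _ in range(max_iter):
        mu = w @ X                          # weighted mean
        Y = X - mu
        Sigma = (w[:, None] * Y).T @ Y      # weighted covariance
        eigvals, eigvecs = np.linalg.eigh(Sigma)
        lam, v = eigvals[-1], eigvecs[:, -1]
        if lam <= 2.0 * sigma2:             # illustrative stopping threshold
            break
        tau = (Y @ v) ** 2                  # outlier scores along top direction
        w = w * (1.0 - tau / tau.max())     # soft multiplicative downweighting
        w = w / w.sum()
    return w @ X

# Tiny usage check: a 5% cluster of planted outliers, true mean = 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
X[:50] += 10.0
print(np.linalg.norm(filter_mean(X)))       # roughly 0.1-0.2
print(np.linalg.norm(X.mean(axis=0)))       # roughly 2.2 for the naive mean
```

Any update of this low-regret flavor approximates the global minimum whenever the quasi-gradients exist; the landscape analysis is what certifies that stopping at (or near) a stationary point suffices when $\epsilon < 1/3$.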
Related papers
- Accelerated Stochastic Min-Max Optimization Based on Bias-corrected Momentum [30.01198677588252]
First-order algorithms require at least $\mathcal{O}(\varepsilon^{-4})$ complexity to find an $\varepsilon$-stationary point.
We introduce novel momentum algorithms with improved complexity guarantees.
The effectiveness of the method is validated through robust logistic regression using real-world datasets.
arXiv Detail & Related papers (2024-06-18T20:14:52Z) - Robust Sparse Estimation for Gaussians with Optimal Error under Huber Contamination [42.526664955704746]
We study sparse estimation tasks in Huber's contamination model with a focus on mean estimation, PCA, and linear regression.
For each of these tasks, we give the first sample and computationally efficient robust estimators with optimal error guarantees.
At the technical level, we develop a novel multidimensional filtering method in the sparse regime that may find other applications.
arXiv Detail & Related papers (2024-03-15T15:51:27Z) - Noise misleads rotation invariant algorithms on sparse targets [0.0]
We show that the class of rotation invariant algorithms is suboptimal for learning sparse linear problems.
We prove this via a lower bound for the Bayes optimal algorithm on a rotationally symmetrized problem.
We then prove much lower upper bounds on the same problem for simple non-rotation invariant algorithms.
arXiv Detail & Related papers (2024-03-05T06:25:19Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes (MDPs) in the fixed confidence setting under a generative model.
The lower bound, obtained as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z) - Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point with optimal stochastic gradient complexity.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z) - A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm
for Bilevel Optimization [112.59170319105971]
We propose a new algorithm -- the Momentum-assisted Single-Timescale Stochastic Approximation (MSTSA) -- for tackling bilevel optimization problems.
MSTSA allows us to control the error in the iterates caused by inaccurate solutions to the lower-level subproblem.
arXiv Detail & Related papers (2021-02-15T07:10:33Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale stochastic approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we study gradient-free minimax optimization in the nonconvex-strongly-concave setting.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z)