Optimal Rates for Robust Stochastic Convex Optimization
- URL: http://arxiv.org/abs/2412.11003v2
- Date: Wed, 18 Dec 2024 06:17:12 GMT
- Title: Optimal Rates for Robust Stochastic Convex Optimization
- Authors: Changyu Gao, Andrew Lowy, Xingyu Zhou, Stephen J. Wright,
- Abstract summary: We develop novel algorithms that achieve minimax-optimal excess risk under the $epsilon$-contamination model.
Our algorithms do not require restrictive assumptions, and can handle nonsmooth but Lipschitz population loss functions.
- Score: 12.620782629498812
- License:
- Abstract: Machine learning algorithms in high-dimensional settings are highly susceptible to the influence of even a small fraction of structured outliers, making robust optimization techniques essential. In particular, within the $\epsilon$-contamination model, where an adversary can inspect and replace up to an $\epsilon$-fraction of the samples, a fundamental open problem is determining the optimal rates for robust stochastic convex optimization (SCO) under such contamination. We develop novel algorithms that achieve minimax-optimal excess risk (up to logarithmic factors) under the $\epsilon$-contamination model. Our approach improves over existing algorithms, which are not only suboptimal but also require stringent assumptions, including Lipschitz continuity and smoothness of individual sample functions. By contrast, our optimal algorithms do not require these restrictive assumptions, and can handle nonsmooth but Lipschitz population loss functions. We complement our algorithmic developments with a tight lower bound for robust SCO.
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm is only accessible to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Parameter-Free Algorithms for Performative Regret Minimization under
Decision-Dependent Distributions [15.396561118589577]
performative risk minimization is a formulation of optimization under decision-dependent distributions.
Our algorithms significantly improve upon existing Lipschitz constant distribution parameter-based methods.
We provide experimental results that demonstrate the numerical superiority of our algorithms over the existing method and other black-box optimistic optimization methods.
arXiv Detail & Related papers (2024-02-23T08:36:28Z) - Low-Rank Extragradient Methods for Scalable Semidefinite Optimization [0.0]
We focus on high-dimensional and plausible settings in which the problem admits a low-rank solution.
We provide several theoretical results proving that, under these circumstances, the well-known Extragradient method converges to a solution of the constrained optimization problem.
arXiv Detail & Related papers (2024-02-14T10:48:00Z) - Bring Your Own Algorithm for Optimal Differentially Private Stochastic
Minimax Optimization [44.52870407321633]
holy grail of these settings is to guarantee the optimal trade-off between the privacy and the excess population loss.
We provide a general framework for solving differentially private minimax optimization (DP-SMO) problems.
Our framework is inspired from the recently proposed Phased-ERM method [20] for nonsmooth differentially private convex optimization (DP-SCO)
arXiv Detail & Related papers (2022-06-01T10:03:20Z) - Optimal Rates for Random Order Online Optimization [60.011653053877126]
We study the citetgarber 2020online, where the loss functions may be chosen by an adversary, but are then presented online in a uniformly random order.
We show that citetgarber 2020online algorithms achieve the optimal bounds and significantly improve their stability.
arXiv Detail & Related papers (2021-06-29T09:48:46Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of evaluations gradient independent of training set size and number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $mathcaltilde O(t-1/4)$ rate of convergence for the norm of the gradient of Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly smooth optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Private Stochastic Convex Optimization: Optimal Rates in Linear Time [74.47681868973598]
We study the problem of minimizing the population loss given i.i.d. samples from a distribution over convex loss functions.
A recent work of Bassily et al. has established the optimal bound on the excess population loss achievable given $n$ samples.
We describe two new techniques for deriving convex optimization algorithms both achieving the optimal bound on excess loss and using $O(minn, n2/d)$ gradient computations.
arXiv Detail & Related papers (2020-05-10T19:52:03Z) - Stochastic Proximal Gradient Algorithm with Minibatches. Application to
Large Scale Learning Models [2.384873896423002]
We develop and analyze minibatch variants of gradient algorithm for general composite objective functions with nonsmooth components.
We provide complexity for constant and variable stepsize iteration policies obtaining that, for minibatch size $N$, after $mathcalO(frac1Nepsilon)$ $epsilon-$subity is attained in expected quadratic distance to optimal solution.
arXiv Detail & Related papers (2020-03-30T10:43:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.