Large-Scale Methods for Distributionally Robust Optimization
- URL: http://arxiv.org/abs/2010.05893v2
- Date: Fri, 11 Dec 2020 03:05:30 GMT
- Title: Large-Scale Methods for Distributionally Robust Optimization
- Authors: Daniel Levy, Yair Carmon, John C. Duchi and Aaron Sidford
- Abstract summary: We prove that our algorithms require a number of gradient evaluations independent of training set size and number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
- Score: 53.98643772533416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose and analyze algorithms for distributionally robust optimization of
convex losses with conditional value at risk (CVaR) and $\chi^2$ divergence
uncertainty sets. We prove that our algorithms require a number of gradient
evaluations independent of training set size and number of parameters, making
them suitable for large-scale applications. For $\chi^2$ uncertainty sets these
are the first such guarantees in the literature, and for CVaR our guarantees
scale linearly in the uncertainty level rather than quadratically as in
previous work. We also provide lower bounds proving the worst-case optimality
of our algorithms for CVaR and a penalized version of the $\chi^2$ problem. Our
primary technical contributions are novel bounds on the bias of batch robust
risk estimation and the variance of a multilevel Monte Carlo gradient estimator
due to [Blanchet & Glynn, 2015]. Experiments on MNIST and ImageNet confirm the
theoretical scaling of our algorithms, which are 9--36 times more efficient
than full-batch methods.
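To make the objects in the abstract concrete, here is a minimal NumPy sketch (not the authors' implementation) of the minibatch robust-risk estimators discussed above, plus a Blanchet–Glynn-style multilevel Monte Carlo (MLMC) debiasing helper of the kind the abstract refers to. Function names, default parameters, and the specific chi-squared divergence convention (penalty ½(t−1)²) are illustrative assumptions.

```python
# Illustrative sketch only: minibatch robust-risk estimators and an
# MLMC-style debiasing helper. Names, signatures, and defaults are
# assumptions for exposition, not the authors' API.
import numpy as np


def cvar_batch_risk(losses, alpha):
    """CVaR_alpha of a minibatch: mean of the worst alpha-fraction of losses."""
    n = len(losses)
    k = max(1, int(np.ceil(alpha * n)))
    return np.sort(losses)[-k:].mean()


def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (standard sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(v) + 1) > 0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - tau, 0.0)


def chi2_penalized_batch_risk(losses, lam):
    """Penalized chi^2 robust risk on a minibatch:
        max_{q in simplex}  q . losses - lam * (n/2) * ||q - 1/n||^2,
    whose maximizer is a simplex projection of 1/n + losses / (lam * n)."""
    n = len(losses)
    q = project_to_simplex(1.0 / n + losses / (lam * n))
    return q @ losses - 0.5 * lam * n * np.sum((q - 1.0 / n) ** 2)


def mlmc_debiased_estimate(stat, draw, rng, n0=2, max_level=10, p=0.5):
    """Randomized multilevel correction of the small-batch estimate stat(draw(n0)).
    `stat` maps an array of samples to a scalar (e.g., a batch robust risk) and
    `draw(n)` returns n fresh samples. The level-J difference term reduces the
    bias of the small-batch estimate while keeping the expected sample count
    modest (the level is capped at max_level for simplicity)."""
    J = min(int(rng.geometric(p)) - 1, max_level)   # P(J = j) = p * (1 - p)^j
    pJ = p * (1.0 - p) ** J
    n = n0 * 2 ** J
    xs = draw(2 * n)
    delta = stat(xs) - 0.5 * (stat(xs[:n]) + stat(xs[n:]))
    return stat(draw(n0)) + delta / pJ


# Toy usage on random losses (purely illustrative).
rng = np.random.default_rng(0)
losses = rng.exponential(size=128)
print(cvar_batch_risk(losses, alpha=0.1))
print(chi2_penalized_batch_risk(losses, lam=1.0))
print(mlmc_debiased_estimate(lambda x: cvar_batch_risk(x, 0.1),
                             lambda n: rng.exponential(size=n), rng))
```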
Related papers
- SOREL: A Stochastic Algorithm for Spectral Risks Minimization [1.6574413179773761]
Spectral risk has wide applications in machine learning, especially in real-world decision-making.
By assigning different weights to the losses of different sample points, it allows the model's performance to lie between the average performance and the worst-case performance.
We propose SOREL, the first gradient-based algorithm with convergence guarantees for spectral risk minimization (a toy sketch of spectral risk appears after this list).
arXiv Detail & Related papers (2024-07-19T18:20:53Z) - Distributionally Robust Optimization with Bias and Variance Reduction [9.341215359733601]
We show that Prospect, a gradient-based algorithm, enjoys linear convergence for smooth regularized losses.
We also show that Prospect can converge 2--3$\times$ faster than baselines such as gradient-based methods.
arXiv Detail & Related papers (2023-10-21T00:03:54Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}\big(d\sqrt{\sum_{k=1}^{K}\sigma_k^2} + d\big)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - On Private Online Convex Optimization: Optimal Algorithms in
$\ell_p$-Geometry and High Dimensional Contextual Bandits [9.798304879986604]
We study the differentially private (DP) convex optimization problem with streaming data that is sampled from a distribution and arrives sequentially.
We also consider the continual release model, where parameters related to private information are updated and released upon the arrival of each new data point, a setting often known as online algorithms.
Our algorithm achieves in linear time the optimal excess risk when $1 < p \leq 2$, and the state-of-the-art excess risk, matching the non-private lower bounds, when $2 < p \leq \infty$.
arXiv Detail & Related papers (2022-06-16T12:09:47Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide a small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds that depend on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point in an optimal number of stochastic gradient and Hessian-vector product computations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z) - Provably Convergent Working Set Algorithm for Non-Convex Regularized
Regression [0.0]
This paper proposes a working set algorithm for non-convex regularizers with convergence guarantees.
Our results demonstrate considerable gains compared to solving the full problem, whether using a block-coordinate or a gradient solver.
arXiv Detail & Related papers (2020-06-24T07:40:31Z) - Private Stochastic Non-Convex Optimization: Adaptive Algorithms and
Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate the empirical advantages over standard gradient methods on two popular deep learning tasks.
arXiv Detail & Related papers (2020-06-24T06:01:24Z) - Effective Dimension Adaptive Sketching Methods for Faster Regularized
Least-Squares Optimization [56.05635751529922]
We propose a new randomized algorithm for solving L2-regularized least-squares problems based on sketching.
We consider two of the most popular random embeddings, namely, Gaussian embeddings and the Subsampled Randomized Hadamard Transform (SRHT).
arXiv Detail & Related papers (2020-06-10T15:00:09Z)
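The spectral risk mentioned in the SOREL entry above is simply a weighted average of the sorted losses with non-decreasing weights summing to one. The toy sketch below (illustrative only, not the SOREL algorithm; the weight choices are assumptions) shows how uniform weights recover the average loss and weight concentrated on the largest loss recovers the worst case, which is the sense in which spectral risks interpolate between average and worst-case performance.

```python
# Toy illustration of a spectral risk (not the SOREL algorithm itself):
# a weighted average of sorted losses with non-decreasing weights that sum to 1.
import numpy as np


def spectral_risk(losses, weights):
    """sum_i weights[i] * (i-th smallest loss); weights are non-decreasing and sum to 1."""
    return np.sort(losses) @ weights


rng = np.random.default_rng(0)
losses = rng.exponential(size=10)
n = len(losses)
uniform = np.full(n, 1.0 / n)     # recovers the average loss
worst_case = np.eye(n)[-1]        # all weight on the largest loss
print(spectral_risk(losses, uniform), losses.mean())
print(spectral_risk(losses, worst_case), losses.max())
```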
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.