Multi-level Monte-Carlo Gradient Methods for Stochastic Optimization with Biased Oracles
- URL: http://arxiv.org/abs/2408.11084v1
- Date: Tue, 20 Aug 2024 17:56:16 GMT
- Title: Multi-level Monte-Carlo Gradient Methods for Stochastic Optimization with Biased Oracles
- Authors: Yifan Hu, Jie Wang, Xin Chen, Niao He,
- Abstract summary: We consider optimization when one only has to access biased oracles and obtain objective with low biases.
We show that biased gradient methods can reduce variance in the non-varied regime.
We also show that conditional optimization methods significantly improve best-known complexities in the literature for conditional optimization and risk optimization.
- Score: 23.648702140754967
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We consider stochastic optimization when one only has access to biased stochastic oracles of the objective and the gradient, and obtaining stochastic gradients with low biases comes at high costs. This setting captures various optimization paradigms, such as conditional stochastic optimization, distributionally robust optimization, shortfall risk optimization, and machine learning paradigms, such as contrastive learning. We examine a family of multi-level Monte Carlo (MLMC) gradient methods that exploit a delicate tradeoff among bias, variance, and oracle cost. We systematically study their total sample and computational complexities for strongly convex, convex, and nonconvex objectives and demonstrate their superiority over the widely used biased stochastic gradient method. When combined with the variance reduction techniques like SPIDER, these MLMC gradient methods can further reduce the complexity in the nonconvex regime. Our results imply that a series of stochastic optimization problems with biased oracles, previously considered to be more challenging, is fundamentally no harder than the classical stochastic optimization with unbiased oracles. We also delineate the boundary conditions under which these problems become more difficult. Moreover, MLMC gradient methods significantly improve the best-known complexities in the literature for conditional stochastic optimization and shortfall risk optimization. Our extensive numerical experiments on distributionally robust optimization, pricing and staffing scheduling problems, and contrastive learning demonstrate the superior performance of MLMC gradient methods.
Related papers
- A Stochastic Approach to Bi-Level Optimization for Hyperparameter Optimization and Meta Learning [74.80956524812714]
We tackle the general differentiable meta learning problem that is ubiquitous in modern deep learning.
These problems are often formalized as Bi-Level optimizations (BLO)
We introduce a novel perspective by turning a given BLO problem into a ii optimization, where the inner loss function becomes a smooth distribution, and the outer loss becomes an expected loss over the inner distribution.
arXiv Detail & Related papers (2024-10-14T12:10:06Z) - SDEs for Minimax Optimization [11.290653315174382]
In this paper, we pioneer the use of differential equations (SDEs) to analyze and compare Minimax convergences.
Our SDE models for Gradient Descent-Ascent, Extragradient, and Hamiltonian Gradient Descent are provable approximations of their algorithmic counterparts.
This perspective also allows for a unified and simplified analysis strategy based on the principles of Ito calculus.
arXiv Detail & Related papers (2024-02-19T20:18:29Z) - On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization [10.36447258513813]
We consider a regularized expected reward optimization problem in the non-oblivious setting that covers many existing problems in reinforcement learning (RL)
In particular, the method has shown to admit an $O(epsilon-4)$ sample to an $epsilon$-stationary point, under standard conditions.
Our analysis shows that the sample complexity can be improved from $O(epsilon-4)$ to $O(epsilon-3)$ under additional conditions.
arXiv Detail & Related papers (2024-01-23T06:01:29Z) - Contextual Stochastic Bilevel Optimization [50.36775806399861]
We introduce contextual bilevel optimization (CSBO) -- a bilevel optimization framework with the lower-level problem minimizing an expectation on some contextual information and the upper-level variable.
It is important for applications such as meta-learning, personalized learning, end-to-end learning, and Wasserstein distributionally robustly optimization with side information (WDRO-SI)
arXiv Detail & Related papers (2023-10-27T23:24:37Z) - Optimal Guarantees for Algorithmic Reproducibility and Gradient
Complexity in Convex Optimization [55.115992622028685]
Previous work suggests that first-order methods would need to trade-off convergence rate (gradient convergence rate) for better.
We demonstrate that both optimal complexity and near-optimal convergence guarantees can be achieved for smooth convex minimization and smooth convex-concave minimax problems.
arXiv Detail & Related papers (2023-10-26T19:56:52Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional optimization has found in a wide range of machine learning tasks, such as in-variant learning tasks, AUPRC, andAML.
This paper proposes algorithms for distributed federated learning.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - Backward error analysis and the qualitative behaviour of stochastic
optimization algorithms: Application to stochastic coordinate descent [1.534667887016089]
We propose a class of differential equations that approximate the dynamics of general optimization methods more closely than the original gradient flow.
We study the stability of the modified equation in the case of coordinate descent.
arXiv Detail & Related papers (2023-09-05T09:39:56Z) - Faster One-Sample Stochastic Conditional Gradient Method for Composite
Convex Minimization [61.26619639722804]
We propose a conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with an average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
arXiv Detail & Related papers (2022-02-26T19:10:48Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work is on the iteration of zero-th-order (ZO) optimization which does not require first-order information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity as well as as function query cost.
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - An adaptive stochastic gradient-free approach for high-dimensional
blackbox optimization [0.0]
We propose an adaptive gradient-free (ASGF) approach for high-dimensional non-smoothing problems.
We illustrate the performance of this method on benchmark global problems and learning tasks.
arXiv Detail & Related papers (2020-06-18T22:47:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.