Solving Stochastic Compositional Optimization is Nearly as Easy as
Solving Stochastic Optimization
- URL: http://arxiv.org/abs/2008.10847v3
- Date: Fri, 18 Jun 2021 04:56:36 GMT
- Title: Solving Stochastic Compositional Optimization is Nearly as Easy as
Solving Stochastic Optimization
- Authors: Tianyi Chen, Yuejiao Sun, Wotao Yin
- Abstract summary: This paper presents a new Stochastically Corrected Stochastic Compositional gradient method (SCSC).
SCSC runs in a single-time scale with a single loop, uses a fixed batch size, and is guaranteed to converge at the same rate as the stochastic gradient descent (SGD) method for non-compositional stochastic optimization.
- Score: 47.93365664380274
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic compositional optimization generalizes classic (non-compositional)
stochastic optimization to the minimization of compositions of functions. Each
composition may introduce an additional expectation. The series of expectations
may be nested. Stochastic compositional optimization is gaining popularity in
applications such as reinforcement learning and meta learning. This paper
presents a new Stochastically Corrected Stochastic Compositional gradient
method (SCSC). SCSC runs in a single-time scale with a single loop, uses a
fixed batch size, and guarantees to converge at the same rate as the stochastic
gradient descent (SGD) method for non-compositional stochastic optimization.
This is achieved by making a careful improvement to a popular stochastic
compositional gradient method. It is easy to apply SGD-improvement techniques
to accelerate SCSC. This helps SCSC achieve state-of-the-art performance for
stochastic compositional optimization. In particular, we apply Adam to SCSC,
and the exhibited rate of convergence matches that of the original Adam on
non-compositional stochastic optimization. We test SCSC using the portfolio
management and model-agnostic meta-learning tasks.
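As a rough illustration of the single-loop, single-time-scale update described in the abstract, here is a minimal Python sketch of an SCSC-style iteration for min_x E_xi[ f( E_phi[ g(x; phi) ]; xi ) ]. The helper names draw_inner and draw_outer, the constant step sizes alpha and beta, and the exact placement of the correction term are assumptions made for this sketch; see the paper for the precise algorithm and step-size schedule.

```python
import numpy as np

def scsc_sketch(x0, draw_inner, draw_outer, n_iters=1000, alpha=0.01, beta=0.1):
    """Minimal sketch of an SCSC-style single-loop update for
    min_x E_xi[ f( E_phi[ g(x; phi) ]; xi ) ].

    draw_inner() -> (g, g_jac): callables giving one sampled inner map g(.; phi)
                                and its Jacobian at a point.
    draw_outer() -> grad_f:     callable giving one sampled gradient of the
                                outer function f(.; xi) at a point.
    alpha, beta are constant step sizes assumed for illustration.
    """
    x = np.asarray(x0, dtype=float)
    g, g_jac = draw_inner()
    y = g(x)                                         # tracking variable for the inner expectation
    for _ in range(n_iters):
        grad_f = draw_outer()
        x_new = x - alpha * g_jac(x).T @ grad_f(y)   # chain-rule step at the tracked value y
        g, g_jac = draw_inner()                      # one fresh inner sample per iteration
        # The difference g(x_new) - g(x), evaluated with the SAME sample, corrects the
        # plain moving average used by classic SCGD so a single time scale suffices.
        y = (1.0 - beta) * (y + g(x_new) - g(x)) + beta * g(x_new)
        x = x_new
    return x
```

The corrected tracking step is what lets this sketch use a fixed batch size and a single loop, in the spirit of the abstract; an Adam-style variant would simply replace the plain step on x with an Adam update.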
Related papers
- Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization [0.6906005491572401]
We show that the noise in stochastic gradient descent (SGD) has the effect of smoothing the objective function.
We analyze a new graduated optimization algorithm that varies the degree of smoothing through the learning rate and batch size.
arXiv Detail & Related papers (2023-11-15T07:27:40Z) - Federated Conditional Stochastic Optimization [110.513884892319]
Conditional stochastic optimization has found applications in a wide range of machine learning tasks, such as invariant learning, AUPRC maximization, and MAML.
This paper proposes conditional stochastic optimization algorithms for the federated learning setting.
arXiv Detail & Related papers (2023-10-04T01:47:37Z) - Backward error analysis and the qualitative behaviour of stochastic
optimization algorithms: Application to stochastic coordinate descent [1.534667887016089]
We propose a class of stochastic differential equations that approximate the dynamics of general stochastic optimization methods more closely than the original gradient flow.
We study the stability of the modified equation in the case of stochastic coordinate descent.
arXiv Detail & Related papers (2023-09-05T09:39:56Z) - Stability and Generalization of Stochastic Compositional Gradient
Descent Algorithms [61.59448949684493]
We provide a stability and generalization analysis of stochastic compositional gradient descent algorithms built from training examples.
We establish the uniform stability results for two popular compositional gradient descent algorithms, namely SCGD and SCSC.
We derive dimension-independent excess risk bounds for SCGD and SCSC by trading off their stability results and optimization errors.
arXiv Detail & Related papers (2023-07-07T02:40:09Z) - Riemannian Stochastic Gradient Method for Nested Composition Optimization [0.0]
This work considers optimization of a composition of functions in a nested form over Riemannian manifolds, where each function contains an expectation.
This type of problem is gaining popularity in applications such as policy evaluation in reinforcement learning or model customization in meta-learning.
arXiv Detail & Related papers (2022-07-19T15:58:27Z) - Stochastic Learning Rate Optimization in the Stochastic Approximation
and Online Learning Settings [0.0]
In this work, multiplicative stochasticity is applied to the learning rate of stochastic optimization algorithms, giving rise to stochastic learning-rate schemes.
Theoretical convergence results of Stochastic Gradient Descent equipped with this novel learning rate scheme are presented.
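As a rough illustration of a multiplicative stochastic learning rate, the sketch below has each SGD step scale its base step size by a random factor. The uniform(0.5, 1.5) noise and the function names are assumptions for illustration only, not the scheme analyzed in the paper.

```python
import numpy as np

def sgd_stochastic_lr(x0, stochastic_grad, base_lr=0.01, n_iters=1000, seed=0):
    """SGD where each step multiplies the base learning rate by a random factor.
    An illustrative reading of 'multiplicative stochasticity on the learning rate';
    the noise distribution and these names are assumptions, not the paper's scheme.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        lr = base_lr * rng.uniform(0.5, 1.5)   # random multiplicative perturbation of the step size
        x = x - lr * stochastic_grad(x)
    return x
```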
arXiv Detail & Related papers (2021-10-20T18:10:03Z) - On Stochastic Moving-Average Estimators for Non-Convex Optimization [105.22760323075008]
In this paper, we demonstrate the power of a widely used stochastic estimator based on moving average (SEMA) on a range of non-convex optimization problems.
For all these problems, we also present results that match or improve the state-of-the-art.
arXiv Detail & Related papers (2021-04-30T08:50:24Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of iteration complexity and function query cost.
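For context, a coordinate-wise zeroth-order gradient estimate can be built from finite differences along randomly sampled coordinates. The sketch below uses uniform coordinate sampling and is a standard ZO building block shown only for illustration; the paper's method relies on a designed importance-sampling scheme, and the names and parameters here are assumptions.

```python
import numpy as np

def zo_coordinate_grad(f, x, n_coords=5, mu=1e-4, rng=None):
    """Generic zeroth-order gradient estimate: forward finite differences along a
    random subset of coordinates (uniform sampling, shown for illustration only).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    d = x.size
    grad = np.zeros(d)
    coords = rng.choice(d, size=min(n_coords, d), replace=False)
    fx = f(x)
    for i in coords:
        e = np.zeros(d)
        e[i] = mu
        grad[i] = (f(x + e) - fx) / mu       # forward-difference partial derivative estimate
    return grad * (d / len(coords))          # rescale to account for unqueried coordinates
```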
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Stochastic Optimization with Heavy-Tailed Noise via Accelerated Gradient
Clipping [69.9674326582747]
We propose a new accelerated first-order method called clipped-SSTM for smooth convex optimization with heavy-tailed distributed noise in stochastic gradients.
We prove new complexity bounds that outperform state-of-the-art results in this case.
We also derive the first non-trivial high-probability complexity bounds for SGD with clipping without the light-tails assumption on the noise.
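For reference, gradient clipping itself amounts to rescaling a stochastic gradient whose norm exceeds a threshold before the step is taken. The sketch below shows one plain norm-clipped SGD step, not clipped-SSTM (which adds acceleration on top of clipping); the parameter names are illustrative.

```python
import numpy as np

def clipped_sgd_step(x, stochastic_grad, lr=0.01, clip_level=1.0):
    """One plain norm-clipped SGD step (not clipped-SSTM, which adds acceleration).
    Gradients whose norm exceeds clip_level are rescaled onto the clipping ball,
    which tames heavy-tailed noise before the step is taken.
    """
    g = stochastic_grad(x)
    norm = np.linalg.norm(g)
    if norm > clip_level:
        g = g * (clip_level / norm)          # rescale to the clipping threshold
    return x - lr * g
```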
arXiv Detail & Related papers (2020-05-21T17:05:27Z) - Stochastic Recursive Variance Reduction for Efficient Smooth Non-Convex
Compositional Optimization [20.410012564838933]
Stochastic compositional optimization arises in many important machine learning tasks, such as value function evaluation in reinforcement learning and portfolio management.
In this paper, we investigate general stochastic compositional optimization in the smooth non-convex setting.
arXiv Detail & Related papers (2019-12-31T18:59:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.