Two Losses Are Better Than One: Faster Optimization Using a Cheaper
Proxy
- URL: http://arxiv.org/abs/2302.03542v3
- Date: Wed, 7 Jun 2023 11:51:35 GMT
- Title: Two Losses Are Better Than One: Faster Optimization Using a Cheaper
Proxy
- Authors: Blake Woodworth (SIERRA), Konstantin Mishchenko, Francis Bach (SIERRA,
PSL)
- Abstract summary: We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy.
Our algorithm guarantees convergence at a rate matching stochastic gradient descent on a $\delta$-smooth objective.
Our algorithm has many potential applications in machine learning, and provides a principled means of leveraging synthetic data, physics simulators, mixed public and private data, and more.
- Score: 6.170898159041277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an algorithm for minimizing an objective with hard-to-compute
gradients by using a related, easier-to-access function as a proxy. Our
algorithm is based on approximate proximal point iterations on the proxy
combined with relatively few stochastic gradients from the objective. When the
difference between the objective and the proxy is $\delta$-smooth, our
algorithm guarantees convergence at a rate matching stochastic gradient descent
on a $\delta$-smooth objective, which can lead to substantially better sample
efficiency. Our algorithm has many potential applications in machine learning,
and provides a principled means of leveraging synthetic data, physics
simulators, mixed public and private data, and more.
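The abstract describes the method only at a high level. The sketch below shows one plausible reading of it in Python: each outer iteration draws a single stochastic gradient of the expensive objective, then approximately solves a quadratically regularized subproblem on the cheap proxy, shifted by a linear correction so the proxy's gradient agrees with the objective's at the current iterate. The function names, the gradient-descent inner solver, and all step sizes and constants here are illustrative assumptions, not the paper's exact algorithm.
```python
import numpy as np

def proxy_prox_step(x, grad_f_at_x, grad_g, delta, inner_steps=200, inner_lr=0.01):
    """Approximately solve the corrected proximal subproblem on the proxy:
        min_y  g(y) + <grad_f_at_x - grad_g(x), y> + (delta/2) * ||y - x||^2
    using plain gradient descent (the choice of inner solver is an assumption)."""
    correction = grad_f_at_x - grad_g(x)   # aligns the proxy's gradient with f at x
    y = x.copy()
    for _ in range(inner_steps):
        y = y - inner_lr * (grad_g(y) + correction + delta * (y - x))
    return y

def minimize_with_proxy(x0, stoch_grad_f, grad_g, delta, outer_steps=50):
    """Outer loop: one (expensive, stochastic) gradient of the objective per
    iteration, many cheap proxy gradients inside proxy_prox_step."""
    x = np.array(x0, dtype=float)
    for _ in range(outer_steps):
        gf = stoch_grad_f(x)               # hard-to-compute gradient of the objective
        x = proxy_prox_step(x, gf, grad_g, delta)
    return x

# Toy usage: f is a noisy quadratic, g a slightly mis-specified quadratic proxy,
# so f - g is smooth with a small constant (delta on the order of 1 here).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag([10.0, 1.0])               # curvature of the true objective f
    A_proxy = np.diag([9.0, 1.2])          # curvature of the cheap proxy g
    stoch_grad_f = lambda x: A @ x + 0.1 * rng.standard_normal(2)
    grad_g = lambda x: A_proxy @ x
    x_final = minimize_with_proxy([5.0, -3.0], stoch_grad_f, grad_g, delta=1.0)
    print(x_final)                          # should approach the minimizer at 0
```
In this toy run, each outer step consumes only one noisy gradient of the objective while the inner loop uses only proxy gradients, which is the source of the sample-efficiency gain described in the abstract.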
Related papers
- Stochastic Constrained Decentralized Optimization for Machine Learning with Fewer Data Oracles: a Gradient Sliding Approach [32.36073823372713]
In machine-learning settings, the algorithm has to communicate with the data center and sample data to compute its gradient.
This gives rise to the need for a decentralized optimization algorithm that is communication-efficient and minimizes the number of gradient computations.
We propose a primal-dual sliding with conditional gradient sliding framework, which is communication-efficient and achieves an $\varepsilon$-approximate solution.
arXiv Detail & Related papers (2024-04-03T06:55:59Z)
- Self-concordant Smoothing for Large-Scale Convex Composite Optimization [0.0]
We introduce a notion of self-concordant smoothing for minimizing the sum of two convex functions, one of which is smooth and the other may be nonsmooth.
We prove the convergence of two resulting algorithms: Prox-N-SCORE, a proximal Newton algorithm and Prox-GGN-SCORE, a proximal generalized Gauss-Newton algorithm.
arXiv Detail & Related papers (2023-09-04T19:47:04Z)
- Federated Learning Using Variance Reduced Stochastic Gradient for Probabilistically Activated Agents [0.0]
This paper proposes an algorithm for Federated Learning (FL) with a two-layer structure that achieves both variance reduction and a faster convergence rate to an optimal solution in the setting where each agent has an arbitrary probability of selection in each iteration.
arXiv Detail & Related papers (2022-10-25T22:04:49Z)
- Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks [79.16773494166644]
We consider the task of minimizing the sum of smooth and strongly convex functions stored in a decentralized manner across the nodes of a communication network.
We design two optimal algorithms that attain these lower bounds.
We corroborate the theoretical efficiency of these algorithms by performing an experimental comparison with existing state-of-the-art methods.
arXiv Detail & Related papers (2021-06-08T15:54:44Z)
- Correcting Momentum with Second-order Information [50.992629498861724]
We develop a new algorithm for non-convex stochastic optimization that finds an $\epsilon$-critical point using an optimal number of stochastic gradient and Hessian-vector product computations.
We validate our results on a variety of large-scale deep learning benchmarks and architectures.
arXiv Detail & Related papers (2021-03-04T19:01:20Z)
- Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to jointly optimize over their local objective functions.
Our algorithm is based on a new concentration technique, and its sample complexity matches the best known bounds.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets [71.05306664267832]
Adaptive algorithms perform gradient updates using the history of gradients and are ubiquitous in training deep neural networks.
In this paper we analyze a variant of the Optimistic Adagrad algorithm for nonconvex-nonconcave minimax problems.
Our experiments show that the advantage of adaptive gradient algorithms over non-adaptive ones in GAN training can indeed be observed empirically.
arXiv Detail & Related papers (2019-12-26T22:10:10Z)