Target-based Surrogates for Stochastic Optimization
- URL: http://arxiv.org/abs/2302.02607v2
- Date: Thu, 8 Jun 2023 17:39:05 GMT
- Title: Target-based Surrogates for Stochastic Optimization
- Authors: Jonathan Wilder Lavington, Sharan Vaswani, Reza Babanezhad, Mark
Schmidt, Nicolas Le Roux
- Abstract summary: We consider minimizing functions for which it is expensive to compute the (possibly stochastic) gradient.
Such functions are prevalent in reinforcement learning, imitation learning and adversarial training.
Our framework allows the use of standard optimization algorithms to construct surrogates which can be minimized efficiently.
- Score: 26.35752393302125
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider minimizing functions for which it is expensive to compute the
(possibly stochastic) gradient. Such functions are prevalent in reinforcement
learning, imitation learning and adversarial training. Our target optimization
framework uses the (expensive) gradient computation to construct surrogate
functions in a \emph{target space} (e.g. the logits output by a linear model
for classification) that can be minimized efficiently. This allows for multiple
parameter updates to the model, amortizing the cost of gradient computation. In
the full-batch setting, we prove that our surrogate is a global upper-bound on
the loss, and can be (locally) minimized using a black-box optimization
algorithm. We prove that the resulting majorization-minimization algorithm
ensures convergence to a stationary point of the loss. Next, we instantiate our
framework in the stochastic setting and propose the $SSO$ algorithm, which can
be viewed as projected stochastic gradient descent in the target space. This
connection enables us to prove theoretical guarantees for $SSO$ when minimizing
convex functions. Our framework allows the use of standard stochastic
optimization algorithms to construct surrogates which can be minimized by any
deterministic optimization method. To evaluate our framework, we consider a
suite of supervised learning and imitation learning problems. Our experiments
indicate the benefits of target optimization and the effectiveness of $SSO$.
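Below is a minimal sketch of the target-optimization idea for the linear-classification example mentioned in the abstract: the expensive loss gradient is computed once in logit (target) space, a linear-plus-proximal surrogate is formed around the current logits, and several cheap parameter updates are then taken on that surrogate. The quadratic surrogate, step sizes, and inner iteration count are illustrative assumptions, not the authors' exact SSO implementation.
```python
# Hedged sketch of target-space surrogate optimization (illustrative, not the authors' code).
# Assumes a linear classifier, mean cross-entropy loss, and a proximal surrogate in logit space.
import torch

def expensive_target_gradient(logits, labels):
    # Stand-in for the (expensive) gradient of the loss w.r.t. the targets (logits).
    logits = logits.detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(logits, labels)
    grad, = torch.autograd.grad(loss, logits)
    return grad, loss.item()

def target_surrogate_step(model, x, y, eta=1.0, inner_steps=20, inner_lr=0.2):
    # One outer step: build a surrogate around the current targets, then take several
    # cheap parameter updates on it, amortizing the expensive gradient computation.
    with torch.no_grad():
        z_anchor = model(x)
    g, loss_val = expensive_target_gradient(z_anchor, y)

    inner_opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        z = model(x)
        # Linearization of the loss in target space plus a proximal (trust-region) term.
        linear = (g * (z - z_anchor)).sum()
        prox = (z - z_anchor).pow(2).sum(dim=1).mean() / (2 * eta)
        inner_opt.zero_grad()
        (linear + prox).backward()
        inner_opt.step()
    return loss_val

# Toy usage: a linear model whose logits define the target space.
torch.manual_seed(0)
x, y = torch.randn(64, 10), torch.randint(0, 3, (64,))
model = torch.nn.Linear(10, 3)
for _ in range(25):
    loss = target_surrogate_step(model, x, y)
```
The inner loop plays the role of the optimizer acting on the surrogate; running it for more steps amortizes the single expensive gradient call over many parameter updates.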
Related papers
- Stochastic Zeroth-Order Optimization under Strongly Convexity and Lipschitz Hessian: Minimax Sample Complexity [59.75300530380427]
We consider the problem of optimizing second-order smooth and strongly convex functions where the algorithm only has access to noisy evaluations of the objective function it queries.
We provide the first tight characterization for the rate of the minimax simple regret by developing matching upper and lower bounds.
arXiv Detail & Related papers (2024-06-28T02:56:22Z) - Beyond Single-Model Views for Deep Learning: Optimization versus
Generalizability of Stochastic Optimization Algorithms [13.134564730161983]
This paper adopts a novel approach to deep learning optimization, focusing on stochastic gradient descent (SGD) and its variants.
We show that SGD and its variants demonstrate performance on par with flat-minima optimizers like SAM, albeit with half the gradient evaluations.
Our study uncovers several key findings regarding the relationship between training loss and hold-out accuracy, as well as the comparable performance of SGD and noise-enabled variants.
arXiv Detail & Related papers (2024-03-01T14:55:22Z) - Efficient Minimax Optimal Global Optimization of Lipschitz Continuous
Multivariate Functions [0.0]
We show that our algorithm achieves an average regret of $O(L\sqrt{n}T^{-\frac{1}{n}})$ over a horizon $T$ for $n$-dimensional Lipschitz continuous functions.
arXiv Detail & Related papers (2022-06-06T06:28:38Z) - Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player to compete against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (having no limit cycles) and to find an appropriate stationary point in a polynomial number of gradient evaluations.
arXiv Detail & Related papers (2021-06-02T22:03:36Z) - Byzantine-Resilient Non-Convex Stochastic Gradient Descent [61.6382287971982]
We study adversary-resilient distributed optimization, in which machines can independently compute gradients and cooperate to jointly optimize their local objective functions.
Our algorithm is based on a new concentration technique, and its sample complexity matches the best known bounds for this setting.
It is very practical: it improves upon the performance of all prior methods when no Byzantine machines are present.
arXiv Detail & Related papers (2020-12-28T17:19:32Z) - Zeroth-Order Hybrid Gradient Descent: Towards A Principled Black-Box
Optimization Framework [100.36569795440889]
This work studies zeroth-order (ZO) optimization, which does not require first-order gradient information.
We show that with a graceful design in coordinate importance sampling, the proposed ZO optimization method is efficient both in terms of complexity and function query cost (a generic coordinate-wise ZO sketch appears after this list).
arXiv Detail & Related papers (2020-12-21T17:29:58Z) - Conservative Stochastic Optimization with Expectation Constraints [11.393603788068777]
This paper considers convex optimization problems where the objective and constraint functions involve expectations with respect to the data indices or environmental variables.
Online and efficient approaches for solving such problems have not been widely studied.
We propose a novel conservative optimization algorithm (CSOA) that achieves zero constraint violation and $O\left(T^{-\frac{1}{2}}\right)$ optimality gap.
arXiv Detail & Related papers (2020-08-13T08:56:24Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we consider the gradient-free (zeroth-order) minimax optimization problem in the nonconvex-strongly-concave setting.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Global Optimization of Gaussian processes [52.77024349608834]
We propose a reduced-space formulation with Gaussian processes trained on few data points.
The approach also leads to significantly smaller and computationally cheaper subproblems for lower bounding.
In total, the proposed method reduces the time to convergence by orders of magnitude.
arXiv Detail & Related papers (2020-05-21T20:59:11Z)
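The zeroth-order hybrid entry above mentions coordinate importance sampling; below is a minimal, generic sketch of an importance-weighted coordinate-wise finite-difference gradient estimate under that kind of sampling. The sampling distribution, smoothing parameter, and sample count are illustrative assumptions, and this is not the specific method of that paper.
```python
# Hedged sketch of a coordinate-wise zeroth-order gradient estimate with a generic
# (illustrative) importance-sampling distribution over coordinates.
import numpy as np

def zo_coordinate_gradient(f, x, probs, num_samples=5, mu=1e-4, rng=None):
    # Coordinates are drawn with probability `probs` and re-weighted by 1/probs[i]
    # so the estimate stays unbiased for the true gradient.
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    grad = np.zeros(d)
    idx = rng.choice(d, size=num_samples, replace=True, p=probs)
    for i in idx:
        e = np.zeros(d)
        e[i] = 1.0
        # Two-point finite-difference estimate of the i-th partial derivative.
        partial = (f(x + mu * e) - f(x - mu * e)) / (2 * mu)
        grad[i] += partial / (probs[i] * num_samples)
    return grad

# Usage: zeroth-order gradient descent on a toy quadratic.
f = lambda v: 0.5 * np.sum(v ** 2)
x = np.ones(10)
probs = np.full(10, 0.1)  # uniform sampling as a placeholder distribution
for _ in range(200):
    x -= 0.1 * zo_coordinate_gradient(f, x, probs)
```
Re-weighting each sampled coordinate by the inverse of its sampling probability keeps the estimate unbiased while allowing more informative coordinates to be queried more often.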