Automatic Differentiation of Optimization Algorithms with Time-Varying Updates
- URL: http://arxiv.org/abs/2410.15923v2
- Date: Thu, 24 Oct 2024 18:03:31 GMT
- Title: Automatic Differentiation of Optimization Algorithms with Time-Varying Updates
- Authors: Sheheryar Mehmood, Peter Ochs
- Abstract summary: We apply unrolled or automatic differentiation to a time-varying iterative process and provide convergence guarantees for the resulting derivative iterates.
We adapt these convergence results and apply them to proximal gradient descent with variable step size and FISTA when solving partly smooth problems.
Our theoretical and numerical results show that the convergence rate of the algorithm is reflected in its derivative iterates.
- Score: 4.389150156866014
- License:
- Abstract: Numerous optimization algorithms have a time-varying update rule thanks to, for instance, a changing step size, momentum parameter, or Hessian approximation. In this paper, we apply unrolled or automatic differentiation to a time-varying iterative process and provide convergence (rate) guarantees for the resulting derivative iterates. We adapt these convergence results and apply them to proximal gradient descent with variable step size and FISTA when solving partly smooth problems. We confirm our findings numerically by solving $\ell_1$- and $\ell_2$-regularized linear and logistic regression, respectively. Our theoretical and numerical results show that the convergence rate of the algorithm is reflected in its derivative iterates.
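The idea of differentiating the iterates of an algorithm can be sketched concretely. The snippet below is a minimal illustration, not the paper's implementation: it runs ISTA (proximal gradient descent) on an $\ell_1$-regularized least-squares problem and propagates the forward-mode derivative of the iterates with respect to the regularization weight `lam` alongside the main iteration. The function name, the fixed step size, and the choice of `lam` as the differentiated parameter are assumptions made for the example.

```python
import numpy as np

def soft_threshold(z, theta):
    """Proximal operator of theta * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - theta, 0.0)

def ista_unrolled_derivative(A, b, lam, n_iters=300):
    """Run ISTA on 0.5*||Ax - b||^2 + lam*||x||_1 and propagate the
    forward-mode derivative iterate dx_k = d(x_k)/d(lam) alongside x_k."""
    tau = 1.0 / np.linalg.norm(A, 2) ** 2   # step size 1/L, L = ||A||_2^2
    x = np.zeros(A.shape[1])
    dx = np.zeros_like(x)                   # derivative iterate
    for _ in range(n_iters):
        z = x - tau * (A.T @ (A @ x - b))           # gradient step
        dz = dx - tau * (A.T @ (A @ dx))            # its derivative w.r.t. lam
        theta = tau * lam
        mask = np.abs(z) > theta                    # coordinates not thresholded to zero
        x = soft_threshold(z, theta)
        # d/dlam of soft_threshold(z, tau*lam): active coords get dz - sign(z)*tau,
        # thresholded coords have zero derivative (away from the boundary).
        dx = mask * (dz - np.sign(z) * tau)
    return x, dx
```

On a generic problem the derivative iterate `dx` converges, and once the active set of the soft-thresholding stabilizes (the partial-smoothness regime studied in the paper), it agrees with a finite-difference estimate of the solution's sensitivity to `lam`.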
Related papers
- Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously provide inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - An Accelerated Block Proximal Framework with Adaptive Momentum for
Nonconvex and Nonsmooth Optimization [2.323238724742687]
We propose an accelerated block proximal linear framework with adaptive momentum (ABPL$+$) for nonconvex and nonsmooth optimization.
We analyze the potential causes of the extrapolation step failing in some algorithms, and resolve this issue by enhancing the comparison process.
We extend our algorithm to any scenario involving updating the gradient step and the linear extrapolation step.
arXiv Detail & Related papers (2023-08-23T13:32:31Z) - Ordering for Non-Replacement SGD [7.11967773739707]
We seek to find an ordering that can improve the convergence rates for the non-replacement form of the algorithm.
We develop optimal orderings for constant and decreasing step sizes for strongly convex and convex functions.
In addition, we are able to combine the ordering with mini-batch and further apply it to more complex neural networks.
arXiv Detail & Related papers (2023-06-28T00:46:58Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit the connections between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic gradient Langevin Dynamics is one of the most fundamental algorithms to solve non-convex optimization problems.
In this paper, we show two variants of this kind, namely the Variance Reduced Langevin Dynamics and the Recursive Gradient Langevin Dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Continuation Path with Linear Convergence Rate [18.405645120971496]
Path-following algorithms are frequently used in composite optimization problems where a series of subproblems are solved sequentially.
We present a primal-dual analysis of the path-following algorithm and determine how accurately each subproblem should be solved to guarantee a linear convergence rate on a target problem.
Considering optimization with a sparsity-inducing penalty, we analyze the change of the active sets with respect to the regularization parameter.
arXiv Detail & Related papers (2021-12-09T18:42:13Z) - A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning [13.908826484332282]
We study a new two-time-scale gradient method for solving optimization problems.
Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale gradient algorithm.
We apply our framework to gradient-based policy evaluation algorithms in reinforcement learning.
arXiv Detail & Related papers (2021-09-29T23:15:23Z) - Nesterov Accelerated ADMM for Fast Diffeomorphic Image Registration [63.15453821022452]
Recent developments in approaches based on deep learning have achieved sub-second runtimes for DiffIR.
We propose a simple iterative scheme that functionally composes intermediate non-stationary velocity fields.
We then propose a convex optimisation model that uses a regularisation term of arbitrary order to impose smoothness on these velocity fields.
arXiv Detail & Related papers (2021-09-26T19:56:45Z) - Distributed Proximal Splitting Algorithms with Rates and Acceleration [7.691755449724637]
We derive sublinear and linear convergence results with new rates on the function value suboptimality or distance to the solution.
We propose distributed variants of these algorithms, which can be accelerated as well.
arXiv Detail & Related papers (2020-10-02T12:35:09Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point each step.
Our results are expressed in a form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.