Alternating Differentiation for Optimization Layers
- URL: http://arxiv.org/abs/2210.01802v2
- Date: Mon, 24 Apr 2023 06:33:10 GMT
- Title: Alternating Differentiation for Optimization Layers
- Authors: Haixiang Sun, Ye Shi, Jingya Wang, Hoang Duong Tuan, H. Vincent Poor,
and Dacheng Tao
- Abstract summary: We develop a new framework, named Alternating Differentiation (Alt-Diff), that differentiates optimization problems.
We show that Alt-Diff substantially decreases the dimensions of the Jacobian matrix especially for optimization with large-scale constraints.
We also propose to truncate Alt-Diff to further accelerate the computational speed.
- Score: 133.2668019610731
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The idea of embedding optimization problems into deep neural networks as
optimization layers to encode constraints and inductive priors has taken hold
in recent years. Most existing methods focus on implicitly differentiating
Karush-Kuhn-Tucker (KKT) conditions in a way that requires expensive
computations on the Jacobian matrix, which can be slow and memory-intensive. In
this paper, we develop a new framework, named Alternating Differentiation
(Alt-Diff), that differentiates optimization problems (here, specifically in
the form of convex optimization problems with polyhedral constraints) in a fast
and recursive way. Alt-Diff decouples the differentiation procedure into a
primal update and a dual update in an alternating way. Accordingly, Alt-Diff
substantially decreases the dimensions of the Jacobian matrix especially for
optimization with large-scale constraints and thus increases the computational
speed of implicit differentiation. We show that the gradients obtained by
Alt-Diff are consistent with those obtained by differentiating KKT conditions.
In addition, we propose to truncate Alt-Diff to further accelerate the
computational speed. Under some standard assumptions, we show that the
truncation error of the gradients is upper bounded by the same order as the
estimation error of the variables. Therefore, Alt-Diff can be truncated to further increase
computational speed without sacrificing much accuracy. A series of
comprehensive experiments validate the superiority of Alt-Diff.
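To make the alternating scheme concrete, here is a minimal NumPy sketch of the idea described above, applied to a parametric QP with polyhedral (inequality) constraints solved by ADMM-style primal/slack/dual updates; the Jacobian of each block is propagated alongside its iterate in the same alternating order. The problem setup, variable names, and stopping rule are illustrative assumptions, not the authors' reference implementation.

```python
# Sketch: alternating differentiation of an ADMM-style solver for
#   min_x 0.5 x' P x + q(theta)' x   s.t.  A x <= b,
# returning both the solution x and the Jacobian dx/dtheta.
import numpy as np

def altdiff_qp(P, q, dq_dtheta, A, b, rho=1.0, iters=500, tol=1e-8):
    n, m = P.shape[0], A.shape[0]
    p = dq_dtheta.shape[1]                      # number of parameters theta
    x, s, lam = np.zeros(n), np.zeros(m), np.zeros(m)
    Jx, Js, Jlam = np.zeros((n, p)), np.zeros((m, p)), np.zeros((m, p))
    K_inv = np.linalg.inv(P + rho * A.T @ A)    # factorized once (small n assumed)
    for _ in range(iters):
        # primal update (minimize the augmented Lagrangian in x) and its Jacobian
        x = K_inv @ (-(q + A.T @ lam + rho * A.T @ (s - b)))
        Jx = K_inv @ (-(dq_dtheta + A.T @ Jlam + rho * A.T @ Js))
        # slack update (projection onto s >= 0) and its Jacobian
        pre = b - A @ x - lam / rho
        s = np.maximum(0.0, pre)
        active = (pre > 0).astype(float)[:, None]
        Js = active * (-A @ Jx - Jlam / rho)
        # dual update and its Jacobian
        lam = lam + rho * (A @ x + s - b)
        Jlam = Jlam + rho * (A @ Jx + Js)
        if np.linalg.norm(A @ x + s - b) < tol:  # primal residual
            break
    return x, Jx

# Tiny usage example with q(theta) = theta, so dq/dtheta = I.
rng = np.random.default_rng(0)
n, m = 4, 6
M = rng.standard_normal((n, n)); P = M @ M.T + np.eye(n)
A = rng.standard_normal((m, n)); b = rng.random(m) + 1.0
theta = rng.standard_normal(n)
x_opt, dx_dtheta = altdiff_qp(P, theta, np.eye(n), A, b)
# dx_dtheta can be checked against finite differences or against implicit
# differentiation of the KKT conditions.
```

Note that each step only differentiates one small block (primal, slack, or dual) rather than the full KKT system, which is where the reduction in Jacobian size comes from; running the loop for fewer iterations (or with a looser tolerance) corresponds to the truncated variant discussed above.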
Related papers
- An Accelerated Algorithm for Stochastic Bilevel Optimization under Unbounded Smoothness [15.656614304616006]
This paper investigates a class of bilevel optimization problems where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level problem is strongly convex.
These problems have significant applications in sequential data learning, such as text classification using recurrent neural networks.
arXiv Detail & Related papers (2024-09-28T02:30:44Z) - Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity [50.25258834153574]
We focus on the class of (strongly) convex $(L_0,L_1)$-smooth functions and derive new convergence guarantees for several existing methods.
In particular, we derive improved convergence rates for Gradient Descent with Smoothed Gradient Clipping and for Gradient Descent with Polyak Stepsizes.
arXiv Detail & Related papers (2024-09-23T13:11:37Z) - ELRA: Exponential learning rate adaption gradient descent optimization
method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyperparameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the learning rate $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with $n$.
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Proximal Subgradient Norm Minimization of ISTA and FISTA [8.261388753972234]
We show that the squared proximal subgradient norm for the class of iterative shrinkage-thresholding algorithms (ISTA) converges at an inverse square rate.
We also show that the squared proximal subgradient norm for the class of fast iterative shrinkage-thresholding algorithms (FISTA) is accelerated to converge at an inverse cubic rate.
arXiv Detail & Related papers (2022-11-03T06:50:19Z) - Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with
Variance Reduction and its Application to Optimization [50.83356836818667]
Stochastic gradient Langevin dynamics is one of the most fundamental algorithms for solving nonconvex optimization problems.
In this paper, we study two variants of this kind, namely the variance-reduced gradient Langevin dynamics and the recursive gradient Langevin dynamics.
arXiv Detail & Related papers (2022-03-30T11:39:00Z) - Implicit differentiation for fast hyperparameter selection in non-smooth
convex learning [87.60600646105696]
We study first-order methods when the inner optimization problem is convex but non-smooth.
We show that the forward-mode differentiation of proximal gradient descent and proximal coordinate descent yields sequences of Jacobians converging toward the exact Jacobian, as illustrated by the sketch after this list.
arXiv Detail & Related papers (2021-05-04T17:31:28Z)
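The last entry above, on implicit differentiation for hyperparameter selection, lends itself to a similarly small sketch: forward-mode differentiation of proximal gradient descent (ISTA) for a Lasso inner problem, propagating the derivative of the iterate with respect to the regularization parameter through each soft-thresholding step. The Lasso setup, variable names, and synthetic data are illustrative assumptions rather than the paper's own code.

```python
# Sketch: forward-mode differentiation of ISTA for the Lasso
#   min_b 0.5 * ||X b - y||^2 + lam * ||b||_1,
# returning the solution b and its derivative db/dlam.
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista_forward_jacobian(X, y, lam, iters=500):
    n_features = X.shape[1]
    L = np.linalg.norm(X, 2) ** 2            # Lipschitz constant of the smooth part
    b = np.zeros(n_features)
    Jb = np.zeros(n_features)                # db/dlam, propagated alongside b
    G = np.eye(n_features) - X.T @ X / L     # Jacobian of the gradient step w.r.t. b
    for _ in range(iters):
        z = b - X.T @ (X @ b - y) / L        # gradient step
        Jz = G @ Jb                          # forward-mode through the gradient step
        b = soft_threshold(z, lam / L)       # proximal step
        active = (np.abs(z) > lam / L).astype(float)
        Jb = active * (Jz - np.sign(z) / L)  # forward-mode through soft-thresholding
    return b, Jb

# Tiny usage example; Jb can be compared to the finite-difference estimate
# (b(lam + eps) - b(lam)) / eps for a small eps.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
y = rng.standard_normal(50)
b_hat, db_dlam = ista_forward_jacobian(X, y, lam=0.5)
```

As that related paper indicates, the Jacobian of the iterate converges to the Jacobian of the exact solution, which is what makes such forward-mode hypergradients usable for hyperparameter selection.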