Improving the Efficiency of Gradient Descent Algorithms Applied to
Optimization Problems with Dynamical Constraints
- URL: http://arxiv.org/abs/2208.12834v1
- Date: Fri, 26 Aug 2022 18:26:50 GMT
- Title: Improving the Efficiency of Gradient Descent Algorithms Applied to
Optimization Problems with Dynamical Constraints
- Authors: Ion Matei, Maksym Zhenirovskyy, Johan de Kleer and John Maxwell
- Abstract summary: We introduce two block coordinate descent algorithms for solving optimization problems with ordinary differential equations.
The algorithms do not need to implement direct or adjoint sensitivity analysis methods to evaluate loss function gradients.
- Score: 3.3722008527102894
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce two block coordinate descent algorithms for solving optimization
problems with ordinary differential equations (ODEs) as dynamical constraints.
The algorithms do not need to implement direct or adjoint sensitivity analysis
methods to evaluate loss function gradients. They result from a reformulation of
the original problem as an equivalent optimization problem with equality
constraints. The algorithms naturally follow from steps aimed at recovering the
gradient-descent algorithm based on ODE solvers that explicitly account for the
sensitivity of the ODE solution. In our first proposed algorithm we avoid
explicitly solving the ODE by integrating the ODE solver as a sequence of
implicit constraints. In our second algorithm, we use an ODE solver to reset
the ODE solution, but no direct or adjoint sensitivity analysis methods are
used. Both algorithms accept mini-batch implementations and show significant
efficiency benefits from GPU-based parallelization. We demonstrate the
performance of the algorithms when applied to learning the parameters of the
Cucker-Smale model. The algorithms are compared with gradient descent
algorithms based on ODE solvers endowed with sensitivity analysis capabilities,
for various state sizes, using PyTorch and JAX implementations. The
experimental results demonstrate that the proposed algorithms are at least 4x
faster than the PyTorch implementations, and at least 16x faster than the JAX
implementations. For large versions of the Cucker-Smale model, the JAX
implementation is thousands of times faster than the sensitivity analysis-based
implementation. In addition, our algorithms generate more accurate results on
both training and test data. Such gains in computational efficiency are
paramount for algorithms that implement real-time parameter estimation, such as
diagnosis algorithms.
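As a reading aid, the following Python/JAX snippet is a minimal, hedged sketch of the idea behind the first algorithm, not the authors' implementation: it assumes an explicit-Euler discretization of a one-dimensional Cucker-Smale model, a terminal-state fitting loss, and a simple quadratic penalty standing in for the equality constraints; all names and hyperparameters (h, rho, the learning rates) are illustrative choices. The point it illustrates is that once the solver steps are written as constraints, the trajectory becomes an optimization block of its own, and gradients for each block come from ordinary automatic differentiation of a penalized objective, with no direct or adjoint sensitivity analysis.

```python
# Hedged sketch: block coordinate descent on a penalized reformulation of an
# ODE-constrained problem. Explicit-Euler steps x_{k+1} = x_k + h f(x_k, theta)
# are treated as equality constraints and enforced with a quadratic penalty.
import jax
import jax.numpy as jnp

def cucker_smale(z, theta):
    # One-dimensional Cucker-Smale flocking dynamics (illustrative simplification).
    # z stacks agent positions p and velocities v; theta = (K, sigma, beta).
    K, sigma, beta = theta
    n = z.shape[0] // 2
    p, v = z[:n], z[n:]
    diff = p[:, None] - p[None, :]                      # pairwise position differences
    w = K / (sigma**2 + diff**2) ** beta                # communication weights
    dv = (w * (v[None, :] - v[:, None])).mean(axis=1)   # velocity-alignment term
    return jnp.concatenate([v, dv])

def penalty_loss(X, theta, z0, z_target, h, rho):
    # Data-fit term plus quadratic penalty on the Euler-step equality constraints.
    traj = jnp.vstack([z0[None, :], X])                 # X holds the free trajectory block
    pred = traj[:-1] + h * jax.vmap(cucker_smale, (0, None))(traj[:-1], theta)
    fit = jnp.sum((traj[-1] - z_target) ** 2)           # placeholder terminal-state loss
    return fit + rho * jnp.sum((traj[1:] - pred) ** 2)

grad_X = jax.jit(jax.grad(penalty_loss, argnums=0))     # gradient w.r.t. trajectory block
grad_theta = jax.jit(jax.grad(penalty_loss, argnums=1)) # gradient w.r.t. parameter block

def block_coordinate_descent(z0, z_target, theta0, steps=50, iters=2000,
                             h=0.02, rho=10.0, lr_x=1e-2, lr_theta=1e-3):
    X = jnp.tile(z0, (steps, 1))                        # crude initial trajectory guess
    theta = theta0
    for _ in range(iters):
        X = X - lr_x * grad_X(X, theta, z0, z_target, h, rho)
        theta = theta - lr_theta * grad_theta(X, theta, z0, z_target, h, rho)
    return X, theta

# Toy usage: 5 agents (state dimension 10) and an arbitrary target state.
key = jax.random.PRNGKey(0)
z0 = jax.random.normal(key, (10,))
X_opt, theta_opt = block_coordinate_descent(z0, jnp.zeros(10), jnp.array([1.0, 1.0, 0.5]))
```

The abstract's second algorithm differs in that the ODE solution is periodically reset by actually running an ODE solver, again without direct or adjoint sensitivity analysis; that variant is not shown in the sketch above.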
Related papers
- Performance Evaluation of Evolutionary Algorithms for Analog Integrated
Circuit Design Optimisation [0.0]
An automated sizing approach for analog circuits is presented in this paper.
A targeted search of the search space has been implemented using a particle generation function and a repair-bounds function.
The algorithms are tuned and modified to converge to a better optimum.
arXiv Detail & Related papers (2023-10-19T03:26:36Z) - Accelerating Cutting-Plane Algorithms via Reinforcement Learning
Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms.
Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability.
We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
arXiv Detail & Related papers (2023-07-17T20:11:56Z) - Learning the Positions in CountSketch [49.57951567374372]
We consider sketching algorithms which first compress data by multiplication with a random sketch matrix, and then apply the sketch to quickly solve an optimization problem.
In this work, we propose the first learning-based algorithms that also optimize the locations of the non-zero entries; a minimal sketch-and-solve illustration appears after this list.
arXiv Detail & Related papers (2023-06-11T07:28:35Z) - Hybrid ACO-CI Algorithm for Beam Design problems [0.4397520291340694]
A novel hybrid version of the Ant colony optimization (ACO) method is developed using the sample space reduction technique of the Cohort Intelligence (CI) algorithm.
The proposed work could be investigated for real-world applications in engineering and health care.
arXiv Detail & Related papers (2023-03-29T04:37:14Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit connections between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Accelerated Proximal Alternating Gradient-Descent-Ascent for Nonconvex
Minimax Machine Learning [12.069630105460766]
Alternating gradient descent-ascent (AltGDA) is an optimization algorithm that has been widely used for training in various machine learning applications.
In this paper, we develop a single-loop fast algorithm to solve nonconvex minimax optimization problems.
arXiv Detail & Related papers (2021-12-22T04:33:27Z) - An Accelerated Variance-Reduced Conditional Gradient Sliding Algorithm
for First-order and Zeroth-order Optimization [111.24899593052851]
The conditional gradient algorithm (also known as the Frank-Wolfe algorithm) has recently regained popularity in the machine learning community.
ARCS is the first zeroth-order conditional gradient sliding type algorithm for solving convex problems in zeroth-order optimization.
In first-order optimization, the convergence results of ARCS substantially outperform previous algorithms in terms of the number of gradient oracle queries.
arXiv Detail & Related papers (2021-09-18T07:08:11Z) - Provably Faster Algorithms for Bilevel Optimization [54.83583213812667]
Bilevel optimization has been widely applied in many important machine learning applications.
We propose two new algorithms for bilevel optimization.
We show that both algorithms achieve a complexity of $\mathcal{O}(\epsilon^{-1.5})$, which outperforms all existing algorithms by an order of magnitude.
arXiv Detail & Related papers (2021-06-08T21:05:30Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Optimal and Practical Algorithms for Smooth and Strongly Convex
Decentralized Optimization [21.555331273873175]
We consider the task of decentralized minimization of the sum of smooth strongly convex functions stored across the nodes of a network.
We propose two new algorithms for this decentralized optimization problem and equip them with complexity guarantees.
arXiv Detail & Related papers (2020-06-21T11:23:20Z)
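To complement the CountSketch entry above ("Learning the Positions in CountSketch"), here is a minimal, hedged illustration of the classical sketch-and-solve idea that the paper builds on; it is not the paper's learned variant, and the problem sizes, sketch dimension, and seeds are arbitrary. A CountSketch matrix has exactly one random +/-1 per column, so it can in principle be applied in time proportional to the number of nonzeros of the data matrix (it is materialized densely below only for clarity); the cited paper additionally learns where those nonzero entries should be placed.

```python
# Hedged sketch: classical CountSketch-based sketch-and-solve for least squares.
import numpy as np

def countsketch_matrix(m, n, rng):
    # Dense m x n CountSketch: column j has a single +/-1 in a random row.
    S = np.zeros((m, n))
    rows = rng.integers(0, m, size=n)          # random hash bucket for each input row
    signs = rng.choice([-1.0, 1.0], size=n)    # random sign flips
    S[rows, np.arange(n)] = signs
    return S

def sketch_and_solve(A, b, m, seed=0):
    # Solve the compressed problem min_x ||S A x - S b|| instead of the full one.
    rng = np.random.default_rng(seed)
    S = countsketch_matrix(m, A.shape[0], rng)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

# Toy usage: a 5000 x 50 least-squares problem compressed to 400 rows before solving.
rng = np.random.default_rng(1)
A = rng.standard_normal((5000, 50))
b = A @ rng.standard_normal(50) + 0.01 * rng.standard_normal(5000)
print(np.linalg.norm(sketch_and_solve(A, b, m=400) - np.linalg.lstsq(A, b, rcond=None)[0]))
```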