Deterministic Trajectory Optimization through Probabilistic Optimal Control
- URL: http://arxiv.org/abs/2407.13316v2
- Date: Tue, 8 Oct 2024 12:00:41 GMT
- Title: Deterministic Trajectory Optimization through Probabilistic Optimal Control
- Authors: Mohammad Mahmoudi Filabadi, Tom Lefebvre, Guillaume Crevecoeur,
- Abstract summary: We propose two new algorithms for discrete-time deterministic finite-horizon nonlinear optimal control problems.
Both algorithms are inspired by a novel theoretical paradigm known as probabilistic optimal control.
We show that the application of this algorithm results in a fixed point of probabilistic policies that converge to the deterministic optimal policy.
- Score: 3.2771631221674333
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This article proposes two new algorithms tailored to discrete-time deterministic finite-horizon nonlinear optimal control problems or so-called trajectory optimization problems. Both algorithms are inspired by a novel theoretical paradigm known as probabilistic optimal control, that reformulates optimal control as an equivalent probabilistic inference problem. This perspective allows to address the problem using the Expectation-Maximization algorithm. We show that the application of this algorithm results in a fixed point iteration of probabilistic policies that converge to the deterministic optimal policy. Two strategies for policy evaluation are discussed, using state-of-the-art uncertainty quantification methods resulting into two distinct algorithms. The algorithms are structurally closest related to the differential dynamic programming algorithm and related methods that use sigma-point methods to avoid direct gradient evaluations. The main advantage of our work is an improved balance between exploration and exploitation over the iterations, leading to improved numerical stability and accelerated convergence. These properties are demonstrated on different nonlinear systems.
Related papers
- From Optimization to Control: Quasi Policy Iteration [3.4376560669160394]
We introduce a novel control algorithm coined as quasi-policy iteration (QPI)
QPI is based on a novel approximation of the "Hessian" matrix in the policy iteration algorithm by exploiting two linear structural constraints specific to MDPs.
It exhibits an empirical convergence behavior similar to policy iteration with a very low sensitivity to the discount factor.
arXiv Detail & Related papers (2023-11-18T21:00:14Z) - Accelerating Cutting-Plane Algorithms via Reinforcement Learning
Surrogates [49.84541884653309]
A current standard approach to solving convex discrete optimization problems is the use of cutting-plane algorithms.
Despite the existence of a number of general-purpose cut-generating algorithms, large-scale discrete optimization problems continue to suffer from intractability.
We propose a method for accelerating cutting-plane algorithms via reinforcement learning.
arXiv Detail & Related papers (2023-07-17T20:11:56Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit between first-order algorithms for constrained optimization and non-smooth systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of sparse variables.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Reinforcement Learning with Unbiased Policy Evaluation and Linear
Function Approximation [11.345796608258434]
We provide performance guarantees for a variant of simulation-based policy iteration for controlling Markov decision processes.
We analyze two algorithms; the first algorithm involves a least squares approach where a new set of weights associated with feature vectors is obtained via at least squares at each iteration.
The second algorithm involves a two-time-scale approximation algorithm taking several steps of gradient descent towards the least squares solution.
arXiv Detail & Related papers (2022-10-13T20:16:19Z) - Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both deterministic and deterministic settings.
We exploit a warm-start strategy to amortize the estimation of the exact gradient.
By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z) - Zeroth and First Order Stochastic Frank-Wolfe Algorithms for Constrained
Optimization [13.170519806372075]
Problems of convex optimization with two sets of constraints arise frequently in the context of semidefinite programming.
Since projection onto the first set of constraints is difficult, it becomes necessary to explore projection-free algorithms.
The efficacy of the proposed algorithms is tested on relevant applications of sparse matrix estimation, clustering via semidefinite relaxation, and uniform sparsest cut problem.
arXiv Detail & Related papers (2021-07-14T08:01:30Z) - A Stochastic Sequential Quadratic Optimization Algorithm for Nonlinear
Equality Constrained Optimization with Rank-Deficient Jacobians [11.03311584463036]
A sequential quadratic optimization algorithm is proposed for solving smooth nonlinear equality constrained optimization problems.
Results of numerical experiments demonstrate that the algorithm offers superior performance when compared to popular alternatives.
arXiv Detail & Related papers (2021-06-24T13:46:52Z) - Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
bootstrapping is necessary and, along with off-policy learning and FA, results in the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z) - Approximate Midpoint Policy Iteration for Linear Quadratic Control [1.0312968200748118]
We present a midpoint policy iteration algorithm to solve linear quadratic optimal control problems in both model-based and model-free settings.
We show that in the model-based setting it achieves cubic convergence, which is superior to standard policy iteration and policy algorithms that achieve quadratic and linear convergence.
arXiv Detail & Related papers (2020-11-28T20:22:10Z) - A Dynamical Systems Approach for Convergence of the Bayesian EM
Algorithm [59.99439951055238]
We show how (discrete-time) Lyapunov stability theory can serve as a powerful tool to aid, or even lead, in the analysis (and potential design) of optimization algorithms that are not necessarily gradient-based.
The particular ML problem that this paper focuses on is that of parameter estimation in an incomplete-data Bayesian framework via the popular optimization algorithm known as maximum a posteriori expectation-maximization (MAP-EM)
We show that fast convergence (linear or quadratic) is achieved, which could have been difficult to unveil without our adopted S&C approach.
arXiv Detail & Related papers (2020-06-23T01:34:18Z) - Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to achieve the best-available convergence for non-PL objectives simultaneously while outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.