CACTO: Continuous Actor-Critic with Trajectory Optimization -- Towards
global optimality
- URL: http://arxiv.org/abs/2211.06625v3
- Date: Mon, 8 May 2023 12:48:25 GMT
- Title: CACTO: Continuous Actor-Critic with Trajectory Optimization -- Towards
global optimality
- Authors: Gianluigi Grandesso, Elisa Alboni, Gastone P. Rosati Papini, Patrick
M. Wensing and Andrea Del Prete
- Abstract summary: This paper presents a novel algorithm for the continuous control of dynamical systems that combines Trajectory Optimization (TO) and Reinforcement Learning (RL) in a single framework.
- Score: 5.0915256711576475
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper presents a novel algorithm for the continuous control of dynamical
systems that combines Trajectory Optimization (TO) and Reinforcement Learning
(RL) in a single framework. The motivations behind this algorithm are the two
main limitations of TO and RL when applied to continuous nonlinear systems to
minimize a non-convex cost function. Specifically, TO can get stuck in poor
local minima when the search is not initialized close to a "good" minimum. On
the other hand, when dealing with continuous state and control spaces, the RL
training process may be excessively long and strongly dependent on the
exploration strategy. Thus, our algorithm learns a "good" control policy via
TO-guided RL policy search that, when used as initial guess provider for TO,
makes the trajectory optimization process less prone to converge to poor local
optima. Our method is validated on several reaching problems featuring
non-convex obstacle avoidance with different dynamical systems, including a car
model with 6D state, and a 3-joint planar manipulator. Our results show the
great capabilities of CACTO in escaping local minima, while being more
computationally efficient than the Deep Deterministic Policy Gradient (DDPG)
and Proximal Policy Optimization (PPO) RL algorithms.
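To make the interplay described above concrete, here is a minimal sketch of the TO-guided training loop, assuming hypothetical helpers (`sample_x0`, `rollout_policy`, `solve_to`, `fit_critic`, `fit_actor`) for the initial-state sampler, policy rollout, trajectory optimizer, and network updates; it illustrates the idea, not the authors' implementation.

```python
import numpy as np

def cacto_style_loop(sample_x0, rollout_policy, solve_to, fit_critic, fit_actor,
                     n_iters=100):
    """Illustrative sketch of the alternation described in the abstract
    (hypothetical helpers, not the authors' code): the current policy
    warm-starts TO, and the TO solutions provide cost-to-go targets that
    train the critic and actor, which in turn yield better initial guesses."""
    replay = []  # (state, control, cost-to-go) samples collected from TO episodes
    for _ in range(n_iters):
        x0 = sample_x0()                    # random initial state
        guess = rollout_policy(x0)          # policy rollout used as TO warm start
        states, controls, stage_costs = solve_to(x0, initial_guess=guess)
        cost_to_go = np.cumsum(stage_costs[::-1])[::-1]  # targets for the critic
        replay.extend(zip(states, controls, cost_to_go))
        fit_critic(replay)                  # regress V(x) on the TO cost-to-go
        fit_actor(replay)                   # improve the policy using the critic
    # After training, the actor serves as the initial-guess provider for TO.
```

The actor returned by such a loop is what the abstract refers to as the initial-guess provider that makes TO less prone to poor local optima.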
Related papers
- Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU problems are naturally modeled as Multistage Problems (MSPs), but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach Two-Stage General Decision Rules (TS-GDR) to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks named Two-Stage Deep Decision Rules (TS-LDR).
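As a rough, generic illustration of a nonlinear decision rule of the kind the summary mentions (not the paper's TS-GDR architecture), a recurrent network can map the history of observed uncertainty to one decision per stage:

```python
import torch
import torch.nn as nn

class RecurrentDecisionRule(nn.Module):
    """Generic nonlinear decision rule for a multistage problem: a GRU maps
    the history of realized uncertainty to per-stage decisions, going beyond
    classical linear decision rules.  Purely illustrative."""
    def __init__(self, obs_dim: int, decision_dim: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, decision_dim)

    def forward(self, uncertainty_history: torch.Tensor) -> torch.Tensor:
        # uncertainty_history: (batch, stages, obs_dim)
        h, _ = self.rnn(uncertainty_history)
        return self.head(h)  # (batch, stages, decision_dim), one decision per stage
```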
arXiv Detail & Related papers (2024-05-23T18:19:47Z)
- CACTO-SL: Using Sobolev Learning to improve Continuous Actor-Critic with Trajectory Optimization [12.115023915042617]
Trajectory Optimization (TO) and Reinforcement Learning (RL) are powerful tools to solve optimal control problems.
In this work, we present an extension of CACTO exploiting the idea of Sobolev Learning (SL).
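The Sobolev-learning idea can be sketched as a critic loss that fits both the value and its gradient with respect to the state, with targets supplied by TO; this is an assumed illustration, not the CACTO-SL code:

```python
import torch

def sobolev_critic_loss(critic, states, value_targets, grad_targets, w_grad=1.0):
    """Illustrative Sobolev-learning loss (an assumption, not the CACTO-SL
    code): the critic is fit to value targets *and* to the gradient of the
    value with respect to the state, both obtained from TO solutions."""
    states = states.clone().requires_grad_(True)
    values = critic(states).squeeze(-1)
    # dV/dx via autograd, so the critic's gradient can be matched to the targets.
    grads = torch.autograd.grad(values.sum(), states, create_graph=True)[0]
    value_loss = torch.mean((values - value_targets) ** 2)
    grad_loss = torch.mean((grads - grad_targets) ** 2)
    return value_loss + w_grad * grad_loss
```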
arXiv Detail & Related papers (2023-12-17T09:44:41Z)
- Offline Policy Optimization in RL with Variance Regularization [142.87345258222942]
We propose variance regularization for offline RL algorithms, using stationary distribution corrections.
We show that by using Fenchel duality, we can avoid double sampling issues for computing the gradient of the variance regularizer.
The proposed algorithm for offline variance regularization (OVAR) can be used to augment any existing offline policy optimization algorithm.
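The double-sampling issue and the Fenchel-dual workaround can be illustrated with a small surrogate (assumed form, not the paper's exact regularizer):

```python
import torch

def variance_penalty_fenchel(returns: torch.Tensor, nu: torch.Tensor) -> torch.Tensor:
    """Illustrative Fenchel-dual surrogate for Var[R] (assumed form, not the
    paper's exact objective).  Since (E[R])^2 = max_nu (2*nu*E[R] - nu^2),
    Var[R] = E[R^2] - (E[R])^2 <= E[R^2] - 2*nu*E[R] + nu^2 for any nu, with
    equality at nu = E[R].  Minimizing jointly over the policy and the scalar
    dual variable nu therefore penalizes the variance while every term is a
    single expectation, so gradient estimates avoid double sampling."""
    return torch.mean(returns ** 2) - 2.0 * nu * torch.mean(returns) + nu ** 2
```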
arXiv Detail & Related papers (2022-12-29T18:25:01Z)
- Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm [4.128216503196621]
We propose an On-policy Model-based Safe Deep RL algorithm in which we learn the transition dynamics of the environment in an online manner.
We show that our algorithm is more sample efficient and results in lower cumulative hazard violations as compared to constrained model-free approaches.
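As background only (the paper's constrained-PPO update may differ), many safe-RL methods handle such constraints with a Lagrangian relaxation whose multiplier is adapted by dual ascent:

```python
def dual_ascent_multiplier(lam: float, expected_cost: float, cost_limit: float,
                           lr: float = 0.01) -> float:
    """Generic dual-ascent step on a Lagrange multiplier (background sketch,
    not the paper's algorithm): increase the penalty when the expected
    cumulative hazard exceeds its limit, decrease it otherwise, and keep it
    non-negative.  The policy is then optimized on reward - lam * cost."""
    return max(0.0, lam + lr * (expected_cost - cost_limit))
```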
arXiv Detail & Related papers (2022-10-14T06:53:02Z)
- A Policy Efficient Reduction Approach to Convex Constrained Deep Reinforcement Learning [2.811714058940267]
We propose a new variant of the conditional gradient (CG) type algorithm, which generalizes the minimum norm point (MNP) method.
Our method reduces the memory costs by an order of magnitude, and achieves better performance, demonstrating both its effectiveness and efficiency.
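For reference, the basic conditional-gradient (Frank-Wolfe) iteration that this family of methods builds on is sketched below; the paper's MNP-style variant is more elaborate, so this is only a generic illustration. `linear_min_oracle(g)` is a hypothetical callable returning the feasible point minimizing the inner product with g.

```python
import numpy as np

def frank_wolfe(grad_f, linear_min_oracle, x0, n_iters=100):
    """Generic conditional-gradient (Frank-Wolfe) iteration, shown only to
    illustrate the family of methods the summary refers to."""
    x = np.asarray(x0, dtype=float)
    for t in range(n_iters):
        g = grad_f(x)
        s = linear_min_oracle(g)           # solve the linear subproblem over the feasible set
        gamma = 2.0 / (t + 2.0)            # standard step-size schedule
        x = (1.0 - gamma) * x + gamma * s  # convex combination keeps x feasible
    return x
```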
arXiv Detail & Related papers (2021-08-29T20:51:32Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based ZO algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
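For background, the standard two-point zeroth-order gradient estimator with random Gaussian perturbations, which is the sampling scheme ZO-RL is said to replace with a learned policy, looks like this:

```python
import numpy as np

def zo_gradient_estimate(f, x, n_dirs=10, mu=1e-3, rng=None):
    """Standard two-point zeroth-order gradient estimator with random
    Gaussian directions -- the baseline sampling scheme the summary says
    ZO-RL replaces with a learned sampling policy."""
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x, dtype=float)
    for _ in range(n_dirs):
        u = rng.standard_normal(x.shape)  # random perturbation direction
        grad += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return grad / n_dirs
```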
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Integrated Decision and Control: Towards Interpretable and Efficient Driving Intelligence [13.589285628074542]
We present an interpretable and efficient decision and control framework for automated vehicles.
It decomposes the driving task into multi-path planning and optimal tracking that are structured hierarchically.
Results show that our method has better online computing efficiency and driving performance including traffic efficiency and safety.
arXiv Detail & Related papers (2021-03-18T14:43:31Z)
- Online Model Selection for Reinforcement Learning with Function Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.