CACTO-SL: Using Sobolev Learning to improve Continuous Actor-Critic with
Trajectory Optimization
- URL: http://arxiv.org/abs/2312.10666v1
- Date: Sun, 17 Dec 2023 09:44:41 GMT
- Title: CACTO-SL: Using Sobolev Learning to improve Continuous Actor-Critic with
Trajectory Optimization
- Authors: Elisa Alboni, Gianluigi Grandesso, Gastone Pietro Rosati Papini,
Justin Carpentier, Andrea Del Prete
- Abstract summary: Trajectory Optimization (TO) and Reinforcement Learning (RL) are powerful tools to solve optimal control problems.
In this work, we present an extension of CACTO exploiting the idea of Sobolev learning.
- Score: 12.115023915042617
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Trajectory Optimization (TO) and Reinforcement Learning (RL) are powerful and
complementary tools to solve optimal control problems. On the one hand, TO can
efficiently compute locally-optimal solutions, but it tends to get stuck in
local minima if the problem is not convex. On the other hand, RL is typically
less sensitive to non-convexity, but it requires a much higher computational
effort. Recently, we have proposed CACTO (Continuous Actor-Critic with
Trajectory Optimization), an algorithm that uses TO to guide the exploration of
an actor-critic RL algorithm. In turn, the policy encoded by the actor is used
to warm-start TO, closing the loop between TO and RL. In this work, we present
an extension of CACTO exploiting the idea of Sobolev learning. To make the
training of the critic network faster and more data efficient, we enrich it
with the gradient of the Value function, computed via a backward pass of the
differential dynamic programming algorithm. Our results show that the new
algorithm is more efficient than the original CACTO, reducing the number of TO
episodes by a factor ranging from 3 to 10, and consequently the computation
time. Moreover, we show that CACTO-SL helps TO to find better minima and to
produce more consistent results.
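To make the Sobolev-learning idea concrete, below is a minimal sketch of a critic trained to match both Value targets and Value-gradient targets, as the abstract describes. It assumes PyTorch; the network size, the gradient weight `w_grad`, and all variable names are illustrative, and in CACTO-SL the targets would come from the TO cost-to-go and the backward pass of differential dynamic programming.

```python
# Minimal sketch of Sobolev learning for a critic network (assumes PyTorch).
# Targets are random stand-ins; in CACTO-SL they would come from the TO solver.
import torch

critic = torch.nn.Sequential(
    torch.nn.Linear(4, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
)
opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def sobolev_loss(x, v_target, dvdx_target, w_grad=1.0):
    """Penalize errors in both V(x) and its gradient dV/dx."""
    x = x.requires_grad_(True)
    v = critic(x).squeeze(-1)
    # dV/dx via autograd; create_graph=True keeps the loss differentiable
    dvdx = torch.autograd.grad(v.sum(), x, create_graph=True)[0]
    value_err = ((v - v_target) ** 2).mean()
    grad_err = ((dvdx - dvdx_target) ** 2).sum(-1).mean()
    return value_err + w_grad * grad_err

# One training step on a batch of TO-generated data (random stand-ins here).
x = torch.randn(32, 4)     # states visited by the TO solver
v_t = torch.randn(32)      # Value targets, e.g. the TO cost-to-go
dv_t = torch.randn(32, 4)  # gradient targets, e.g. from the DDP backward pass
loss = sobolev_loss(x, v_t, dv_t)
opt.zero_grad()
loss.backward()
opt.step()
```

Supervising dV/dx adds one target per state dimension for every sample, which is the intuition behind the reported gains in speed and data efficiency of the critic training.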
Related papers
- Provably Efficient Offline Goal-Conditioned Reinforcement Learning with
General Function Approximation and Single-Policy Concentrability [11.786486763236104]
Goal-conditioned reinforcement learning (GCRL) refers to learning general-purpose skills that aim to reach diverse goals.
Offline GCRL requires only pre-collected datasets to perform training tasks.
We show that a modified offline GCRL algorithm is provably efficient with general function approximation and single-policy concentrability.
arXiv Detail & Related papers (2023-02-07T22:04:55Z)
- Adaptive Federated Minimax Optimization with Lower Complexities [82.51223883622552]
We propose an efficient adaptive minimax optimization algorithm (AdaFGDA) to solve such federated minimax problems.
It builds on momentum-based variance-reduced and local-SGD techniques, and it flexibly incorporates various adaptive learning rates.
arXiv Detail & Related papers (2022-11-14T12:32:18Z)
- CACTO: Continuous Actor-Critic with Trajectory Optimization -- Towards global optimality [5.0915256711576475]
This paper presents a novel algorithm for the continuous control of dynamical systems that combines Trajectory Optimization (TO) and Reinforcement Learning (RL) in a single framework.
arXiv Detail & Related papers (2022-11-12T10:16:35Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network (NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, defined by minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximated gradient descent method based on a recent practical envelope-smoothing technique.
Our proposed algorithm can also be used to minimize the sum of ranked-range losses, which also lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
- Recursive Least Squares Advantage Actor-Critic Algorithms [20.792917267835247]
We propose two novel RLS-based advantage actor-critic (A2C) algorithms.
The two algorithms, RLSSA2C and RLSNA2C, use the recursive least squares (RLS) method to train the critic network and the hidden layers of the actor network.
The experimental results show that both algorithms have better sample efficiency than vanilla A2C on most games or tasks; a minimal sketch of the underlying RLS update appears after this list.
arXiv Detail & Related papers (2022-01-15T20:00:26Z)
- Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning [43.562783189118]
We introduce a practical algorithm for incorporating human insight to speed learning.
Our algorithm, Constraint Sampling Reinforcement Learning (CSRL), incorporates prior domain knowledge as constraints/restrictions on the RL policy.
In all cases, CSRL learns a good policy faster than baselines.
arXiv Detail & Related papers (2021-12-30T22:02:42Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement-learning-based zeroth-order algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Evolving Reinforcement Learning Algorithms [186.62294652057062]
We propose a method for meta-learning reinforcement learning algorithms.
The learned algorithms are domain-agnostic and can generalize to new environments not seen during training.
We highlight two learned algorithms which obtain good generalization performance on classical control tasks, gridworld-type tasks, and Atari games.
arXiv Detail & Related papers (2021-01-08T18:55:07Z)
- Channel Assignment in Uplink Wireless Communication using Machine Learning Approach [54.012791474906514]
This letter investigates a channel assignment problem in uplink wireless communication systems.
Our goal is to maximize the sum rate of all users subject to integer channel assignment constraints.
Due to high computational complexity, machine learning approaches are employed to obtain computationally efficient solutions.
arXiv Detail & Related papers (2020-01-12T15:54:20Z)
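As a companion to the Recursive Least Squares A2C entry above, here is a minimal sketch of the classic RLS update applied to a linear value head. It assumes NumPy; the feature dimension, forgetting factor, and initialization are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of a recursive-least-squares (RLS) update (assumes NumPy).
import numpy as np

n = 8                      # feature dimension (illustrative)
w = np.zeros(n)            # weights of the linear value head
P = np.eye(n) * 100.0      # estimate of the inverse input-correlation matrix
lam = 0.99                 # forgetting factor

def rls_step(phi, target):
    """One RLS update so that the prediction w @ phi tracks the target."""
    global w, P
    err = target - w @ phi                # a-priori prediction error
    k = P @ phi / (lam + phi @ P @ phi)   # gain vector
    w = w + k * err                       # weight update
    P = (P - np.outer(k, phi @ P)) / lam  # covariance update

# Demo: recover a fixed linear map from streaming samples.
rng = np.random.default_rng(0)
true_w = np.arange(n, dtype=float)
for _ in range(200):
    phi = rng.standard_normal(n)
    rls_step(phi, true_w @ phi)
print(np.allclose(w, true_w, atol=1e-2))  # True once converged
```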
This list is automatically generated from the titles and abstracts of the papers in this site.