Training Efficient Controllers via Analytic Policy Gradient
- URL: http://arxiv.org/abs/2209.13052v3
- Date: Tue, 2 May 2023 21:29:49 GMT
- Title: Training Efficient Controllers via Analytic Policy Gradient
- Authors: Nina Wiedemann, Valentin Wüest, Antonio Loquercio, Matthias Müller, Dario Floreano, Davide Scaramuzza
- Abstract summary: Control design for robotic systems is complex and often requires solving an optimization to follow a trajectory accurately.
Online optimization approaches like Model Predictive Control (MPC) have been shown to achieve great tracking performance, but require high computing power.
We propose an Analytic Policy Gradient (APG) method to tackle this problem.
- Score: 44.0762454494769
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Control design for robotic systems is complex and often requires solving an
optimization to follow a trajectory accurately. Online optimization approaches
like Model Predictive Control (MPC) have been shown to achieve great tracking
performance, but require high computing power. Conversely, learning-based
offline optimization approaches, such as Reinforcement Learning (RL), allow
fast and efficient execution on the robot but hardly match the accuracy of MPC
in trajectory tracking tasks. In systems with limited compute, such as aerial
vehicles, an accurate controller that is efficient at execution time is
imperative. We propose an Analytic Policy Gradient (APG) method to tackle this
problem. APG exploits the availability of differentiable simulators by training
a controller offline with gradient descent on the tracking error. We address
training instabilities that frequently occur with APG through curriculum
learning and experiment on a widely used controls benchmark, the CartPole, and
two common aerial robots, a quadrotor and a fixed-wing drone. Our proposed
method outperforms both model-based and model-free RL methods in terms of
tracking error. Concurrently, it achieves similar performance to MPC while
requiring more than an order of magnitude less computation time. Our work
provides insights into the potential of APG as a promising control method for
robotics. To facilitate the exploration of APG, we open-source our code and
make it available at https://github.com/lis-epfl/apg_trajectory_tracking.
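
Concretely, APG unrolls the differentiable simulator over a horizon, accumulates the tracking error L = Σ_t ||s_t − s_t^ref||² along the rollout, and backpropagates that loss through the dynamics into the policy parameters. The sketch below illustrates the idea with a toy double-integrator plant and a horizon curriculum to counter the training instabilities of long unrolled chains; the dynamics, names, and hyperparameters are illustrative assumptions, not code from the linked repository.

```python
# Minimal APG-style training sketch (illustrative; not from the paper's repo).
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, state_dim=4, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, 64), nn.Tanh(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )

    def forward(self, state, reference):
        # Condition the controller on the current state and the reference point.
        return self.net(torch.cat([state, reference], dim=-1))

def step(state, action, dt=0.02):
    # Differentiable toy dynamics (double integrator): gradients flow
    # through this function back into the policy parameters.
    pos, vel = state[..., :2], state[..., 2:]
    vel = vel + action * dt
    pos = pos + vel * dt
    return torch.cat([pos, vel], dim=-1)

policy = Policy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(1000):
    # Curriculum: start with short rollouts and grow the horizon, which
    # mitigates exploding/vanishing gradients through long unrolled chains.
    horizon = min(5 + epoch // 100, 50)
    state = torch.zeros(32, 4)               # batch of initial states
    reference = torch.randn(32, horizon, 4)  # reference trajectory to track
    loss = 0.0
    for t in range(horizon):
        action = policy(state, reference[:, t])
        state = step(state, action)
        loss = loss + ((state - reference[:, t]) ** 2).mean()
    opt.zero_grad()
    (loss / horizon).backward()  # analytic gradient of the tracking error
    opt.step()
```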
Related papers
- Goal-Conditioned Terminal Value Estimation for Real-time and Multi-task Model Predictive Control [1.2687745030755995]
We develop an MPC framework with goal-conditioned terminal value learning to achieve multitask policy optimization.
We evaluate the proposed method on a bipedal inverted pendulum robot model and confirm that combining goal-conditioned terminal value learning with an upper-level trajectory planner enables real-time control.
arXiv Detail & Related papers (2024-10-07T11:19:23Z)
- Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems [24.360194697715382]
Tracking controllers enable robotic systems to accurately follow planned reference trajectories.
In this work, we leverage the inherent Lie group symmetries of robotic systems with a floating base to mitigate these challenges when learning tracking controllers.
Results show that a symmetry-aware approach both accelerates training and reduces tracking error after the same number of training steps.
arXiv Detail & Related papers (2024-09-17T14:39:24Z)
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning [85.21378553454672]
We develop a library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment.
We find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation.
These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent robustness recovery and correction behaviors.
arXiv Detail & Related papers (2024-01-29T10:01:10Z)
- Modelling, Positioning, and Deep Reinforcement Learning Path Tracking Control of Scaled Robotic Vehicles: Design and Experimental Validation [3.807917169053206]
Scaled robotic cars are commonly equipped with a hierarchical control architecture that includes tasks dedicated to vehicle state estimation and control.
This paper covers both aspects by proposing (i) a federated extended Kalman filter (FEKF) and (ii) a novel deep reinforcement learning (DRL) path tracking controller trained via an expert demonstrator.
The experimentally validated model is used for (i) supporting the design of the FEKF and (ii) serving as a digital twin for training the proposed DRL-based path tracking algorithm.
arXiv Detail & Related papers (2024-01-10T14:40:53Z)
- Policy Search for Model Predictive Control with Application to Agile Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework.
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
arXiv Detail & Related papers (2021-12-07T17:39:24Z)
- Adaptive Optimal Trajectory Tracking Control Applied to a Large-Scale Ball-on-Plate System [0.0]
We propose an ADP-based optimal trajectory tracking controller for a large-scale ball-on-plate system.
Our proposed method incorporates an approximated reference trajectory instead of using setpoint tracking and automatically compensates for constant offset terms.
Our experimental results show that this tracking mechanism significantly reduces the control cost compared to setpoint controllers.
arXiv Detail & Related papers (2020-10-26T11:22:03Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
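
For context, the MPC-to-soft-RL connection referenced in the entry above can be sketched in standard soft-optimality notation (an assumption on our part, not quoted from the paper):

```latex
% Information-theoretic MPC samples from the free-energy-minimizing
% trajectory distribution:
q^*(\tau) \;\propto\; p(\tau)\,\exp\!\big(-C(\tau)/\lambda\big)
% Entropy-regularized RL yields the soft-optimal policy:
\pi^*(a \mid s) \;=\; \exp\!\Big(\big(Q_{\mathrm{soft}}(s,a)-V_{\mathrm{soft}}(s)\big)/\lambda\Big),
\qquad
V_{\mathrm{soft}}(s) \;=\; \lambda\log\!\int \exp\!\big(Q_{\mathrm{soft}}(s,a)/\lambda\big)\,da
```

Identifying negative trajectory cost with soft value estimates links the two update rules, which suggests how a Q-learning update could tolerate biased models.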
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.