Training Efficient Controllers via Analytic Policy Gradient
- URL: http://arxiv.org/abs/2209.13052v3
- Date: Tue, 2 May 2023 21:29:49 GMT
- Title: Training Efficient Controllers via Analytic Policy Gradient
- Authors: Nina Wiedemann, Valentin Wüest, Antonio Loquercio, Matthias Müller, Dario Floreano, Davide Scaramuzza
- Abstract summary: Control design for robotic systems is complex and often requires solving an optimization to follow a trajectory accurately.
Online optimization approaches like Model Predictive Control (MPC) have been shown to achieve great tracking performance, but require high computing power.
We propose an Analytic Policy Gradient (APG) method to tackle this problem.
- Score: 44.0762454494769
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Control design for robotic systems is complex and often requires solving an
optimization to follow a trajectory accurately. Online optimization approaches
like Model Predictive Control (MPC) have been shown to achieve great tracking
performance, but require high computing power. Conversely, learning-based
offline optimization approaches, such as Reinforcement Learning (RL), allow
fast and efficient execution on the robot but hardly match the accuracy of MPC
in trajectory tracking tasks. In systems with limited compute, such as aerial
vehicles, an accurate controller that is efficient at execution time is
imperative. We propose an Analytic Policy Gradient (APG) method to tackle this
problem. APG exploits the availability of differentiable simulators by training
a controller offline with gradient descent on the tracking error. We address
training instabilities that frequently occur with APG through curriculum
learning and experiment on a widely used controls benchmark, the CartPole, and
two common aerial robots, a quadrotor and a fixed-wing drone. Our proposed
method outperforms both model-based and model-free RL methods in terms of
tracking error. Concurrently, it achieves similar performance to MPC while
requiring more than an order of magnitude less computation time. Our work
provides insights into the potential of APG as a promising control method for
robotics. To facilitate the exploration of APG, we open-source our code and
make it available at https://github.com/lis-epfl/apg_trajectory_tracking.
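
Concretely, APG unrolls the differentiable simulator over a horizon, accumulates the tracking error L = Σ_t ||s_t − s_t^ref||² along the rollout, and backpropagates that loss through the dynamics into the policy parameters. The sketch below illustrates the idea with a toy double-integrator plant and a horizon curriculum to counter the training instabilities of long unrolled chains; the dynamics, names, and hyperparameters are illustrative assumptions, not code from the linked repository.

```python
# Minimal APG-style training sketch (illustrative; not from the paper's repo).
import torch
import torch.nn as nn

class Policy(nn.Module):
    def __init__(self, state_dim=4, action_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, 64), nn.Tanh(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )

    def forward(self, state, reference):
        # Condition the controller on the current state and the reference point.
        return self.net(torch.cat([state, reference], dim=-1))

def step(state, action, dt=0.02):
    # Differentiable toy dynamics (double integrator): gradients flow
    # through this function back into the policy parameters.
    pos, vel = state[..., :2], state[..., 2:]
    vel = vel + action * dt
    pos = pos + vel * dt
    return torch.cat([pos, vel], dim=-1)

policy = Policy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(1000):
    # Curriculum: start with short rollouts and grow the horizon, which
    # mitigates exploding/vanishing gradients through long unrolled chains.
    horizon = min(5 + epoch // 100, 50)
    state = torch.zeros(32, 4)               # batch of initial states
    reference = torch.randn(32, horizon, 4)  # reference trajectory to track
    loss = 0.0
    for t in range(horizon):
        action = policy(state, reference[:, t])
        state = step(state, action)
        loss = loss + ((state - reference[:, t]) ** 2).mean()
    opt.zero_grad()
    (loss / horizon).backward()  # analytic gradient of the tracking error
    opt.step()
```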
Related papers
- Goal-Conditioned Terminal Value Estimation for Real-time and Multi-task Model Predictive Control [1.2687745030755995]
We develop an MPC framework with goal-conditioned terminal value learning to achieve multitask policy optimization.
We evaluate the proposed method on a bipedal inverted pendulum robot model and confirm that combining goal-conditioned terminal value learning with an upper-level trajectory planner enables real-time control.
arXiv Detail & Related papers (2024-10-07T11:19:23Z)
- Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems [24.360194697715382]
Tracking controllers enable robotic systems to accurately follow planned reference trajectories.
In this work, we leverage the inherent Lie group symmetries of robotic systems with a floating base to mitigate these challenges when learning tracking controllers.
Results show that a symmetry-aware approach both accelerates training and reduces tracking error after the same number of training steps.
arXiv Detail & Related papers (2024-09-17T14:39:24Z)
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- SERL: A Software Suite for Sample-Efficient Robotic Reinforcement Learning [85.21378553454672]
We develop a library containing a sample efficient off-policy deep RL method, together with methods for computing rewards and resetting the environment.
We find that our implementation can achieve very efficient learning, acquiring policies for PCB board assembly, cable routing, and object relocation.
These policies achieve perfect or near-perfect success rates, extreme robustness even under perturbations, and exhibit emergent robustness recovery and correction behaviors.
arXiv Detail & Related papers (2024-01-29T10:01:10Z)
- Modelling, Positioning, and Deep Reinforcement Learning Path Tracking Control of Scaled Robotic Vehicles: Design and Experimental Validation [3.807917169053206]
Scaled robotic cars are commonly equipped with a hierarchical control architecture that includes tasks dedicated to vehicle state estimation and control.
This paper covers both aspects by proposing (i) a federated extended Kalman filter (FEKF) and (ii) a novel deep reinforcement learning (DRL) path tracking controller trained via an expert demonstrator.
The experimentally validated model is used for (i) supporting the design of the FEKF and (ii) serving as a digital twin for training the proposed DRL-based path tracking algorithm.
arXiv Detail & Related papers (2024-01-10T14:40:53Z)
- Policy Search for Model Predictive Control with Application to Agile Drone Flight [56.24908013905407]
We propose a policy-search-for-model-predictive-control framework.
Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies.
Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world.
arXiv Detail & Related papers (2021-12-07T17:39:24Z)
- Adaptive Optimal Trajectory Tracking Control Applied to a Large-Scale Ball-on-Plate System [0.0]
We propose an ADP-based optimal trajectory tracking controller for a large-scale ball-on-plate system.
Our proposed method incorporates an approximated reference trajectory instead of using setpoint tracking and automatically compensates for constant offset terms.
Our experimental results show that this tracking mechanism significantly reduces the control cost compared to setpoint controllers.
arXiv Detail & Related papers (2020-10-26T11:22:03Z)
- Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion [78.46388769788405]
We introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO).
We show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
arXiv Detail & Related papers (2020-02-22T10:15:53Z)
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
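
For context, the MPC-to-soft-RL connection referenced in the entry above can be sketched in standard soft-optimality notation (an assumption on our part, not quoted from the paper):

```latex
% Information-theoretic MPC samples from the free-energy-minimizing
% trajectory distribution:
q^*(\tau) \;\propto\; p(\tau)\,\exp\!\big(-C(\tau)/\lambda\big)
% Entropy-regularized RL yields the soft-optimal policy:
\pi^*(a \mid s) \;=\; \exp\!\Big(\big(Q_{\mathrm{soft}}(s,a)-V_{\mathrm{soft}}(s)\big)/\lambda\Big),
\qquad
V_{\mathrm{soft}}(s) \;=\; \lambda\log\!\int \exp\!\big(Q_{\mathrm{soft}}(s,a)/\lambda\big)\,da
```

Identifying negative trajectory cost with soft value estimates links the two update rules, which suggests how a Q-learning update could tolerate biased models.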
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.