GPU-Accelerated Policy Optimization via Batch Automatic Differentiation
of Gaussian Processes for Real-World Control
- URL: http://arxiv.org/abs/2202.13638v1
- Date: Mon, 28 Feb 2022 09:31:15 GMT
- Title: GPU-Accelerated Policy Optimization via Batch Automatic Differentiation
of Gaussian Processes for Real-World Control
- Authors: Abdolreza Taheri, Joni Pajarinen, Reza Ghabcheloo
- Abstract summary: We develop a policy optimization method by leveraging fast predictive sampling methods to process batches of trajectories in every forward pass.
We demonstrate the effectiveness of our approach in training policies on a set of reference-tracking control experiments with a heavy-duty machine.
- Score: 8.720903734757627
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The ability of Gaussian processes (GPs) to predict the behavior of dynamical
systems as a more sample-efficient alternative to parametric models seems
promising for real-world robotics research. However, the computational
complexity of GPs has made policy search a highly time- and memory-consuming
process that has not been able to scale to larger problems. In this work, we
develop a policy optimization method by leveraging fast predictive sampling
methods to process batches of trajectories in every forward pass, and compute
gradient updates over policy parameters by automatic differentiation of Monte
Carlo evaluations, all on GPU. We demonstrate the effectiveness of our approach
in training policies on a set of reference-tracking control experiments with a
heavy-duty machine. Benchmark results show a significant speedup over exact
methods and showcase the scalability of our method to larger policy networks,
longer horizons, and up to thousands of trajectories with a sublinear drop in
speed.
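The overall loop can be pictured with a short sketch. The following is a minimal, hedged PyTorch illustration (not the authors' implementation) of the idea described above: roll out a batch of trajectories through a sampled dynamics model and obtain policy gradients by automatic differentiation of the Monte Carlo cost, all on GPU. The placeholder dynamics sample, network sizes, quadratic tracking cost, and batch/horizon settings are illustrative assumptions; in the paper, the dynamics role is played by fast GP predictive sampling.

```python
import torch

# Minimal sketch of batched Monte Carlo policy optimization with a sampled
# dynamics model, differentiated end to end on GPU. `sample_gp_dynamics` is
# a placeholder for a fast GP predictive sampling routine.

device = "cuda" if torch.cuda.is_available() else "cpu"

policy = torch.nn.Sequential(
    torch.nn.Linear(4, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
).to(device)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

W = 0.1 * torch.randn(5, 4, device=device)  # toy stand-in for GP posterior weights

def sample_gp_dynamics(state, action):
    """Placeholder for a batched GP posterior sample of the next state."""
    x = torch.cat([state, action], dim=-1)       # (B, 5)
    mean = state + torch.tanh(x @ W)             # illustrative transition
    return mean + 0.01 * torch.randn_like(mean)  # sampled noise

B, H = 1024, 50                                  # trajectories, horizon
reference = torch.ones(B, 4, device=device)      # reference to track

for iteration in range(200):
    state = torch.zeros(B, 4, device=device)
    cost = torch.zeros((), device=device)
    for t in range(H):
        action = policy(state)
        state = sample_gp_dynamics(state, action)
        cost = cost + ((state - reference) ** 2).sum(-1).mean()
    optimizer.zero_grad()
    cost.backward()        # autodiff of the Monte Carlo evaluation
    optimizer.step()
```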
Related papers
- Towards safe and tractable Gaussian process-based MPC: Efficient sampling within a sequential quadratic programming framework [35.79393879150088]
We propose a robust GP-MPC formulation that guarantees constraint satisfaction with high probability.
We highlight the improved reachable set approximation compared to existing methods, as well as computation times that are feasible in real time.
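As a rough illustration of the probabilistic constraint handling that GP-based MPC revolves around (a generic sketch, not this paper's sampling-within-SQP formulation), a state constraint can be tightened by a multiple of the GP predictive standard deviation so that it holds with high probability; the limit and back-off factor below are assumed values.

```python
import numpy as np

# Hedged illustration of chance-constraint tightening in GP-based MPC:
# require mean + beta * std <= limit so the constraint holds with high
# probability (beta = 2 corresponds to roughly 97.7% for a Gaussian margin).

def constraint_satisfied(pred_mean, pred_std, limit, beta=2.0):
    return pred_mean + beta * pred_std <= limit

print(constraint_satisfied(np.array([0.85]), np.array([0.05]), limit=1.0))
```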
arXiv Detail & Related papers (2024-09-13T08:15:20Z)
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
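The analytic-policy-gradient idea can be sketched in a few lines (an illustrative toy, not the paper's simulator): because the dynamics are written in an autodiff framework, the rollout cost can be differentiated directly with respect to the policy parameters. The single-integrator dynamics, tracking cost, and horizon are assumptions.

```python
import torch

# Toy analytic policy gradients (APG): differentiate a rollout cost through
# known, differentiable dynamics. Dynamics, cost, and horizon are assumptions.

policy = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)

def dynamics(state, action, dt=0.1):
    pos, vel = state[..., :1], state[..., 1:]
    return torch.cat([pos + dt * vel, vel + dt * action], dim=-1)

for _ in range(100):
    state = torch.zeros(1, 2)
    cost = torch.zeros(())
    for t in range(20):
        state = dynamics(state, policy(state))
        cost = cost + ((state[..., 0] - 1.0) ** 2).mean()  # track position 1.0
    optimizer.zero_grad()
    cost.backward()        # gradients flow through the simulator itself
    optimizer.step()
```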
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
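The basic pattern being analyzed can be sketched as follows (an illustrative toy, with an assumed network size and exploration level): a Gaussian policy with a tunable standard deviation is used during learning, while only its deterministic mean is deployed.

```python
import torch

# Illustrative sketch: learn a Gaussian policy whose exploration level sigma
# is tunable/learnable, then deploy only the deterministic mean action.

mean_net = torch.nn.Linear(3, 1)
log_sigma = torch.nn.Parameter(torch.tensor(-1.0))  # exploration level

def act_explore(state):                              # used during learning
    mu = mean_net(state)
    return mu + torch.exp(log_sigma) * torch.randn_like(mu)

def act_deploy(state):                               # deployed controller
    return mean_net(state)
```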
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Inferring Smooth Control: Monte Carlo Posterior Policy Iteration with Gaussian Processes [39.411957858548355]
We show how to achieve smoother model predictive control using online sequential inference.
We evaluate this approach on several robot control tasks, matching the performance of prior sampling-based methods while also ensuring smoothness.
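One ingredient behind such smooth sampling-based control can be sketched by drawing action sequences from a GP prior over the horizon instead of i.i.d. noise; the kernel, lengthscale, and horizon below are assumptions, not the paper's settings.

```python
import numpy as np

# Draw temporally smooth action-sequence samples from a squared-exponential
# GP prior over the horizon, a common way to obtain smooth rollouts for
# sampling-based MPC. Horizon, lengthscale, and sample count are assumed.

H, lengthscale, num_samples = 30, 5.0, 64
t = np.arange(H, dtype=float)[:, None]
K = np.exp(-0.5 * (t - t.T) ** 2 / lengthscale ** 2)  # kernel matrix
L = np.linalg.cholesky(K + 1e-6 * np.eye(H))          # jitter for stability
smooth_actions = L @ np.random.randn(H, num_samples)  # (H, num_samples)
```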
arXiv Detail & Related papers (2022-10-07T12:56:31Z)
- Learning Robust Controllers Via Probabilistic Model-Based Policy Search [2.886634516775814]
We investigate whether controllers learned in such a way are robust and able to generalize under small perturbations of the environment.
We show that enforcing a lower bound on the likelihood noise in the Gaussian process dynamics model regularizes the policy updates and yields more robust controllers.
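In GPyTorch, for instance, the idea of lower-bounding the likelihood noise can be expressed with a noise constraint on the Gaussian likelihood; the 1e-2 floor and the toy dynamics data below are assumptions, not the paper's values.

```python
import torch
import gpytorch

# Hedged sketch: constrain the GP likelihood noise to stay above a floor,
# which regularizes the learned dynamics model. Floor value is an assumption.

likelihood = gpytorch.likelihoods.GaussianLikelihood(
    noise_constraint=gpytorch.constraints.GreaterThan(1e-2)
)

class DynamicsGP(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ZeroMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(x), self.covar_module(x)
        )

train_x = torch.randn(100, 5)  # toy (state, action) inputs
train_y = torch.randn(100)     # toy next-state targets (single dimension)
model = DynamicsGP(train_x, train_y, likelihood)
```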
arXiv Detail & Related papers (2021-10-26T11:17:31Z)
- Bregman Gradient Policy Optimization [97.73041344738117]
We design a Bregman gradient policy optimization framework for reinforcement learning based on Bregman divergences and momentum techniques.
VR-BGPO reaches the best known complexity $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point, requiring only one trajectory at each iteration.
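The flavor of a Bregman (mirror-descent-style) update can be sketched with the KL divergence over a discrete action distribution, which yields an exponentiated-gradient step; this is a generic illustration, not the paper's VR-BGPO algorithm, and the toy gradient values are assumptions.

```python
import numpy as np

# Exponentiated-gradient step: argmin_p <g, p> + (1/lr) * KL(p || p_old),
# i.e. a Bregman gradient step with the KL divergence as the Bregman term.

def bregman_kl_step(p_old, grad, lr=0.1):
    logits = np.log(p_old) - lr * grad
    p_new = np.exp(logits - logits.max())  # subtract max for stability
    return p_new / p_new.sum()

p = np.ones(4) / 4                         # uniform initial action distribution
g = np.array([0.5, -0.2, 0.1, 0.0])        # toy policy-gradient estimate
p = bregman_kl_step(p, g)
print(p)
```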
arXiv Detail & Related papers (2021-06-23T01:08:54Z)
- ParticleAugment: Sampling-Based Data Augmentation [80.44268663372233]
We propose a particle filtering formulation to find optimal augmentation policies and their schedules during model training.
We show that our formulation for automated augmentation reaches promising results on CIFAR-10, CIFAR-100, and ImageNet datasets.
arXiv Detail & Related papers (2021-06-16T10:56:02Z)
- Learning Sampling Policy for Faster Derivative Free Optimization [100.27518340593284]
We propose a new reinforcement learning based zeroth-order algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization, instead of using random sampling.
Our results show that ZO-RL can effectively reduce the variance of the ZO gradient estimates by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
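The zeroth-order gradient estimator that such methods build on can be sketched as a finite-difference average over perturbation directions; ZO-RL would draw those directions from a learned sampling policy rather than the Gaussian used here, and the toy objective is an assumption.

```python
import numpy as np

# Basic zeroth-order (ZO) gradient estimator: average finite differences
# along sampled perturbation directions. Random Gaussian directions here;
# ZO-RL replaces them with directions from a learned sampling policy.

def zo_gradient(f, x, num_dirs=20, mu=1e-2):
    grad = np.zeros_like(x)
    for _ in range(num_dirs):
        u = np.random.randn(*x.shape)
        grad += (f(x + mu * u) - f(x)) / mu * u
    return grad / num_dirs

f = lambda x: float(np.sum(x ** 2))  # toy smooth objective
x = np.ones(5)
print(zo_gradient(f, x))             # approximates the gradient 2*x
```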
arXiv Detail & Related papers (2021-04-09T14:50:59Z)
- Gaussian Process Policy Optimization [0.0]
We propose a novel actor-critic, model-free reinforcement learning algorithm.
It employs a Bayesian method of parameter space exploration to solve environments.
It is shown to be comparable to, and at times to empirically outperform, current algorithms on environments that simulate robotic locomotion.
arXiv Detail & Related papers (2020-03-02T18:06:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.