Accelerated Policy Learning with Parallel Differentiable Simulation
- URL: http://arxiv.org/abs/2204.07137v1
- Date: Thu, 14 Apr 2022 17:46:26 GMT
- Title: Accelerated Policy Learning with Parallel Differentiable Simulation
- Authors: Jie Xu, Viktor Makoviychuk, Yashraj Narang, Fabio Ramos, Wojciech
Matusik, Animesh Garg, Miles Macklin
- Abstract summary: We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
- Score: 59.665651562534755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning can generate complex control policies, but
requires large amounts of training data to work effectively. Recent work has
attempted to address this issue by leveraging differentiable simulators.
However, inherent problems such as local minima and exploding/vanishing
numerical gradients prevent these methods from being generally applied to
control tasks with complex contact-rich dynamics, such as humanoid locomotion
in classical RL benchmarks. In this work we present a high-performance
differentiable simulator and a new policy learning algorithm (SHAC) that can
effectively leverage simulation gradients, even in the presence of
non-smoothness. Our learning algorithm alleviates problems with local minima
through a smooth critic function, avoids vanishing/exploding gradients through
a truncated learning window, and allows many physical environments to be run in
parallel. We evaluate our method on classical RL control tasks, and show
substantial improvements in sample efficiency and wall-clock time over
state-of-the-art RL and differentiable simulation-based algorithms. In
addition, we demonstrate the scalability of our method by applying it to the
challenging high-dimensional problem of muscle-actuated locomotion with a large
action space, achieving a greater than 17x reduction in training time over the
best-performing established RL algorithm.
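The update the abstract describes combines a truncated (short-horizon) rollout through the differentiable simulator with a smooth critic that bootstraps the value beyond the window. Below is a minimal sketch of that idea, using a toy differentiable point mass in place of the paper's simulator; the horizon, networks, reward, and dynamics are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

h, gamma = 16, 0.99                           # truncated window and discount
policy = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))
critic = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def toy_step(state, action, dt=0.05):
    """Toy differentiable point-mass dynamics; state = (position, velocity)."""
    pos, vel = state[..., 0:1], state[..., 1:2]
    vel = vel + dt * action
    pos = pos + dt * vel
    return torch.cat([pos, vel], dim=-1)

state = torch.randn(128, 2)                   # many environments in parallel
ret = torch.zeros(())
for t in range(h):                            # short-horizon rollout
    action = policy(state)
    state = toy_step(state, action)
    ret = ret + gamma ** t * (-(state[..., 0] ** 2)).mean()   # toy reward
ret = ret + gamma ** h * critic(state).mean() # smooth critic bootstraps the tail

actor_opt.zero_grad()
(-ret).backward()                             # gradients flow through the dynamics
actor_opt.step()
state = state.detach()                        # cut the graph at the window edge
```

In this reading, the critic (trained separately on TD-style targets) smooths over contact discontinuities, and the truncated window keeps backpropagation through time short enough to avoid vanishing/exploding gradients.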
Related papers
- Offline reinforcement learning for job-shop scheduling problems [1.3927943269211593]
This paper introduces a novel offline RL method designed for optimization problems with complex constraints.
Our approach encodes actions in edge attributes and balances expected rewards with the imitation of expert solutions.
We demonstrate the effectiveness of this method on job-shop scheduling and flexible job-shop scheduling benchmarks.
arXiv Detail & Related papers (2024-10-21T07:33:42Z)
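As a rough illustration of the reward-versus-imitation balance in the job-shop entry above, the sketch below mixes an expected-return term with a cross-entropy imitation term; the weight `lam`, the fixed return estimates, and the action encoding are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(32, 10, requires_grad=True)   # scores over 10 candidate actions
expert = torch.randint(0, 10, (32,))               # expert-chosen actions
q_values = torch.randn(32, 10)                     # fixed offline return estimates
lam = 0.5                                          # illustrative trade-off weight

probs = logits.softmax(dim=-1)
expected_return = (probs * q_values).sum(dim=-1)   # return under the policy
imitation = F.cross_entropy(logits, expert)        # match the expert solution
loss = -expected_return.mean() + lam * imitation   # balance reward vs. imitation
loss.backward()
```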
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as more intuitive, human-like handling overall.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
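The analytic policy gradient (APG) idea in the entry above differentiates a rollout end to end, so environment-dynamics gradients reach the controller directly. A minimal sketch with toy vehicle kinematics follows; the dynamics, cost, and network are illustrative, not the paper's simulator.

```python
import torch
import torch.nn as nn

controller = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(controller.parameters(), lr=1e-3)

def kinematic_step(x, y, heading, steer, v=5.0, dt=0.1):
    """Toy differentiable vehicle kinematics (not the paper's simulator)."""
    heading = heading + dt * steer
    x = x + dt * v * torch.cos(heading)
    y = y + dt * v * torch.sin(heading)
    return x, y, heading

x, y, heading = torch.zeros(1), torch.zeros(1), torch.zeros(1)
loss = torch.zeros(())
for _ in range(50):                       # end-to-end differentiable rollout
    steer = controller(torch.cat([x, y, heading]))[0]
    x, y, heading = kinematic_step(x, y, heading, steer)
    loss = loss + (y ** 2).sum()          # track the lane centerline y = 0

opt.zero_grad()
loss.backward()                           # analytic gradients through dynamics
opt.step()
```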
- Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation [36.308936312224404]
This paper introduces Adaptive Horizon Actor-Critic (AHAC), a first-order model-based reinforcement learning (FO-MBRL) algorithm that reduces gradient error by adapting the model-based horizon to avoid stiff dynamics.
Empirical findings reveal that AHAC outperforms MFRL baselines, attaining 40% more reward across a set of locomotion tasks and efficiently scaling to high-dimensional control environments with improved wall-clock-time efficiency.
arXiv Detail & Related papers (2024-05-28T03:28:00Z)
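AHAC's key move, as summarized above, is to shorten the model-based horizon before stiff (e.g., contact) dynamics corrupt the gradient. The sketch below truncates a rollout when a simple state-change-rate proxy exceeds a threshold; the proxy and threshold are illustrative assumptions, not AHAC's actual criterion.

```python
import torch

def adaptive_rollout(policy, step_fn, state, max_h=32, stiff_thresh=50.0, dt=0.05):
    """Unroll until a stiffness proxy (state change rate) crosses a threshold,
    then stop; the caller bootstraps a critic at the final state. The proxy
    and threshold are illustrative, not AHAC's actual criterion."""
    states, rewards = [state], []
    for _ in range(max_h):
        action = policy(state)
        next_state, reward = step_fn(state, action)
        if ((next_state - state).norm() / dt).item() > stiff_thresh:
            break                                    # stiff dynamics: truncate here
        state = next_state
        states.append(state)
        rewards.append(reward)
    return states, rewards

policy = lambda s: -s                                # toy linear policy
step_fn = lambda s, a: (s + 0.05 * a, -(s ** 2).sum())   # toy dynamics and reward
states, rewards = adaptive_rollout(policy, step_fn, torch.randn(2))
```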
- Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space [3.639580365066386]
We propose an adaptive adversarial coefficient framework to adjust the effect of the adversarial perturbation during training.
An appealing feature of our method is that it is simple to deploy in real-world applications and does not require access to the simulator in advance.
The experiments in MuJoCo show that our method can improve the training stability and learn a robust policy when migrated to different test environments.
arXiv Detail & Related papers (2024-05-20T12:31:11Z)
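One way to read the adaptive adversarial coefficient idea above: perturb actions during training, and scale the perturbation with the agent's recent performance. The sketch below uses a simple grow/shrink schedule; the schedule and noise model are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

class AdaptiveActionPerturber:
    """Hedged sketch: perturb actions with noise whose coefficient adapts to
    recent performance. The update rule is illustrative, not the paper's."""
    def __init__(self, coef=0.05, coef_min=0.0, coef_max=0.5, step=0.01):
        self.coef, self.coef_min, self.coef_max, self.step = coef, coef_min, coef_max, step
        self.best_return = -np.inf

    def perturb(self, action, rng):
        return action + self.coef * rng.normal(size=action.shape)

    def update(self, episode_return):
        # Strengthen the adversary when the agent improves, weaken it otherwise.
        if episode_return > self.best_return:
            self.best_return = episode_return
            self.coef = min(self.coef + self.step, self.coef_max)
        else:
            self.coef = max(self.coef - self.step, self.coef_min)

rng = np.random.default_rng(0)
pert = AdaptiveActionPerturber()
noisy_action = pert.perturb(np.array([0.3, -0.1]), rng)
pert.update(episode_return=120.0)
```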
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
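A minimal sketch of one adaptive action-quantization scheme follows, using plain k-means over the dataset's actions; k-means stands in here for whatever adaptive quantizer the paper actually learns, and the bin count is an illustrative assumption.

```python
import numpy as np

def kmeans_quantize(actions, k=16, iters=50, seed=0):
    """Learn k action bins with plain k-means and map each continuous action
    to its nearest bin; a stand-in for the paper's adaptive scheme."""
    rng = np.random.default_rng(seed)
    centroids = actions[rng.choice(len(actions), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(actions[:, None] - centroids[None], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = actions[assign == j].mean(axis=0)
    return centroids, assign

dataset_actions = np.random.default_rng(1).normal(size=(1000, 6))  # 6-D actions
bins, codes = kmeans_quantize(dataset_actions)
# Offline RL methods (IQL, CQL, BRAC, ...) then operate over the k discrete codes.
```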
- Critic Sequential Monte Carlo [15.596665321375298]
CriticSMC is a new algorithm for planning as inference built from a novel composition of sequential Monte Carlo with soft-Q function factors.
Our experiments on self-driving car collision avoidance in simulation show improvements over baselines in minimizing infractions relative to computational effort.
arXiv Detail & Related papers (2022-05-30T23:14:24Z)
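CriticSMC's composition, as summarized above, weights sequential Monte Carlo particles by soft-Q factors. The toy sketch below reweights and resamples particles with exp(Q / temperature); the proposal, transition, and Q function are illustrative stand-ins.

```python
import numpy as np

def smc_plan(q_fn, propose, states, n_steps=10, temp=1.0, seed=0):
    """Reweight particles by soft-Q factors exp(Q / temp) and resample;
    the proposal, transition, and Q below are toy stand-ins."""
    rng = np.random.default_rng(seed)
    n = len(states)
    for _ in range(n_steps):
        actions = propose(states, rng)
        logw = q_fn(states, actions) / temp        # soft-Q factor per particle
        w = np.exp(logw - logw.max())
        w /= w.sum()
        idx = rng.choice(n, size=n, p=w)           # multinomial resampling
        states = states[idx] + 0.1 * actions[idx]  # toy transition
    return states

states = np.random.default_rng(1).normal(size=(256, 2))     # 256 particles
propose = lambda s, rng: rng.normal(size=s.shape)
q_fn = lambda s, a: -np.sum((s + 0.1 * a) ** 2, axis=-1)    # toy soft-Q: low cost near origin
states = smc_plan(q_fn, propose, states)
```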
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose State-Conservative Policy Optimization (SCPO), a novel model-free actor-critic algorithm that learns robust policies without modeling the disturbance in advance.
Experiments on several robot control tasks demonstrate that SCPO learns policies robust to disturbances in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
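One way to make a critic state-conservative in the spirit of the entry above is to evaluate it pessimistically over perturbed states. The sketch below takes a minimum over sampled perturbations in a small L2 ball; the radius and sampling scheme are illustrative assumptions, not SCPO's exact formulation.

```python
import torch
import torch.nn as nn

def state_conservative_value(critic, state, radius=0.05, n_samples=8):
    """Pessimistic value over states perturbed within an L2 ball; the radius
    and sampling scheme are illustrative, not SCPO's exact formulation."""
    noise = torch.randn(n_samples, state.shape[-1])
    noise = radius * noise / noise.norm(dim=-1, keepdim=True)  # on-ball samples
    values = torch.stack([critic(state + eps) for eps in noise])
    return values.min(dim=0).values               # worst case over perturbations

critic = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 1))
robust_v = state_conservative_value(critic, torch.randn(4))
```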
- Geometric Deep Reinforcement Learning for Dynamic DAG Scheduling [8.14784681248878]
In this paper, we propose a reinforcement learning approach to solve a realistic scheduling problem.
We apply it to an algorithm commonly executed in the high performance computing community, the Cholesky factorization.
Our algorithm uses graph neural networks in combination with an actor-critic algorithm (A2C) to build an adaptive representation of the problem on the fly.
arXiv Detail & Related papers (2020-11-09T10:57:21Z)
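The entry above pairs a graph neural network with A2C so the policy can score tasks in the DAG adaptively. The sketch below is a tiny message-passing actor-critic over a toy DAG adjacency; the features, layer sizes, and propagation rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DAGActorCritic(nn.Module):
    """Tiny message-passing network over a task DAG feeding actor (per-task
    score) and critic (scalar value) heads; sizes are illustrative."""
    def __init__(self, feat_dim=8, hidden=32, rounds=2):
        super().__init__()
        self.embed = nn.Linear(feat_dim, hidden)
        self.msg = nn.Linear(hidden, hidden)
        self.actor = nn.Linear(hidden, 1)
        self.critic = nn.Linear(hidden, 1)
        self.rounds = rounds

    def forward(self, node_feats, adj):
        h = torch.tanh(self.embed(node_feats))
        for _ in range(self.rounds):              # propagate along DAG edges
            h = torch.tanh(h + adj @ self.msg(h))
        scores = self.actor(h).squeeze(-1)        # one logit per task
        value = self.critic(h.mean(dim=0))        # graph-level value estimate
        return scores, value

n = 5
adj = torch.triu(torch.ones(n, n), diagonal=1)    # toy DAG adjacency
feats = torch.randn(n, 8)                         # per-task features
scores, value = DAGActorCritic()(feats, adj)
probs = scores.softmax(dim=-1)                    # policy over schedulable tasks
```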
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
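The variance-control idea above can be illustrated with a single discrete action factor: use the critic's expected action value as a baseline for the score-function gradient. The sketch below is a toy version; the paper's estimator over correlated action dimensions is more elaborate.

```python
import torch

def baselined_pg_loss(logits, actions, q_values):
    """Score-function policy gradient with the critic's expected action value
    as a baseline, reducing gradient variance; a toy single-factor sketch."""
    logp = logits.log_softmax(dim=-1)
    probs = logp.exp().detach()
    baseline = (probs * q_values).sum(dim=-1, keepdim=True)  # E_a' pi(a') Q(s,a')
    adv = (q_values.gather(1, actions[:, None]) - baseline).detach()
    chosen_logp = logp.gather(1, actions[:, None]).squeeze(1)
    return -(adv.squeeze(1) * chosen_logp).mean()

logits = torch.randn(64, 6, requires_grad=True)   # 6 discrete actions
actions = torch.randint(0, 6, (64,))
q_values = torch.randn(64, 6)                     # critic-estimated Q(s, .)
loss = baselined_pg_loss(logits, actions, q_values)
loss.backward()
```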
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
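Information-theoretic MPC updates a control sequence by exponentiated-cost averaging of sampled rollouts, and the entry above connects this to entropy-regularized Q-learning, letting a learned Q bootstrap the tail. The MPPI-style sketch below uses toy 1-D dynamics; the Q hook is marked as optional and all parameters are illustrative.

```python
import numpy as np

def mppi_plan(state, horizon=15, n_samples=256, lam=1.0, seed=0):
    """Sample action sequences, score rollouts, and average them with
    exp(-cost / lam) weights; dynamics and cost are toys, and a learned Q
    could bootstrap the tail as in the entry above."""
    rng = np.random.default_rng(seed)
    actions = rng.normal(size=(n_samples, horizon))
    costs = np.zeros(n_samples)
    s = np.repeat(state, n_samples)
    for t in range(horizon):
        s = s + 0.05 * actions[:, t]               # toy 1-D dynamics
        costs += s ** 2                            # toy running cost
    # costs += -q_fn(s)                            # optional learned-Q tail
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    return (w[:, None] * actions).sum(axis=0)      # weighted action sequence

plan = mppi_plan(np.array([1.0]))
first_action = plan[0]                             # execute, then replan (MPC)
```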