Accelerated Policy Learning with Parallel Differentiable Simulation
- URL: http://arxiv.org/abs/2204.07137v1
- Date: Thu, 14 Apr 2022 17:46:26 GMT
- Title: Accelerated Policy Learning with Parallel Differentiable Simulation
- Authors: Jie Xu, Viktor Makoviychuk, Yashraj Narang, Fabio Ramos, Wojciech
Matusik, Animesh Garg, Miles Macklin
- Abstract summary: We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
- Score: 59.665651562534755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning can generate complex control policies, but
requires large amounts of training data to work effectively. Recent work has
attempted to address this issue by leveraging differentiable simulators.
However, inherent problems such as local minima and exploding/vanishing
numerical gradients prevent these methods from being generally applied to
control tasks with complex contact-rich dynamics, such as humanoid locomotion
in classical RL benchmarks. In this work we present a high-performance
differentiable simulator and a new policy learning algorithm (SHAC) that can
effectively leverage simulation gradients, even in the presence of
non-smoothness. Our learning algorithm alleviates problems with local minima
through a smooth critic function, avoids vanishing/exploding gradients through
a truncated learning window, and allows many physical environments to be run in
parallel. We evaluate our method on classical RL control tasks, and show
substantial improvements in sample efficiency and wall-clock time over
state-of-the-art RL and differentiable simulation-based algorithms. In
addition, we demonstrate the scalability of our method by applying it to the
challenging high-dimensional problem of muscle-actuated locomotion with a large
action space, achieving a greater than 17x reduction in training time over the
best-performing established RL algorithm.
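The update the abstract describes combines a truncated (short-horizon) rollout through the differentiable simulator with a smooth critic that bootstraps the value beyond the window. Below is a minimal sketch of that idea, using a toy differentiable point mass in place of the paper's simulator; the horizon, networks, reward, and dynamics are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

h, gamma = 16, 0.99                           # truncated window and discount
policy = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))
critic = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def toy_step(state, action, dt=0.05):
    """Toy differentiable point-mass dynamics; state = (position, velocity)."""
    pos, vel = state[..., 0:1], state[..., 1:2]
    vel = vel + dt * action
    pos = pos + dt * vel
    return torch.cat([pos, vel], dim=-1)

state = torch.randn(128, 2)                   # many environments in parallel
ret = torch.zeros(())
for t in range(h):                            # short-horizon rollout
    action = policy(state)
    state = toy_step(state, action)
    ret = ret + gamma ** t * (-(state[..., 0] ** 2)).mean()   # toy reward
ret = ret + gamma ** h * critic(state).mean() # smooth critic bootstraps the tail

actor_opt.zero_grad()
(-ret).backward()                             # gradients flow through the dynamics
actor_opt.step()
state = state.detach()                        # cut the graph at the window edge
```

In this reading, the critic (trained separately on TD-style targets) smooths over contact discontinuities, and the truncated window keeps backpropagation through time short enough to avoid vanishing/exploding gradients.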
Related papers
- Offline reinforcement learning for job-shop scheduling problems [1.3927943269211593]
This paper introduces a novel offline RL method designed for optimization problems with complex constraints.
Our approach encodes actions in edge attributes and balances expected rewards with the imitation of expert solutions.
We demonstrate the effectiveness of this method on job-shop scheduling and flexible job-shop scheduling benchmarks.
arXiv Detail & Related papers (2024-10-21T07:33:42Z)
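As a rough illustration of the reward-versus-imitation balance in the job-shop entry above, the sketch below mixes an expected-return term with a cross-entropy imitation term; the weight `lam`, the fixed return estimates, and the action encoding are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(32, 10, requires_grad=True)   # scores over 10 candidate actions
expert = torch.randint(0, 10, (32,))               # expert-chosen actions
q_values = torch.randn(32, 10)                     # fixed offline return estimates
lam = 0.5                                          # illustrative trade-off weight

probs = logits.softmax(dim=-1)
expected_return = (probs * q_values).sum(dim=-1)   # return under the policy
imitation = F.cross_entropy(logits, expert)        # match the expert solution
loss = -expected_return.mean() + lam * imitation   # balance reward vs. imitation
loss.backward()
```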
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as more intuitive, human-like handling overall.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
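The analytic policy gradient (APG) idea in the entry above differentiates a rollout end to end, so environment-dynamics gradients reach the controller directly. A minimal sketch with toy vehicle kinematics follows; the dynamics, cost, and network are illustrative, not the paper's simulator.

```python
import torch
import torch.nn as nn

controller = nn.Sequential(nn.Linear(3, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(controller.parameters(), lr=1e-3)

def kinematic_step(x, y, heading, steer, v=5.0, dt=0.1):
    """Toy differentiable vehicle kinematics (not the paper's simulator)."""
    heading = heading + dt * steer
    x = x + dt * v * torch.cos(heading)
    y = y + dt * v * torch.sin(heading)
    return x, y, heading

x, y, heading = torch.zeros(1), torch.zeros(1), torch.zeros(1)
loss = torch.zeros(())
for _ in range(50):                       # end-to-end differentiable rollout
    steer = controller(torch.cat([x, y, heading]))[0]
    x, y, heading = kinematic_step(x, y, heading, steer)
    loss = loss + (y ** 2).sum()          # track the lane centerline y = 0

opt.zero_grad()
loss.backward()                           # analytic gradients through dynamics
opt.step()
```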
- Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation [36.308936312224404]
This paper introduces Adaptive Horizon Actor-Critic (AHAC), a first-order model-based reinforcement learning (FO-MBRL) algorithm that reduces gradient error by adapting the model-based horizon to avoid stiff dynamics.
Empirical findings reveal that AHAC outperforms MFRL baselines, attaining 40% more reward across a set of locomotion tasks and efficiently scaling to high-dimensional control environments with improved wall-clock-time efficiency.
arXiv Detail & Related papers (2024-05-28T03:28:00Z)
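AHAC's key move, as summarized above, is to shorten the model-based horizon before stiff (e.g., contact) dynamics corrupt the gradient. The sketch below truncates a rollout when a simple state-change-rate proxy exceeds a threshold; the proxy and threshold are illustrative assumptions, not AHAC's actual criterion.

```python
import torch

def adaptive_rollout(policy, step_fn, state, max_h=32, stiff_thresh=50.0, dt=0.05):
    """Unroll until a stiffness proxy (state change rate) crosses a threshold,
    then stop; the caller bootstraps a critic at the final state. The proxy
    and threshold are illustrative, not AHAC's actual criterion."""
    states, rewards = [state], []
    for _ in range(max_h):
        action = policy(state)
        next_state, reward = step_fn(state, action)
        if ((next_state - state).norm() / dt).item() > stiff_thresh:
            break                                    # stiff dynamics: truncate here
        state = next_state
        states.append(state)
        rewards.append(reward)
    return states, rewards

policy = lambda s: -s                                # toy linear policy
step_fn = lambda s, a: (s + 0.05 * a, -(s ** 2).sum())   # toy dynamics and reward
states, rewards = adaptive_rollout(policy, step_fn, torch.randn(2))
```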
- Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space [3.639580365066386]
We propose an adaptive adversarial coefficient framework to adjust the effect of the adversarial perturbation during training.
An appealing feature of our method is that it is simple to deploy in real-world applications and does not require access to the simulator in advance.
The experiments in MuJoCo show that our method can improve the training stability and learn a robust policy when migrated to different test environments.
arXiv Detail & Related papers (2024-05-20T12:31:11Z)
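One way to read the adaptive adversarial coefficient idea above: perturb actions during training, and scale the perturbation with the agent's recent performance. The sketch below uses a simple grow/shrink schedule; the schedule and noise model are illustrative assumptions, not the paper's exact rule.

```python
import numpy as np

class AdaptiveActionPerturber:
    """Hedged sketch: perturb actions with noise whose coefficient adapts to
    recent performance. The update rule is illustrative, not the paper's."""
    def __init__(self, coef=0.05, coef_min=0.0, coef_max=0.5, step=0.01):
        self.coef, self.coef_min, self.coef_max, self.step = coef, coef_min, coef_max, step
        self.best_return = -np.inf

    def perturb(self, action, rng):
        return action + self.coef * rng.normal(size=action.shape)

    def update(self, episode_return):
        # Strengthen the adversary when the agent improves, weaken it otherwise.
        if episode_return > self.best_return:
            self.best_return = episode_return
            self.coef = min(self.coef + self.step, self.coef_max)
        else:
            self.coef = max(self.coef - self.step, self.coef_min)

rng = np.random.default_rng(0)
pert = AdaptiveActionPerturber()
noisy_action = pert.perturb(np.array([0.3, -0.1]), rng)
pert.update(episode_return=120.0)
```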
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can outperform the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
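A minimal sketch of one adaptive action-quantization scheme follows, using plain k-means over the dataset's actions; k-means stands in here for whatever adaptive quantizer the paper actually learns, and the bin count is an illustrative assumption.

```python
import numpy as np

def kmeans_quantize(actions, k=16, iters=50, seed=0):
    """Learn k action bins with plain k-means and map each continuous action
    to its nearest bin; a stand-in for the paper's adaptive scheme."""
    rng = np.random.default_rng(seed)
    centroids = actions[rng.choice(len(actions), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(actions[:, None] - centroids[None], axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = actions[assign == j].mean(axis=0)
    return centroids, assign

dataset_actions = np.random.default_rng(1).normal(size=(1000, 6))  # 6-D actions
bins, codes = kmeans_quantize(dataset_actions)
# Offline RL methods (IQL, CQL, BRAC, ...) then operate over the k discrete codes.
```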
- Critic Sequential Monte Carlo [15.596665321375298]
CriticSMC is a new algorithm for planning as inference built from a novel composition of sequential Monte Carlo with soft-Q function factors.
Our experiments on self-driving car collision avoidance in simulation show improvements over baselines in minimizing infractions relative to computational effort.
arXiv Detail & Related papers (2022-05-30T23:14:24Z)
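CriticSMC's composition, as summarized above, weights sequential Monte Carlo particles by soft-Q factors. The toy sketch below reweights and resamples particles with exp(Q / temperature); the proposal, transition, and Q function are illustrative stand-ins.

```python
import numpy as np

def smc_plan(q_fn, propose, states, n_steps=10, temp=1.0, seed=0):
    """Reweight particles by soft-Q factors exp(Q / temp) and resample;
    the proposal, transition, and Q below are toy stand-ins."""
    rng = np.random.default_rng(seed)
    n = len(states)
    for _ in range(n_steps):
        actions = propose(states, rng)
        logw = q_fn(states, actions) / temp        # soft-Q factor per particle
        w = np.exp(logw - logw.max())
        w /= w.sum()
        idx = rng.choice(n, size=n, p=w)           # multinomial resampling
        states = states[idx] + 0.1 * actions[idx]  # toy transition
    return states

states = np.random.default_rng(1).normal(size=(256, 2))     # 256 particles
propose = lambda s, rng: rng.normal(size=s.shape)
q_fn = lambda s, a: -np.sum((s + 0.1 * a) ** 2, axis=-1)    # toy soft-Q: low cost near origin
states = smc_plan(q_fn, propose, states)
```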
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose State-Conservative Policy Optimization (SCPO), a novel model-free actor-critic algorithm that learns robust policies without modeling the disturbance in advance.
Experiments on several robot control tasks demonstrate that SCPO learns policies robust to disturbances in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
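One way to make a critic state-conservative in the spirit of the entry above is to evaluate it pessimistically over perturbed states. The sketch below takes a minimum over sampled perturbations in a small L2 ball; the radius and sampling scheme are illustrative assumptions, not SCPO's exact formulation.

```python
import torch
import torch.nn as nn

def state_conservative_value(critic, state, radius=0.05, n_samples=8):
    """Pessimistic value over states perturbed within an L2 ball; the radius
    and sampling scheme are illustrative, not SCPO's exact formulation."""
    noise = torch.randn(n_samples, state.shape[-1])
    noise = radius * noise / noise.norm(dim=-1, keepdim=True)  # on-ball samples
    values = torch.stack([critic(state + eps) for eps in noise])
    return values.min(dim=0).values               # worst case over perturbations

critic = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 1))
robust_v = state_conservative_value(critic, torch.randn(4))
```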
- Geometric Deep Reinforcement Learning for Dynamic DAG Scheduling [8.14784681248878]
In this paper, we propose a reinforcement learning approach to solve a realistic scheduling problem.
We apply it to an algorithm commonly executed in the high performance computing community, the Cholesky factorization.
Our algorithm uses graph neural networks in combination with an actor-critic algorithm (A2C) to build an adaptive representation of the problem on the fly.
arXiv Detail & Related papers (2020-11-09T10:57:21Z)
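The entry above pairs a graph neural network with A2C so the policy can score tasks in the DAG adaptively. The sketch below is a tiny message-passing actor-critic over a toy DAG adjacency; the features, layer sizes, and propagation rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DAGActorCritic(nn.Module):
    """Tiny message-passing network over a task DAG feeding actor (per-task
    score) and critic (scalar value) heads; sizes are illustrative."""
    def __init__(self, feat_dim=8, hidden=32, rounds=2):
        super().__init__()
        self.embed = nn.Linear(feat_dim, hidden)
        self.msg = nn.Linear(hidden, hidden)
        self.actor = nn.Linear(hidden, 1)
        self.critic = nn.Linear(hidden, 1)
        self.rounds = rounds

    def forward(self, node_feats, adj):
        h = torch.tanh(self.embed(node_feats))
        for _ in range(self.rounds):              # propagate along DAG edges
            h = torch.tanh(h + adj @ self.msg(h))
        scores = self.actor(h).squeeze(-1)        # one logit per task
        value = self.critic(h.mean(dim=0))        # graph-level value estimate
        return scores, value

n = 5
adj = torch.triu(torch.ones(n, n), diagonal=1)    # toy DAG adjacency
feats = torch.randn(n, 8)                         # per-task features
scores, value = DAGActorCritic()(feats, adj)
probs = scores.softmax(dim=-1)                    # policy over schedulable tasks
```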
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
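The variance-control idea above can be illustrated with a single discrete action factor: use the critic's expected action value as a baseline for the score-function gradient. The sketch below is a toy version; the paper's estimator over correlated action dimensions is more elaborate.

```python
import torch

def baselined_pg_loss(logits, actions, q_values):
    """Score-function policy gradient with the critic's expected action value
    as a baseline, reducing gradient variance; a toy single-factor sketch."""
    logp = logits.log_softmax(dim=-1)
    probs = logp.exp().detach()
    baseline = (probs * q_values).sum(dim=-1, keepdim=True)  # E_a' pi(a') Q(s,a')
    adv = (q_values.gather(1, actions[:, None]) - baseline).detach()
    chosen_logp = logp.gather(1, actions[:, None]).squeeze(1)
    return -(adv.squeeze(1) * chosen_logp).mean()

logits = torch.randn(64, 6, requires_grad=True)   # 6 discrete actions
actions = torch.randint(0, 6, (64,))
q_values = torch.randn(64, 6)                     # critic-estimated Q(s, .)
loss = baselined_pg_loss(logits, actions, q_values)
loss.backward()
```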
- Information Theoretic Model Predictive Q-Learning [64.74041985237105]
We present a novel theoretical connection between information theoretic MPC and entropy regularized RL.
We develop a Q-learning algorithm that can leverage biased models.
arXiv Detail & Related papers (2019-12-31T00:29:22Z)
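Information-theoretic MPC updates a control sequence by exponentiated-cost averaging of sampled rollouts, and the entry above connects this to entropy-regularized Q-learning, letting a learned Q bootstrap the tail. The MPPI-style sketch below uses toy 1-D dynamics; the Q hook is marked as optional and all parameters are illustrative.

```python
import numpy as np

def mppi_plan(state, horizon=15, n_samples=256, lam=1.0, seed=0):
    """Sample action sequences, score rollouts, and average them with
    exp(-cost / lam) weights; dynamics and cost are toys, and a learned Q
    could bootstrap the tail as in the entry above."""
    rng = np.random.default_rng(seed)
    actions = rng.normal(size=(n_samples, horizon))
    costs = np.zeros(n_samples)
    s = np.repeat(state, n_samples)
    for t in range(horizon):
        s = s + 0.05 * actions[:, t]               # toy 1-D dynamics
        costs += s ** 2                            # toy running cost
    # costs += -q_fn(s)                            # optional learned-Q tail
    w = np.exp(-(costs - costs.min()) / lam)
    w /= w.sum()
    return (w[:, None] * actions).sum(axis=0)      # weighted action sequence

plan = mppi_plan(np.array([1.0]))
first_action = plan[0]                             # execute, then replan (MPC)
```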