Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
- URL: http://arxiv.org/abs/2412.12089v1
- Date: Mon, 16 Dec 2024 18:56:24 GMT
- Title: Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
- Authors: Eliot Xing, Vernon Luk, Jean Oh
- Abstract summary: This paper presents a novel RL algorithm and a simulation platform to enable scaling RL on tasks involving rigid bodies and deformables.
We introduce Soft Analytic Policy Optimization (SAPO), a maximum entropy first-order model-based RL algorithm, which uses first-order analytic gradients from differentiable simulation to train a stochastic actor to maximize expected return and entropy.
We also develop Rewarped, a parallel differentiable multiphysics simulation platform that supports simulating various materials beyond rigid bodies.
- Score: 11.360832156847103
- Abstract: Recent advances in GPU-based parallel simulation have enabled practitioners to collect large amounts of data and train complex control policies using deep reinforcement learning (RL), on commodity GPUs. However, such successes for RL in robotics have been limited to tasks sufficiently simulated by fast rigid-body dynamics. Simulation techniques for soft bodies are comparatively several orders of magnitude slower, thereby limiting the use of RL due to sample complexity requirements. To address this challenge, this paper presents both a novel RL algorithm and a simulation platform to enable scaling RL on tasks involving rigid bodies and deformables. We introduce Soft Analytic Policy Optimization (SAPO), a maximum entropy first-order model-based actor-critic RL algorithm, which uses first-order analytic gradients from differentiable simulation to train a stochastic actor to maximize expected return and entropy. Alongside our approach, we develop Rewarped, a parallel differentiable multiphysics simulation platform that supports simulating various materials beyond rigid bodies. We re-implement challenging manipulation and locomotion tasks in Rewarped, and show that SAPO outperforms baselines over a range of tasks that involve interaction between rigid bodies, articulations, and deformables.
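As a rough illustration of the actor update the abstract describes, here is a minimal sketch of a maximum-entropy first-order policy update through a differentiable simulator. The `sim_step` and `reward_fn` callables are hypothetical stand-ins (not Rewarped's actual API), and the critic/value bootstrapping of the full actor-critic method is omitted.

```python
# Minimal sketch of a maximum-entropy first-order actor update, in the spirit
# of SAPO as described in the abstract. `sim_step` and `reward_fn` are
# hypothetical differentiable stand-ins, not Rewarped's actual API.
import torch
import torch.nn as nn

class GaussianActor(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2 * act_dim))

    def forward(self, obs):
        mu, log_std = self.net(obs).chunk(2, dim=-1)
        return torch.distributions.Normal(mu, log_std.exp())

def actor_update(actor, optimizer, state, sim_step, reward_fn,
                 horizon=16, alpha=0.2):
    """One update: backpropagate return plus entropy bonus through the
    differentiable simulator over a short horizon."""
    objective = torch.zeros(())
    for _ in range(horizon):
        dist = actor(state)
        action = dist.rsample()          # reparameterized: keeps gradients
        state = sim_step(state, action)  # differentiable dynamics step
        objective = objective + reward_fn(state, action) \
                    + alpha * dist.entropy().sum(-1)
    optimizer.zero_grad()
    (-objective).backward()              # analytic first-order gradients
    optimizer.step()
```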
Related papers
- Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes.
It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training.
We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
arXiv Detail & Related papers (2023-10-12T20:49:15Z)
- Hindsight States: Blending Sim and Real Task Elements for Efficient Reinforcement Learning [61.3506230781327]
In robotics, one approach to generate training data builds on simulations based on dynamics models derived from first principles.
Here, we leverage the imbalance in the complexity of the dynamics to improve sample efficiency.
We validate our method on several challenging simulated tasks and demonstrate that it improves learning both alone and when combined with an existing hindsight algorithm.
arXiv Detail & Related papers (2023-03-03T21:55:04Z)
- Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
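A minimal sketch of the short-horizon, critic-bootstrapped objective this entry describes, assuming hypothetical `sim_step`, `reward_fn`, and `critic` callables rather than the paper's actual interfaces:

```python
# Sketch of a short-horizon actor objective in the spirit of SHAC: truncate
# backprop through the simulator after `horizon` steps and bootstrap with a
# smooth critic. `sim_step`, `reward_fn`, and `critic` are assumed stand-ins.
import torch

def short_horizon_actor_loss(actor, critic, state, sim_step, reward_fn,
                             horizon=32, gamma=0.99):
    ret, discount = torch.zeros(()), 1.0
    for _ in range(horizon):
        action = actor(state)             # action with gradient flow
        state = sim_step(state, action)   # gradients flow through dynamics
        ret = ret + discount * reward_fn(state, action)
        discount *= gamma
    ret = ret + discount * critic(state)  # smooth critic value at the horizon
    return -ret                           # minimized by a first-order optimizer
```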
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
- QuadSim: A Quadcopter Rotational Dynamics Simulation Framework for Reinforcement Learning Algorithms [0.0]
This study focuses on designing and developing a mathematically based quadcopter rotational dynamics simulation framework.
The framework aims to simulate both linear and nonlinear representations of a quadcopter.
The simulation environment has been expanded to be compatible with the OpenAI Gym toolkit.
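Gym compatibility typically means exposing the dynamics through the standard `gym.Env` interface; the following skeleton is a hypothetical illustration (the state, action spaces, and dynamics are assumptions, not QuadSim's code):

```python
# Hypothetical sketch of a Gym-compatible wrapper around quadcopter
# rotational dynamics; names, spaces, and step logic are assumptions.
import gym
import numpy as np

class QuadRotationEnv(gym.Env):
    def __init__(self, dt=0.01):
        self.dt = dt
        # state: roll/pitch/yaw angles and angular rates
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(6,))
        # action: normalized rotor commands
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(4,))
        self.state = np.zeros(6, dtype=np.float32)

    def reset(self):
        self.state = np.random.uniform(-0.1, 0.1, size=6).astype(np.float32)
        return self.state

    def step(self, action):
        # toy linearized rotational dynamics as a placeholder; the real
        # framework supports both linear and nonlinear models
        self.state = self.state + self.dt * self._dynamics(self.state, action)
        reward = -float(np.sum(self.state[:3] ** 2))  # penalize attitude error
        done = bool(np.abs(self.state[:3]).max() > np.pi / 2)
        return self.state, reward, done, {}

    def _dynamics(self, s, a):
        # angle rates follow angular velocity; torques from first 3 commands
        return np.concatenate([s[3:], 0.1 * a[:3]])
```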
arXiv Detail & Related papers (2022-02-14T20:34:08Z)
- DiffSRL: Learning Dynamic-aware State Representation for Deformable Object Control with Differentiable Simulator [26.280021036447213]
A latent space that captures dynamics-related information has wide applications, such as accelerating model-free reinforcement learning.
We propose DiffSRL, a dynamic state representation learning pipeline utilizing differentiable simulation.
Our model demonstrates superior performance in terms of capturing long-term dynamics as well as reward prediction.
arXiv Detail & Related papers (2021-10-24T04:53:58Z)
- Deep Bayesian Active Learning for Accelerating Stochastic Simulation [74.58219903138301]
Interactive Neural Process (INP) is a deep Bayesian active learning framework for stochastic simulations.
For active learning, we propose a novel acquisition function, Latent Information Gain (LIG), calculated in the latent space of NP based models.
The results demonstrate that STNP outperforms the baselines in the learning setting and that LIG achieves state-of-the-art performance for active learning.
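The summary does not define LIG precisely; one plausible reading is the KL divergence between the model's latent distribution before and after (hypothetically) adding a candidate point. A heavily hedged sketch under that assumption, with `encoder` and `decoder` as assumed stand-ins:

```python
# Heavily hedged sketch of a latent-space information-gain acquisition.
# Assumes LIG compares the latent distribution before and after adding a
# candidate with an imagined outcome; the paper's definition may differ.
import torch

def latent_information_gain(encoder, decoder, context, x_cand):
    q = encoder(context)                        # Normal over latent z
    y_imagined = decoder(q.rsample(), x_cand)   # predicted outcome at x_cand
    q_new = encoder(context + [(x_cand, y_imagined)])
    return torch.distributions.kl_divergence(q_new, q).sum()
```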
arXiv Detail & Related papers (2021-06-05T01:31:51Z)
- PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics [89.81550748680245]
We introduce a new differentiable physics benchmark called PlasticineLab.
In each task, the agent uses manipulators to deform the plasticine into the desired configuration.
We evaluate several existing reinforcement learning (RL) methods and gradient-based methods on this benchmark.
arXiv Detail & Related papers (2021-04-07T17:59:23Z)
- Reinforcement Learning for Adaptive Mesh Refinement [63.7867809197671]
We propose a novel formulation of AMR as a Markov decision process and apply deep reinforcement learning to train refinement policies directly from simulation.
The model sizes of these policy architectures are independent of the mesh size and hence scale to arbitrarily large and complex simulations.
arXiv Detail & Related papers (2021-03-01T22:55:48Z)
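To make the MDP formulation in this entry concrete, here is a hypothetical toy sketch in which observations are local per-element features, the action is whether to refine an element, and the reward is the resulting error reduction; every signal here is an illustrative assumption, not the paper's implementation. Because the observation is fixed-size and local, the policy's size is independent of the mesh size, matching the scaling claim above.

```python
# Hypothetical sketch: adaptive mesh refinement as an MDP on a toy 1-D mesh.
# All signals (features, error estimate, reward) are illustrative assumptions.
import numpy as np

class ToyAMREnv:
    def __init__(self, n_elems=8):
        self.widths = np.full(n_elems, 1.0 / n_elems)  # element sizes

    def error(self):
        # stand-in global error estimate: finer elements -> lower error
        return float(np.sum(self.widths ** 2))

    def observe(self, i):
        # local, fixed-size feature vector for element i, so the policy
        # network's size does not depend on the number of elements
        return np.array([self.widths[i], self.widths[i] ** 2])

    def step(self, i, refine: bool):
        err_before = self.error()
        if refine:  # split element i in half
            w = self.widths[i] / 2
            self.widths = np.concatenate(
                [self.widths[:i], [w, w], self.widths[i + 1:]])
        return err_before - self.error()   # reward = error reduction
```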