Distilled Domain Randomization
- URL: http://arxiv.org/abs/2112.03149v1
- Date: Mon, 6 Dec 2021 16:35:08 GMT
- Title: Distilled Domain Randomization
- Authors: Julien Brosseit, Benedikt Hahner, Fabio Muratore, Michael Gienger, Jan
Peters
- Abstract summary: We propose to combine reinforcement learning from randomized physics simulations with policy distillation.
Our algorithm, called Distilled Domain Randomization (DiDoR), distills so-called teacher policies, which are experts on domains.
This way, DiDoR learns controllers which transfer directly from simulation to reality, without requiring data from the target domain.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning is an effective tool to learn robot control
policies from scratch. However, these methods are notorious for the enormous
amount of required training data which is prohibitively expensive to collect on
real robots. A highly popular alternative is to learn from simulations, which
allows data to be generated much faster, more safely, and more cheaply. Since all
simulators are mere models of reality, there are inevitable differences between
the simulated and the real data, often referred to as the 'reality gap'. To
bridge this gap, many approaches learn one policy from a distribution over
simulators. In this paper, we propose to combine reinforcement learning from
randomized physics simulations with policy distillation. Our algorithm, called
Distilled Domain Randomization (DiDoR), distills so-called teacher policies,
which are experts on domains that have been sampled initially, into a student
policy that is later deployed. This way, DiDoR learns controllers which
transfer directly from simulation to reality, i.e., without requiring data from
the target domain. We compare DiDoR against three baselines in three sim-to-sim
as well as two sim-to-real experiments. Our results show that the target domain
performance of policies trained with DiDoR is on par with or better than the
baselines'. Moreover, our approach neither increases the required memory
capacity nor the time to compute an action, which may well be a point of
failure for successfully deploying the learned controller.
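The core idea of distillation in DiDoR can be sketched in miniature: several teacher policies, each an expert on one sampled domain, are compressed into a single student that is no larger than any one teacher. The toy below uses tabular softmax policies and numpy as a stand-in; in the paper the teachers are trained by reinforcement learning in randomized physics simulations and the policies are neural networks, so everything here (sizes, the use of random logits as "teachers", the plain gradient loop) is illustrative only.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_domains, n_states, n_actions = 3, 5, 4

# Stand-ins for teacher policies, one per initially sampled domain.
# In DiDoR these would be trained by RL in each randomized simulation.
teacher_logits = rng.normal(size=(n_domains, n_states, n_actions))
teacher_probs = softmax(teacher_logits)

# Student policy: a single tabular softmax policy, same size as one teacher,
# so distillation adds no memory or per-action compute at deployment time.
student_logits = np.zeros((n_states, n_actions))

def mean_kl(logits):
    """Average KL(teacher || student) over domains and states."""
    p_s = softmax(logits)
    return np.mean(
        np.sum(teacher_probs * (np.log(teacher_probs) - np.log(p_s)), axis=-1)
    )

kl_before = mean_kl(student_logits)

lr = 0.5
for _ in range(200):
    p_s = softmax(student_logits)
    # Gradient of the cross-entropy distillation loss w.r.t. student logits,
    # averaged over teachers: p_student - mean(p_teacher).
    student_logits -= lr * (p_s - teacher_probs.mean(axis=0))

kl_after = mean_kl(student_logits)
```

At the optimum the student matches the teachers' average action distribution in every state, which minimizes the mean KL divergence from the teachers; this mirrors the abstract's point that the deployed controller needs no more memory or compute than a single policy.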
Related papers
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z) - Robust Visual Sim-to-Real Transfer for Robotic Manipulation [79.66851068682779]
Learning visuomotor policies in simulation is much safer and cheaper than in the real world.
However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots.
One common approach to bridging the visual sim-to-real domain gap is domain randomization (DR).
arXiv Detail & Related papers (2023-07-28T05:47:24Z) - Sim2real Transfer Learning for Point Cloud Segmentation: An Industrial
Application Case on Autonomous Disassembly [55.41644538483948]
We present an industrial application case that uses sim2real transfer learning for point cloud data.
We provide insights on how to generate and process synthetic point cloud data.
Additionally, a novel patch-based attention network is proposed to tackle this problem.
arXiv Detail & Related papers (2023-01-12T14:00:37Z) - DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to
Reality [64.51295032956118]
We train a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand.
Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation in diverse kinds of hardware and simulator setups.
arXiv Detail & Related papers (2022-10-25T01:51:36Z) - Towards Optimal Strategies for Training Self-Driving Perception Models
in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z) - Robot Learning from Randomized Simulations: A Review [59.992761565399185]
Deep learning has caused a paradigm shift in robotics research, favoring methods that require large amounts of data.
State-of-the-art approaches learn in simulation where data generation is fast as well as inexpensive.
We focus on a technique named 'domain randomization' which is a method for learning from randomized simulations.
arXiv Detail & Related papers (2021-11-01T13:55:41Z) - Sim2Sim Evaluation of a Novel Data-Efficient Differentiable Physics
Engine for Tensegrity Robots [10.226310620727942]
Learning policies in simulation is promising for reducing human effort when training robot controllers.
The sim2real gap is the main barrier to successfully transferring policies from simulation to a real robot.
This work proposes a data-driven, end-to-end differentiable simulator.
arXiv Detail & Related papers (2020-11-10T06:19:54Z) - Sim-to-Real Transfer in Deep Reinforcement Learning for Robotics: a
Survey [0.07366405857677225]
We cover the background behind sim-to-real transfer in deep reinforcement learning.
We overview the main methods being utilized at the moment: domain randomization, domain adaptation, imitation learning, meta-learning and knowledge distillation.
arXiv Detail & Related papers (2020-09-24T21:05:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.