Pre-training of Deep RL Agents for Improved Learning under Domain
Randomization
- URL: http://arxiv.org/abs/2104.14386v1
- Date: Thu, 29 Apr 2021 14:54:11 GMT
- Title: Pre-training of Deep RL Agents for Improved Learning under Domain
Randomization
- Authors: Artemij Amiranashvili, Max Argus, Lukas Hermann, Wolfram Burgard,
Thomas Brox
- Abstract summary: We show how to pre-train a perception encoder that already provides an embedding invariant to the randomization.
We demonstrate this yields consistently improved results on a randomized version of the DeepMind Control Suite tasks and on a stacking environment with arbitrary backgrounds, including zero-shot transfer to a physical robot.
- Score: 63.09932240840656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual domain randomization in simulated environments is a widely used method
to transfer policies trained in simulation to real robots. However, domain
randomization and augmentation hamper the training of a policy. As
reinforcement learning struggles with a noisy training signal, this additional
nuisance can drastically impede training. For difficult tasks it can even
result in complete failure to learn. To overcome this problem we propose to
pre-train a perception encoder that already provides an embedding invariant to
the randomization. We demonstrate that this yields consistently improved
results on a randomized version of the DeepMind Control Suite tasks and on a
stacking environment with arbitrary backgrounds, including zero-shot transfer
to a physical robot.
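To make the core idea concrete, here is a minimal sketch of pre-training an encoder to be invariant to visual randomization. The architecture, the plain embedding-matching loss, and the random tensors standing in for rendered observations are illustrative assumptions, not the paper's exact setup; as the comments note, a pure invariance loss can collapse without a contrastive term or auxiliary supervision.

```python
import torch
import torch.nn as nn

# Hypothetical convolutional encoder mapping 84x84 RGB observations to a
# 64-d embedding; the paper's actual architecture is not reproduced here.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),   # 84 -> 20
    nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # 20 -> 9
    nn.Flatten(),
    nn.Linear(64 * 9 * 9, 64),
)
opt = torch.optim.Adam(encoder.parameters(), lr=3e-4)

# Two differently randomized renderings (textures, backgrounds) of the SAME
# underlying states; random tensors stand in for real observations.
view_a = torch.rand(16, 3, 84, 84)
view_b = torch.rand(16, 3, 84, 84)

# Invariance objective: embeddings of matched views should coincide. On its
# own this loss can collapse to a constant embedding, so in practice it is
# paired with a contrastive term or task supervision.
loss = ((encoder(view_a) - encoder(view_b)) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```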
Related papers
- Uncertainty-Aware Deployment of Pre-trained Language-Conditioned Imitation Learning Policies [29.00293625794431]
We propose a novel approach for uncertainty-aware deployment of pre-trained language-conditioned imitation learning agents.
Specifically, we use temperature scaling to calibrate these models and exploit the calibrated model to make uncertainty-aware decisions.
We implement our approach in simulation using three such pre-trained models, and showcase its potential to significantly enhance task completion rates.
arXiv Detail & Related papers (2024-03-27T03:19:36Z)
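The entry above relies on standard temperature scaling. Below is a minimal sketch: a single scalar temperature is fitted on held-out validation logits by minimizing the negative log-likelihood, and the calibrated confidences then gate decisions. The 0.8 confidence threshold and the dummy validation data are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels, steps=200, lr=0.01):
    """Fit a scalar temperature by minimizing NLL on held-out data."""
    log_t = torch.zeros(1, requires_grad=True)   # T = exp(log_t) stays > 0
    opt = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(logits / log_t.exp(), labels).backward()
        opt.step()
    return log_t.exp().item()

# Dummy validation outputs standing in for the pre-trained model's logits.
val_logits, val_labels = torch.randn(512, 10), torch.randint(0, 10, (512,))
T = fit_temperature(val_logits, val_labels)
probs = F.softmax(val_logits / T, dim=-1)          # calibrated confidences
act = probs.max(dim=-1).values > 0.8               # act only when confident
```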
- Domain Randomization for Robust, Affordable and Effective Closed-loop Control of Soft Robots [10.977130974626668]
Soft robots are gaining popularity thanks to their intrinsic safety in contact and their adaptability.
We show how Domain Randomization (DR) can enhance RL policies for soft robots.
We introduce a novel algorithmic extension to previous adaptive domain randomization methods for the automatic inference of dynamics parameters for deformable objects.
arXiv Detail & Related papers (2023-03-07T18:50:00Z)
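A minimal sketch of dynamics-parameter domain randomization, the mechanism underlying the soft-robot entry above: each training episode draws a fresh dynamics configuration. The parameter names, ranges, and the `env.reset(dynamics=...)` call are hypothetical placeholders; the paper's adaptive extension additionally infers these distributions from data.

```python
import random

# Hypothetical dynamics parameters for a simulated soft robot; the ranges
# are placeholders, not values from the paper.
PARAM_RANGES = {
    "elastic_modulus": (0.5e5, 2.0e5),   # Pa
    "damping":         (0.01, 0.2),
    "actuator_gain":   (0.8, 1.2),
}

def sample_dynamics():
    """Draw one randomized dynamics configuration per training episode."""
    return {k: random.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}

for episode in range(3):
    params = sample_dynamics()
    # env.reset(dynamics=params)  # hypothetical simulator API
    print(episode, params)
```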
- Continual Test-Time Domain Adaptation [94.51284735268597]
Test-time domain adaptation aims to adapt a source pre-trained model to a target domain without using any source data.
CoTTA is easy to implement and can be readily incorporated in off-the-shelf pre-trained models.
arXiv Detail & Related papers (2022-03-25T11:42:02Z)
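A minimal sketch in the spirit of the continual test-time adaptation entry above, assuming a weight-averaged (EMA) teacher that supplies soft pseudo-labels for the adapting student. Full CoTTA also uses augmentation-averaged predictions and stochastic weight restoration, both omitted here; the toy model and data are placeholders.

```python
import copy
import torch

def adapt_step(student, teacher, x, opt, ema=0.999):
    """One test-time adaptation step: the weight-averaged teacher supplies
    soft pseudo-labels, the student fits them, the teacher tracks the
    student by exponential moving average."""
    with torch.no_grad():
        pseudo = teacher(x).softmax(dim=-1)
    loss = -(pseudo * student(x).log_softmax(dim=-1)).sum(dim=-1).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():  # EMA update keeps the teacher stable over time
        for pt, ps in zip(teacher.parameters(), student.parameters()):
            pt.mul_(ema).add_(ps, alpha=1 - ema)
    return loss.item()

student = torch.nn.Linear(32, 10)      # stand-in for a pre-trained model
teacher = copy.deepcopy(student)
opt = torch.optim.SGD(student.parameters(), lr=1e-3)
adapt_step(student, teacher, torch.randn(8, 32), opt)
```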
- Safe Deep RL in 3D Environments using Human Feedback [15.038298345682556]
ReQueST aims to address this safety problem by learning a neural simulator of the environment from safe human trajectories.
It has so far been unknown whether this approach is feasible in complex 3D environments with feedback obtained from real humans.
We show that the resulting agent exhibits an order of magnitude reduction in unsafe behaviour compared to standard reinforcement learning.
arXiv Detail & Related papers (2022-01-20T10:26:34Z)
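The ReQueST entry above hinges on learning a neural simulator from safe trajectories. Below is a minimal sketch of that ingredient, assuming a one-step dynamics model fit by regression on (state, action, next state) tuples; the dimensions and random data are placeholders for real demonstrations.

```python
import torch
import torch.nn as nn

# One-step dynamics model: predicts the next state from (state, action).
model = nn.Sequential(nn.Linear(8 + 2, 64), nn.ReLU(), nn.Linear(64, 8))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy batch standing in for transitions collected from safe human
# demonstrations.
s = torch.randn(256, 8)
a = torch.randn(256, 2)
s_next = torch.randn(256, 8)

for _ in range(100):
    pred = model(torch.cat([s, a], dim=-1))
    loss = ((pred - s_next) ** 2).mean()   # simple regression objective
    opt.zero_grad(); loss.backward(); opt.step()
```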
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose State-Conservative Policy Optimization (SCPO), a novel model-free actor-critic algorithm that learns robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
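The SCPO entry above trains against disturbances without modeling them in advance. As a loosely related, generic sketch (not the paper's actual algorithm), the helper below searches a small neighbourhood of a state for the perturbation that maximizes a policy loss; training on the perturbed state encourages robustness. The step sizes and stand-in loss are arbitrary.

```python
import torch

def worst_case_state(loss_fn, state, eps=0.05, steps=3, lr=0.02):
    """Gradient-ascend a bounded state perturbation that maximizes the
    given policy loss; the caller can then train on the result."""
    delta = torch.zeros_like(state, requires_grad=True)
    for _ in range(steps):
        grad, = torch.autograd.grad(loss_fn(state + delta), delta)
        with torch.no_grad():
            delta += lr * grad.sign()
            delta.clamp_(-eps, eps)   # stay in a small neighbourhood
    return (state + delta).detach()

# Stand-in loss; in practice this would be the actor's loss at the state.
s_adv = worst_case_state(lambda s: (s ** 2).sum(), torch.randn(4, 8))
```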
- Robust Reinforcement Learning using Adversarial Populations [118.73193330231163]
Reinforcement Learning (RL) is an effective tool for controller design but can struggle with issues of robustness.
We show that using a single adversary does not consistently yield robustness to dynamics variations under standard parametrizations of the adversary.
We propose a population-based augmentation to the Robust RL formulation in which we randomly initialize a population of adversaries and sample from the population uniformly during training.
arXiv Detail & Related papers (2020-08-04T20:57:32Z)
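The population mechanism in the entry above is simple to sketch: initialize several adversaries independently and sample one uniformly at random for each rollout, so the protagonist cannot overfit to a single opponent. The linear adversary policies here are placeholders.

```python
import random
import numpy as np

def make_adversary(obs_dim=8, act_dim=2):
    """A randomly initialized linear adversary policy (placeholder)."""
    W = 0.1 * np.random.randn(act_dim, obs_dim)
    return lambda obs: W @ obs

class AdversaryPopulation:
    """Independently initialized adversaries, sampled uniformly per rollout."""
    def __init__(self, n=8):
        self.members = [make_adversary() for _ in range(n)]
    def sample(self):
        return random.choice(self.members)

pop = AdversaryPopulation(n=8)
for episode in range(3):
    adversary = pop.sample()                 # fresh opponent each rollout
    perturbation = adversary(np.zeros(8))    # e.g. force applied to the robot
```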
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
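A minimal sketch of the representation idea in the entry above: a recurrent encoder summarizes recent (state, action, reward) experience into a latent descriptor of the current environment, and the policy conditions on it. The paper uses latent variable models with a variational objective; this deterministic version and all dimensions are simplifying assumptions.

```python
import torch
import torch.nn as nn

# Encoder summarizes recent experience into a latent descriptor of the
# (possibly shifted) environment; the policy conditions on that latent.
context_encoder = nn.GRU(input_size=8 + 2 + 1, hidden_size=16, batch_first=True)
policy = nn.Linear(8 + 16, 2)

recent = torch.randn(1, 50, 11)     # last 50 (state, action, reward) tuples
_, h = context_encoder(recent)      # h: (1, 1, 16) environment embedding
state = torch.randn(1, 8)
action = policy(torch.cat([state, h[-1]], dim=-1))
```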
- Online Constrained Model-based Reinforcement Learning [13.362455603441552]
A key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model-based approach that combines Gaussian Process regression and Receding Horizon Control.
We test our approach on a cart pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
arXiv Detail & Related papers (2020-04-07T15:51:34Z)
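The combination in the entry above is easy to sketch on a toy 1-D system: a Gaussian Process learns the dynamics from observed transitions, and random-shooting receding horizon control plans through the learned model, executing only the first action of the best sequence. The toy dynamics, quadratic cost, and scikit-learn defaults are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Fit a GP dynamics model s' = f(s, a) on toy 1-D transitions.
rng = np.random.default_rng(0)
S, A = rng.uniform(-1, 1, (200, 1)), rng.uniform(-1, 1, (200, 1))
S_next = 0.9 * S + 0.1 * A + 0.01 * rng.standard_normal((200, 1))
gp = GaussianProcessRegressor().fit(np.hstack([S, A]), S_next.ravel())

def receding_horizon_action(s, horizon=5, n_candidates=32):
    """Random-shooting MPC: roll candidate action sequences through the GP
    model and execute only the first action of the cheapest sequence."""
    best_cost, best_a0 = np.inf, 0.0
    for _ in range(n_candidates):
        seq = rng.uniform(-1, 1, horizon)
        sim, cost = s, 0.0
        for a in seq:
            sim = gp.predict(np.array([[sim, a]]))[0]   # predicted next state
            cost += sim ** 2                            # regulate toward zero
        if cost < best_cost:
            best_cost, best_a0 = cost, seq[0]
    return best_a0

print(receding_horizon_action(0.8))
```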
- Deep Adversarial Reinforcement Learning for Object Disentangling [36.66974848126079]
We present a novel adversarial reinforcement learning (ARL) framework for disentangling waste objects.
The ARL framework utilizes an adversary, which is trained to steer the original agent, the protagonist, to challenging states.
We show that our method can generalize from training to test scenarios by training an end-to-end system for robot control to solve a challenging object disentangling task.
arXiv Detail & Related papers (2020-03-08T13:20:39Z)
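A minimal sketch of the adversarial episode structure described above, assuming a gym-style `env`: the adversary controls the first steps to steer the system into challenging states, the protagonist then takes over and must recover, and the adversary receives the protagonist's negated return. The handover point and reward split are illustrative assumptions.

```python
def arl_episode(env, adversary, protagonist, handover=20, max_steps=200):
    """One adversarial episode: the adversary acts first, steering the
    system into a challenging state; the protagonist must then recover.
    The adversary's return is the protagonist's return, negated."""
    obs = env.reset()
    protagonist_return = 0.0
    for t in range(max_steps):
        actor = adversary if t < handover else protagonist
        obs, reward, done, info = env.step(actor(obs))
        if t >= handover:
            protagonist_return += reward
        if done:
            break
    return protagonist_return, -protagonist_return
```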
- Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality [74.0084803220897]
Adversarial training is a popular method to give neural nets robustness against adversarial perturbations.
We show convergence to low robust training loss for polynomial width, instead of exponential, under natural assumptions and with the ReLU activation.
arXiv Detail & Related papers (2020-02-16T20:13:43Z)
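The paper above is theoretical, but the procedure it analyzes is standard adversarial training. A minimal sketch: inner PGD steps find a worst-case perturbation within an L-infinity ball around the inputs, and the outer loop trains the network on those perturbed inputs. Model size, step counts, and budgets are arbitrary placeholders.

```python
import torch
import torch.nn.functional as F

def pgd(model, x, y, eps=0.03, alpha=0.01, steps=5):
    """Inner maximization: gradient-ascent steps projected back into an
    L-infinity ball of radius eps around the clean inputs."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad, = torch.autograd.grad(F.cross_entropy(model(x_adv), y), x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
    return x_adv.detach()

model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
loss = F.cross_entropy(model(pgd(model, x, y)), y)  # outer minimization
opt.zero_grad(); loss.backward(); opt.step()
```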
This list is automatically generated from the titles and abstracts of the papers on this site.