Pre-training of Deep RL Agents for Improved Learning under Domain
Randomization
- URL: http://arxiv.org/abs/2104.14386v1
- Date: Thu, 29 Apr 2021 14:54:11 GMT
- Title: Pre-training of Deep RL Agents for Improved Learning under Domain
Randomization
- Authors: Artemij Amiranashvili, Max Argus, Lukas Hermann, Wolfram Burgard,
Thomas Brox
- Abstract summary: We show how to pre-train a perception encoder that already provides an embedding invariant to the randomization.
We demonstrate this yields consistently improved results on a randomized version of DeepMind control suite tasks and a stacking environment on arbitrary backgrounds with zero-shot transfer to a physical robot.
- Score: 63.09932240840656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual domain randomization in simulated environments is a widely used method
to transfer policies trained in simulation to real robots. However, domain
randomization and augmentation hamper the training of a policy. As
reinforcement learning struggles with a noisy training signal, this additional
nuisance can drastically impede training. For difficult tasks it can even
result in complete failure to learn. To overcome this problem we propose to
pre-train a perception encoder that already provides an embedding invariant to
the randomization. We demonstrate that this yields consistently improved
results on a randomized version of DeepMind control suite tasks and a stacking
environment on arbitrary backgrounds with zero-shot transfer to a physical
robot.
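The pre-training idea lends itself to a compact illustration. Below is a minimal PyTorch sketch of one plausible instantiation, not the authors' exact method: an encoder is trained so that two differently randomized renderings of the same simulator state map to the same embedding, with a small state-regression head to keep the embedding from collapsing. The renderer and network sizes are toy stand-ins.

```python
import torch
import torch.nn as nn

def render(state, rng):
    # Hypothetical randomized renderer: 4-dim state -> 3x64x64 image with a
    # random background and a crude state-dependent cue.
    img = torch.rand(3, 64, 64, generator=rng)
    img[0, :8, :8] = state.mean()
    return img

encoder = nn.Sequential(
    nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 14 * 14, 64),
)
state_head = nn.Linear(64, 4)  # regress the true sim state from the embedding
opt = torch.optim.Adam([*encoder.parameters(), *state_head.parameters()], lr=1e-3)
rng = torch.Generator().manual_seed(0)

for step in range(200):
    state = torch.rand(4)                                # ground-truth simulator state
    a = encoder(render(state, rng).unsqueeze(0))         # randomized view 1
    b = encoder(render(state, rng).unsqueeze(0))         # randomized view 2
    invariance = (a - b).pow(2).mean()                   # same state -> same embedding
    supervision = (state_head(a) - state).pow(2).mean()  # keeps the encoder informative
    (invariance + supervision).backward()
    opt.step(); opt.zero_grad()
```

Once pre-trained, the encoder supplies the policy with inputs that no longer vary with the visual randomization, which is what removes the nuisance factor from the RL training signal.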
Related papers
- Flow-based Domain Randomization for Learning and Sequencing Robotic Skills [24.17247101490744]
Domain randomization in reinforcement learning is an established technique for increasing the robustness of control policies trained in simulation.
This paper investigates automatically discovering a sampling distribution by training a neural sampling distribution under an entropy-regularized reward objective.
We show that this architecture is more flexible than existing approaches that learn simpler, parameterized sampling distributions.
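A hedged sketch of the general recipe described above: the paper uses a normalizing flow, so a diagonal Gaussian stands in here, and `evaluate_policy` is a hypothetical placeholder for scoring the control policy under the sampled randomization parameters.

```python
import torch

mu = torch.zeros(3, requires_grad=True)        # e.g. friction, mass, lighting offsets
log_std = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([mu, log_std], lr=1e-2)
alpha = 0.1                                    # entropy regularization weight

def evaluate_policy(params):
    # Hypothetical: reward is high near some unknown "useful" randomization.
    return -(params - torch.tensor([0.5, -0.3, 0.2])).pow(2).sum()

for step in range(500):
    dist = torch.distributions.Normal(mu, log_std.exp())
    params = dist.sample()
    reward = evaluate_policy(params)
    # REINFORCE on the reward, plus an analytic entropy bonus that keeps the
    # sampling distribution from collapsing to a single point.
    loss = -dist.log_prob(params).sum() * reward - alpha * dist.entropy().sum()
    opt.zero_grad(); loss.backward(); opt.step()
```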
arXiv Detail & Related papers (2025-02-03T20:25:50Z)
- Uncertainty-Aware Deployment of Pre-trained Language-Conditioned Imitation Learning Policies [29.00293625794431]
We propose a novel approach for uncertainty-aware deployment of pre-trained language-conditioned imitation learning agents.
Specifically, we use temperature scaling to calibrate these models and exploit the calibrated model to make uncertainty-aware decisions.
We implement our approach in simulation using three such pre-trained models, and showcase its potential to significantly enhance task completion rates.
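Temperature scaling itself is simple enough to show directly. Below is a minimal sketch with synthetic logits and labels; the deferral threshold and the `act_or_defer` helper are illustrative assumptions, not the paper's API.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(256, 10) * 3.0            # held-out model outputs (overconfident)
labels = torch.randint(0, 10, (256,))
log_t = torch.zeros(1, requires_grad=True)     # optimize log T so T stays positive
opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

def nll():
    # Fit the single temperature parameter by minimizing NLL on held-out data.
    opt.zero_grad()
    loss = F.cross_entropy(logits / log_t.exp(), labels)
    loss.backward()
    return loss

opt.step(nll)
T = log_t.exp().item()

def act_or_defer(new_logits, threshold=0.8):
    # Uncertainty-aware decision: act only when calibrated confidence is high.
    conf = F.softmax(new_logits / T, dim=-1).max().item()
    return "execute" if conf >= threshold else "ask for help"

print(T, act_or_defer(torch.randn(10)))
```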
arXiv Detail & Related papers (2024-03-27T03:19:36Z)
- REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world.
Recent methods aim to mitigate misalignment by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
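The summary does not spell out the exact regularizer, so the following is a generic hedged sketch of reward learning from preferences (a Bradley-Terry loss) with an added penalty keeping the learned reward near a hand-designed prior; all names and data are stand-ins, not REBEL's formulation.

```python
import torch
import torch.nn as nn

reward_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)
lam = 0.1                                                  # regularization strength

def r_prior(s):
    # Stand-in hand-designed reward acting as the regularization target.
    return -s.pow(2).sum(-1, keepdim=True)

for step in range(300):
    s_a, s_b = torch.randn(32, 8), torch.randn(32, 8)      # preferred / rejected states
    pref = torch.ones(32, 1)                               # "a preferred over b"
    logits = reward_net(s_a) - reward_net(s_b)
    bt_loss = nn.functional.binary_cross_entropy_with_logits(logits, pref)
    reg = (reward_net(s_a) - r_prior(s_a)).pow(2).mean()   # stay near the prior reward
    (bt_loss + lam * reg).backward()
    opt.step(); opt.zero_grad()
```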
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Domain Randomization for Robust, Affordable and Effective Closed-loop Control of Soft Robots [10.977130974626668]
Soft robots are gaining popularity thanks to their intrinsic safety around contacts and their adaptability.
We show how Domain Randomization (DR) enhances RL policies for soft robots.
We introduce a novel algorithmic extension to previous adaptive domain randomization methods for the automatic inference of dynamics parameters for deformable objects.
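As a rough illustration of inferring dynamics parameters automatically (a rejection-style loop, not the paper's algorithm), the sketch below narrows a distribution over a toy stiffness parameter until simulated rollouts match a "real" trajectory; the simulator and data are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(stiffness, steps=20):
    # Toy deformable-object model: displacement decays with stiffness.
    return np.array([np.exp(-stiffness * t * 0.1) for t in range(steps)])

real_traj = simulate(1.7) + rng.normal(0, 0.01, 20)    # unknown true stiffness 1.7
mu, sigma = 1.0, 1.0                                   # prior over stiffness

for iteration in range(5):
    candidates = rng.normal(mu, sigma, 500).clip(0.05, None)
    errors = [np.abs(simulate(c) - real_traj).mean() for c in candidates]
    best = candidates[np.argsort(errors)[:50]]         # keep closest-matching params
    mu, sigma = best.mean(), best.std() + 1e-3         # refit the sampling distribution
print(mu, sigma)  # mu approaches 1.7 with shrinking spread
```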
arXiv Detail & Related papers (2023-03-07T18:50:00Z)
- Continual Test-Time Domain Adaptation [94.51284735268597]
Test-time domain adaptation aims to adapt a source pre-trained model to a target domain without using any source data.
The proposed method, CoTTA, is easy to implement and can be readily incorporated into off-the-shelf pre-trained models.
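A simplified sketch of the main CoTTA ingredients, a weight-averaged (EMA) teacher that supplies pseudo-labels on the test stream plus stochastic restoration of source weights to avoid drift; models and data are toy stand-ins.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

source = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
student = copy.deepcopy(source)
teacher = copy.deepcopy(source)
opt = torch.optim.SGD(student.parameters(), lr=1e-3)

for batch in range(100):
    x = torch.randn(8, 16)                          # unlabeled test-time batch
    with torch.no_grad():
        pseudo = F.softmax(teacher(x), dim=-1)      # teacher pseudo-labels
    loss = F.cross_entropy(student(x), pseudo)      # consistency to the teacher
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(0.999).add_(s_p, alpha=0.001)  # EMA teacher update
        for s_p, src_p in zip(student.parameters(), source.parameters()):
            mask = (torch.rand_like(s_p) < 0.01).float()
            s_p.mul_(1 - mask).add_(src_p * mask)   # stochastic restore to source
```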
arXiv Detail & Related papers (2022-03-25T11:42:02Z)
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose State-Conservative Policy Optimization (SCPO), a model-free actor-critic algorithm that learns robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns policies that are robust to disturbances in the transition dynamics.
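One common way to realize state-conservative updates, shown here as a hedged illustration rather than SCPO's exact update, is to evaluate the actor on an adversarially perturbed state found by one gradient step inside an epsilon ball; actor and critic are toy stand-ins.

```python
import torch
import torch.nn as nn

actor = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2), nn.Tanh())
critic = nn.Sequential(nn.Linear(6, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
eps = 0.05                                           # size of the state neighborhood

for step in range(100):
    s = torch.randn(64, 4)
    s_adv = s.clone().requires_grad_(True)
    q = critic(torch.cat([s_adv, actor(s_adv)], dim=-1)).sum()
    grad = torch.autograd.grad(q, s_adv)[0]
    s_adv = (s - eps * grad.sign()).detach()         # worst-case state in the ball
    loss = -critic(torch.cat([s_adv, actor(s_adv)], dim=-1)).mean()
    opt.zero_grad(); loss.backward(); opt.step()     # critic's own update omitted
```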
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
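Structurally, the approach can be sketched as follows: a recurrent encoder summarizes recent transitions into a latent context that must explain the observed dynamics, and the policy conditions on that latent. The next-state prediction objective below is a simplification of the paper's actual training signal, and all modules and data are toy stand-ins.

```python
import torch
import torch.nn as nn

ctx_encoder = nn.GRU(input_size=4 + 2 + 4, hidden_size=8, batch_first=True)
dynamics = nn.Sequential(nn.Linear(4 + 2 + 8, 64), nn.ReLU(), nn.Linear(64, 4))
policy = nn.Sequential(nn.Linear(4 + 8, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam([*ctx_encoder.parameters(), *dynamics.parameters()], lr=1e-3)

for step in range(100):
    # A window of recent (s, a, s') transitions from the current environment.
    window = torch.randn(1, 10, 4 + 2 + 4)
    _, h = ctx_encoder(window)
    z = h[-1]                                       # latent context, shape (1, 8)
    s, a, s_next = torch.randn(1, 4), torch.randn(1, 2), torch.randn(1, 4)
    pred = dynamics(torch.cat([s, a, z], dim=-1))
    loss = (pred - s_next).pow(2).mean()            # latent must explain the dynamics
    opt.zero_grad(); loss.backward(); opt.step()

# The policy acts on the state augmented with the inferred context.
action = policy(torch.cat([torch.randn(1, 4), z.detach()], dim=-1))
```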
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
- Online Constrained Model-based Reinforcement Learning [13.362455603441552]
A key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model-based approach that combines Gaussian Process regression and Receding Horizon Control.
We test our approach on a cart pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
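The combination is compact enough to sketch end to end: fit a GP to one-step dynamics, then plan by random shooting over a short horizon and re-plan at every step. The 1-D system and quadratic cost are toy stand-ins, not the paper's benchmark tasks.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def true_dynamics(x, u):
    return 0.9 * x + 0.5 * u                         # unknown to the controller

# Fit the GP on observed (state, action) -> next-state transitions.
X = rng.uniform(-2, 2, (100, 2))
y = true_dynamics(X[:, 0], X[:, 1])
gp = GaussianProcessRegressor().fit(X, y)

def plan(x0, horizon=5, n_candidates=200):
    # Random shooting: sample action sequences, roll out the GP, pick the best.
    seqs = rng.uniform(-1, 1, (n_candidates, horizon))
    costs = np.zeros(n_candidates)
    x = np.full(n_candidates, x0)
    for t in range(horizon):
        x = gp.predict(np.column_stack([x, seqs[:, t]]))
        costs += x ** 2                              # drive the state to zero
    return seqs[np.argmin(costs), 0]                 # execute only the first action

x = 1.5
for step in range(10):                               # receding-horizon loop
    x = true_dynamics(x, plan(x))
print(x)
```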
arXiv Detail & Related papers (2020-04-07T15:51:34Z)
- Deep Adversarial Reinforcement Learning for Object Disentangling [36.66974848126079]
We present a novel adversarial reinforcement learning (ARL) framework for disentangling waste objects.
The ARL framework utilizes an adversary, which is trained to steer the original agent, the protagonist, to challenging states.
We show that our method can generalize from training to test scenarios by training an end-to-end system for robot control to solve a challenging object disentangling task.
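A schematic of the alternating adversary/protagonist scheme with a toy 1-D environment; crude random-search updates stand in for the actual RL training of both agents.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyEnv:
    def reset(self):
        self.x = 0.0
        return self.x
    def step(self, u):
        self.x += float(u)
        return self.x, -abs(self.x)                  # reward: stay near the origin

def rollout(adv_gain, pro_gain, env, k=5, horizon=20):
    x = env.reset()
    for t in range(k):                               # adversary phase: push outward
        x, _ = env.step(adv_gain * np.sign(x) if x else adv_gain)
    total = 0.0
    for t in range(k, horizon):                      # protagonist phase: recover
        x, r = env.step(-pro_gain * x)
        total += r
    return total

adv_gain, pro_gain = 0.1, 0.1
for it in range(50):                                 # crude alternating optimization
    cand = pro_gain + rng.normal(0, 0.05)
    if rollout(adv_gain, cand, ToyEnv()) > rollout(adv_gain, pro_gain, ToyEnv()):
        pro_gain = cand                              # protagonist improves vs adversary
    cand = adv_gain + rng.normal(0, 0.05)
    if rollout(cand, pro_gain, ToyEnv()) < rollout(adv_gain, pro_gain, ToyEnv()):
        adv_gain = cand                              # adversary makes starts harder
print(adv_gain, pro_gain)
```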
arXiv Detail & Related papers (2020-03-08T13:20:39Z)
- Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality [74.0084803220897]
Adversarial training is a popular method to give neural nets robustness against adversarial perturbations.
We show convergence to low robust training loss for polynomial (rather than exponential) network width, under natural assumptions and with the ReLU activation.
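For reference, the procedure the analysis concerns, adversarial training of a wide ReLU network with a projected-gradient inner maximization, looks like this in a toy setting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Sequential(nn.Linear(10, 512), nn.ReLU(), nn.Linear(512, 2))  # wide ReLU net
opt = torch.optim.SGD(net.parameters(), lr=0.1)
eps, pgd_steps = 0.1, 5

x = torch.randn(128, 10)
y = (x.sum(dim=1) > 0).long()

for epoch in range(20):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(pgd_steps):                           # inner max: find perturbation
        loss = F.cross_entropy(net(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + 0.03 * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    loss = F.cross_entropy(net(x + delta.detach()), y)   # outer min: robust loss
    opt.zero_grad(); loss.backward(); opt.step()
```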
arXiv Detail & Related papers (2020-02-16T20:13:43Z)