Pre-training of Deep RL Agents for Improved Learning under Domain
Randomization
- URL: http://arxiv.org/abs/2104.14386v1
- Date: Thu, 29 Apr 2021 14:54:11 GMT
- Title: Pre-training of Deep RL Agents for Improved Learning under Domain
Randomization
- Authors: Artemij Amiranashvili, Max Argus, Lukas Hermann, Wolfram Burgard,
Thomas Brox
- Abstract summary: We show how to pre-train a perception encoder that already provides an embedding invariant to the randomization.
We demonstrate this yields consistently improved results on a randomized version of DeepMind control suite tasks and a stacking environment on arbitrary backgrounds with zero-shot transfer to a physical robot.
- Score: 63.09932240840656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual domain randomization in simulated environments is a widely used method
to transfer policies trained in simulation to real robots. However, domain
randomization and augmentation hamper the training of a policy. As
reinforcement learning struggles with a noisy training signal, this additional
nuisance can drastically impede training. For difficult tasks it can even
result in complete failure to learn. To overcome this problem we propose to
pre-train a perception encoder that already provides an embedding invariant to
the randomization. We demonstrate that this yields consistently improved
results on a randomized version of DeepMind control suite tasks and a stacking
environment on arbitrary backgrounds with zero-shot transfer to a physical
robot.
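The pre-training idea lends itself to a compact illustration. Below is a minimal PyTorch sketch of one plausible instantiation, not the authors' exact method: an encoder is trained so that two differently randomized renderings of the same simulator state map to the same embedding, with a small state-regression head to keep the embedding from collapsing. The renderer and network sizes are toy stand-ins.

```python
import torch
import torch.nn as nn

def render(state, rng):
    # Hypothetical randomized renderer: 4-dim state -> 3x64x64 image with a
    # random background and a crude state-dependent cue.
    img = torch.rand(3, 64, 64, generator=rng)
    img[0, :8, :8] = state.mean()
    return img

encoder = nn.Sequential(
    nn.Conv2d(3, 16, 4, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 14 * 14, 64),
)
state_head = nn.Linear(64, 4)  # regress the true sim state from the embedding
opt = torch.optim.Adam([*encoder.parameters(), *state_head.parameters()], lr=1e-3)
rng = torch.Generator().manual_seed(0)

for step in range(200):
    state = torch.rand(4)                                # ground-truth simulator state
    a = encoder(render(state, rng).unsqueeze(0))         # randomized view 1
    b = encoder(render(state, rng).unsqueeze(0))         # randomized view 2
    invariance = (a - b).pow(2).mean()                   # same state -> same embedding
    supervision = (state_head(a) - state).pow(2).mean()  # keeps the encoder informative
    (invariance + supervision).backward()
    opt.step(); opt.zero_grad()
```

Once pre-trained, the encoder supplies the policy with inputs that no longer vary with the visual randomization, which is what removes the nuisance factor from the RL training signal.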
Related papers
- Flow-based Domain Randomization for Learning and Sequencing Robotic Skills [24.17247101490744]
Domain randomization in reinforcement learning is an established technique for increasing the robustness of control policies trained in simulation.
This paper investigates automatically discovering a sampling distribution by training a neural sampling distribution under an entropy-regularized reward objective.
We show that this architecture is more flexible than existing approaches that learn simpler, parameterized sampling distributions.
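A hedged sketch of the general recipe described above: the paper uses a normalizing flow, so a diagonal Gaussian stands in here, and `evaluate_policy` is a hypothetical placeholder for scoring the control policy under the sampled randomization parameters.

```python
import torch

mu = torch.zeros(3, requires_grad=True)        # e.g. friction, mass, lighting offsets
log_std = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([mu, log_std], lr=1e-2)
alpha = 0.1                                    # entropy regularization weight

def evaluate_policy(params):
    # Hypothetical: reward is high near some unknown "useful" randomization.
    return -(params - torch.tensor([0.5, -0.3, 0.2])).pow(2).sum()

for step in range(500):
    dist = torch.distributions.Normal(mu, log_std.exp())
    params = dist.sample()
    reward = evaluate_policy(params)
    # REINFORCE on the reward, plus an analytic entropy bonus that keeps the
    # sampling distribution from collapsing to a single point.
    loss = -dist.log_prob(params).sum() * reward - alpha * dist.entropy().sum()
    opt.zero_grad(); loss.backward(); opt.step()
```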
arXiv Detail & Related papers (2025-02-03T20:25:50Z)
- Uncertainty-Aware Deployment of Pre-trained Language-Conditioned Imitation Learning Policies [29.00293625794431]
We propose a novel approach for uncertainty-aware deployment of pre-trained language-conditioned imitation learning agents.
Specifically, we use temperature scaling to calibrate these models and exploit the calibrated model to make uncertainty-aware decisions.
We implement our approach in simulation using three such pre-trained models, and showcase its potential to significantly enhance task completion rates.
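Temperature scaling itself is simple enough to show directly. Below is a minimal sketch with synthetic logits and labels; the deferral threshold and the `act_or_defer` helper are illustrative assumptions, not the paper's API.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(256, 10) * 3.0            # held-out model outputs (overconfident)
labels = torch.randint(0, 10, (256,))
log_t = torch.zeros(1, requires_grad=True)     # optimize log T so T stays positive
opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

def nll():
    # Fit the single temperature parameter by minimizing NLL on held-out data.
    opt.zero_grad()
    loss = F.cross_entropy(logits / log_t.exp(), labels)
    loss.backward()
    return loss

opt.step(nll)
T = log_t.exp().item()

def act_or_defer(new_logits, threshold=0.8):
    # Uncertainty-aware decision: act only when calibrated confidence is high.
    conf = F.softmax(new_logits / T, dim=-1).max().item()
    return "execute" if conf >= threshold else "ask for help"

print(T, act_or_defer(torch.randn(10)))
```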
arXiv Detail & Related papers (2024-03-27T03:19:36Z)
- REBEL: Reward Regularization-Based Approach for Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and human preferences can lead to catastrophic outcomes in the real world.
Recent methods aim to mitigate misalignment by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
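The summary does not spell out the exact regularizer, so the following is a generic hedged sketch of reward learning from preferences (a Bradley-Terry loss) with an added penalty keeping the learned reward near a hand-designed prior; all names and data are stand-ins, not REBEL's formulation.

```python
import torch
import torch.nn as nn

reward_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)
lam = 0.1                                                  # regularization strength

def r_prior(s):
    # Stand-in hand-designed reward acting as the regularization target.
    return -s.pow(2).sum(-1, keepdim=True)

for step in range(300):
    s_a, s_b = torch.randn(32, 8), torch.randn(32, 8)      # preferred / rejected states
    pref = torch.ones(32, 1)                               # "a preferred over b"
    logits = reward_net(s_a) - reward_net(s_b)
    bt_loss = nn.functional.binary_cross_entropy_with_logits(logits, pref)
    reg = (reward_net(s_a) - r_prior(s_a)).pow(2).mean()   # stay near the prior reward
    (bt_loss + lam * reg).backward()
    opt.step(); opt.zero_grad()
```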
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Domain Randomization for Robust, Affordable and Effective Closed-loop Control of Soft Robots [10.977130974626668]
Soft robots are gaining popularity thanks to their intrinsic safety around contacts and their adaptability.
We show how Domain Randomization (DR) enhances RL policies for soft robots.
We introduce a novel algorithmic extension to previous adaptive domain randomization methods for the automatic inference of dynamics parameters for deformable objects.
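As a rough illustration of inferring dynamics parameters automatically (a rejection-style loop, not the paper's algorithm), the sketch below narrows a distribution over a toy stiffness parameter until simulated rollouts match a "real" trajectory; the simulator and data are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(stiffness, steps=20):
    # Toy deformable-object model: displacement decays with stiffness.
    return np.array([np.exp(-stiffness * t * 0.1) for t in range(steps)])

real_traj = simulate(1.7) + rng.normal(0, 0.01, 20)    # unknown true stiffness 1.7
mu, sigma = 1.0, 1.0                                   # prior over stiffness

for iteration in range(5):
    candidates = rng.normal(mu, sigma, 500).clip(0.05, None)
    errors = [np.abs(simulate(c) - real_traj).mean() for c in candidates]
    best = candidates[np.argsort(errors)[:50]]         # keep closest-matching params
    mu, sigma = best.mean(), best.std() + 1e-3         # refit the sampling distribution
print(mu, sigma)  # mu approaches 1.7 with shrinking spread
```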
arXiv Detail & Related papers (2023-03-07T18:50:00Z)
- Continual Test-Time Domain Adaptation [94.51284735268597]
Test-time domain adaptation aims to adapt a source pre-trained model to a target domain without using any source data.
The proposed method, CoTTA, is easy to implement and can be readily incorporated into off-the-shelf pre-trained models.
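A simplified sketch of the main CoTTA ingredients, a weight-averaged (EMA) teacher that supplies pseudo-labels on the test stream plus stochastic restoration of source weights to avoid drift; models and data are toy stand-ins.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

source = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
student = copy.deepcopy(source)
teacher = copy.deepcopy(source)
opt = torch.optim.SGD(student.parameters(), lr=1e-3)

for batch in range(100):
    x = torch.randn(8, 16)                          # unlabeled test-time batch
    with torch.no_grad():
        pseudo = F.softmax(teacher(x), dim=-1)      # teacher pseudo-labels
    loss = F.cross_entropy(student(x), pseudo)      # consistency to the teacher
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(0.999).add_(s_p, alpha=0.001)  # EMA teacher update
        for s_p, src_p in zip(student.parameters(), source.parameters()):
            mask = (torch.rand_like(s_p) < 0.01).float()
            s_p.mul_(1 - mask).add_(src_p * mask)   # stochastic restore to source
```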
arXiv Detail & Related papers (2022-03-25T11:42:02Z)
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose State-Conservative Policy Optimization (SCPO), a model-free actor-critic algorithm that learns robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns policies that are robust to disturbances in the transition dynamics.
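One common way to realize state-conservative updates, shown here as a hedged illustration rather than SCPO's exact update, is to evaluate the actor on an adversarially perturbed state found by one gradient step inside an epsilon ball; actor and critic are toy stand-ins.

```python
import torch
import torch.nn as nn

actor = nn.Sequential(nn.Linear(4, 32), nn.Tanh(), nn.Linear(32, 2), nn.Tanh())
critic = nn.Sequential(nn.Linear(6, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
eps = 0.05                                           # size of the state neighborhood

for step in range(100):
    s = torch.randn(64, 4)
    s_adv = s.clone().requires_grad_(True)
    q = critic(torch.cat([s_adv, actor(s_adv)], dim=-1)).sum()
    grad = torch.autograd.grad(q, s_adv)[0]
    s_adv = (s - eps * grad.sign()).detach()         # worst-case state in the ball
    loss = -critic(torch.cat([s_adv, actor(s_adv)], dim=-1)).mean()
    opt.zero_grad(); loss.backward(); opt.step()     # critic's own update omitted
```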
arXiv Detail & Related papers (2021-12-20T13:13:05Z)
- Deep Reinforcement Learning amidst Lifelong Non-Stationarity [67.24635298387624]
We show that an off-policy RL algorithm can reason about and tackle lifelong non-stationarity.
Our method leverages latent variable models to learn a representation of the environment from current and past experiences.
We also introduce several simulation environments that exhibit lifelong non-stationarity, and empirically find that our approach substantially outperforms approaches that do not reason about environment shift.
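Structurally, the approach can be sketched as follows: a recurrent encoder summarizes recent transitions into a latent context that must explain the observed dynamics, and the policy conditions on that latent. The next-state prediction objective below is a simplification of the paper's actual training signal, and all modules and data are toy stand-ins.

```python
import torch
import torch.nn as nn

ctx_encoder = nn.GRU(input_size=4 + 2 + 4, hidden_size=8, batch_first=True)
dynamics = nn.Sequential(nn.Linear(4 + 2 + 8, 64), nn.ReLU(), nn.Linear(64, 4))
policy = nn.Sequential(nn.Linear(4 + 8, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam([*ctx_encoder.parameters(), *dynamics.parameters()], lr=1e-3)

for step in range(100):
    # A window of recent (s, a, s') transitions from the current environment.
    window = torch.randn(1, 10, 4 + 2 + 4)
    _, h = ctx_encoder(window)
    z = h[-1]                                       # latent context, shape (1, 8)
    s, a, s_next = torch.randn(1, 4), torch.randn(1, 2), torch.randn(1, 4)
    pred = dynamics(torch.cat([s, a, z], dim=-1))
    loss = (pred - s_next).pow(2).mean()            # latent must explain the dynamics
    opt.zero_grad(); loss.backward(); opt.step()

# The policy acts on the state augmented with the inferred context.
action = policy(torch.cat([torch.randn(1, 4), z.detach()], dim=-1))
```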
arXiv Detail & Related papers (2020-06-18T17:34:50Z)
- Online Constrained Model-based Reinforcement Learning [13.362455603441552]
A key requirement is the ability to handle continuous state and action spaces while remaining within a limited time and resource budget.
We propose a model-based approach that combines Gaussian Process regression and Receding Horizon Control.
We test our approach on a cart pole swing-up environment and demonstrate the benefits of online learning on an autonomous racing task.
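The combination is compact enough to sketch end to end: fit a GP to one-step dynamics, then plan by random shooting over a short horizon and re-plan at every step. The 1-D system and quadratic cost are toy stand-ins, not the paper's benchmark tasks.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def true_dynamics(x, u):
    return 0.9 * x + 0.5 * u                         # unknown to the controller

# Fit the GP on observed (state, action) -> next-state transitions.
X = rng.uniform(-2, 2, (100, 2))
y = true_dynamics(X[:, 0], X[:, 1])
gp = GaussianProcessRegressor().fit(X, y)

def plan(x0, horizon=5, n_candidates=200):
    # Random shooting: sample action sequences, roll out the GP, pick the best.
    seqs = rng.uniform(-1, 1, (n_candidates, horizon))
    costs = np.zeros(n_candidates)
    x = np.full(n_candidates, x0)
    for t in range(horizon):
        x = gp.predict(np.column_stack([x, seqs[:, t]]))
        costs += x ** 2                              # drive the state to zero
    return seqs[np.argmin(costs), 0]                 # execute only the first action

x = 1.5
for step in range(10):                               # receding-horizon loop
    x = true_dynamics(x, plan(x))
print(x)
```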
arXiv Detail & Related papers (2020-04-07T15:51:34Z)
- Deep Adversarial Reinforcement Learning for Object Disentangling [36.66974848126079]
We present a novel adversarial reinforcement learning (ARL) framework for disentangling waste objects.
The ARL framework utilizes an adversary, which is trained to steer the original agent, the protagonist, to challenging states.
We show that our method can generalize from training to test scenarios by training an end-to-end system for robot control to solve a challenging object disentangling task.
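A schematic of the alternating adversary/protagonist scheme with a toy 1-D environment; crude random-search updates stand in for the actual RL training of both agents.

```python
import numpy as np

rng = np.random.default_rng(0)

class ToyEnv:
    def reset(self):
        self.x = 0.0
        return self.x
    def step(self, u):
        self.x += float(u)
        return self.x, -abs(self.x)                  # reward: stay near the origin

def rollout(adv_gain, pro_gain, env, k=5, horizon=20):
    x = env.reset()
    for t in range(k):                               # adversary phase: push outward
        x, _ = env.step(adv_gain * np.sign(x) if x else adv_gain)
    total = 0.0
    for t in range(k, horizon):                      # protagonist phase: recover
        x, r = env.step(-pro_gain * x)
        total += r
    return total

adv_gain, pro_gain = 0.1, 0.1
for it in range(50):                                 # crude alternating optimization
    cand = pro_gain + rng.normal(0, 0.05)
    if rollout(adv_gain, cand, ToyEnv()) > rollout(adv_gain, pro_gain, ToyEnv()):
        pro_gain = cand                              # protagonist improves vs adversary
    cand = adv_gain + rng.normal(0, 0.05)
    if rollout(cand, pro_gain, ToyEnv()) < rollout(adv_gain, pro_gain, ToyEnv()):
        adv_gain = cand                              # adversary makes starts harder
print(adv_gain, pro_gain)
```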
arXiv Detail & Related papers (2020-03-08T13:20:39Z)
- Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality [74.0084803220897]
Adversarial training is a popular method to give neural nets robustness against adversarial perturbations.
We show convergence to low robust training loss for polynomial (rather than exponential) network width, under natural assumptions and with the ReLU activation.
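For reference, the procedure the analysis concerns, adversarial training of a wide ReLU network with a projected-gradient inner maximization, looks like this in a toy setting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Sequential(nn.Linear(10, 512), nn.ReLU(), nn.Linear(512, 2))  # wide ReLU net
opt = torch.optim.SGD(net.parameters(), lr=0.1)
eps, pgd_steps = 0.1, 5

x = torch.randn(128, 10)
y = (x.sum(dim=1) > 0).long()

for epoch in range(20):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(pgd_steps):                           # inner max: find perturbation
        loss = F.cross_entropy(net(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        delta = (delta + 0.03 * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    loss = F.cross_entropy(net(x + delta.detach()), y)   # outer min: robust loss
    opt.zero_grad(); loss.backward(); opt.step()
```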
arXiv Detail & Related papers (2020-02-16T20:13:43Z)