Policy Transfer via Kinematic Domain Randomization and Adaptation
- URL: http://arxiv.org/abs/2011.01891v3
- Date: Thu, 1 Apr 2021 19:47:39 GMT
- Title: Policy Transfer via Kinematic Domain Randomization and Adaptation
- Authors: Ioannis Exarchos, Yifeng Jiang, Wenhao Yu, C. Karen Liu
- Abstract summary: We investigate the impact of randomized parameter selection on policy transferability across different types of domain discrepancies.
We introduce a new domain adaptation algorithm that utilizes variation of simulated kinematic parameters.
We showcase our findings on a simulated quadruped robot in five different target environments.
- Score: 22.038635244802798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transferring reinforcement learning policies trained in physics simulation to
real hardware remains a challenge, known as the "sim-to-real" gap. Domain
randomization is a simple yet effective technique to address dynamics
discrepancies across source and target domains, but its success generally
depends on heuristics and trial-and-error. In this work we investigate the
impact of randomized parameter selection on policy transferability across
different types of domain discrepancies. Contrary to common practice in which
kinematic parameters are carefully measured while dynamic parameters are
randomized, we found that virtually randomizing kinematic parameters (e.g.,
link lengths) during training in simulation generally outperforms dynamic
randomization. Based on this finding, we introduce a new domain adaptation
algorithm that utilizes variation of simulated kinematic parameters. Our
algorithm, Multi-Policy Bayesian Optimization, trains an ensemble of universal
policies conditioned on virtual kinematic parameters and efficiently adapts to
the target environment using a limited number of target domain rollouts. We
showcase our findings on a simulated quadruped robot in five different target
environments covering different aspects of domain discrepancies.
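The two ingredients described above, per-episode kinematic randomization and a universal policy conditioned on the sampled kinematic parameters, can be illustrated with a minimal sketch. All names and numbers below (the dummy environment, the 8-dimensional observation, the 20% link-length spread) are illustrative assumptions rather than the paper's implementation; Multi-Policy Bayesian Optimization would then treat the kinematic input of such policies as a search variable and tune it with a small budget of target-domain rollouts.

```python
# Minimal sketch of kinematic domain randomization with a "universal" policy
# conditioned on the sampled (virtual) kinematic parameters.
# KinRandEnv, the dimensions, and the link-length spread are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def sample_link_lengths(nominal, spread=0.2):
    # Virtually randomize kinematics: scale each nominal link length by a
    # factor drawn uniformly from [1 - spread, 1 + spread].
    return nominal * rng.uniform(1.0 - spread, 1.0 + spread, size=nominal.shape)

class KinRandEnv:
    """Stand-in for a simulator whose link lengths can be reset per episode."""
    def __init__(self, nominal_links):
        self.links = nominal_links.copy()

    def reset(self, links):
        self.links = links
        return np.zeros(8)                    # dummy observation

    def step(self, action):
        obs = rng.normal(size=8)              # dummy dynamics
        reward = -float(np.sum(action ** 2))  # dummy reward
        return obs, reward

def universal_policy(theta, obs, kin_params):
    # The policy sees the virtual kinematic parameters alongside the state,
    # so one set of weights covers the whole randomized family of robots.
    return np.tanh(theta @ np.concatenate([obs, kin_params]))

nominal = np.array([0.25, 0.25, 0.3, 0.3])  # hypothetical quadruped link lengths
theta = rng.normal(scale=0.1, size=(4, 8 + nominal.size))
env = KinRandEnv(nominal)

for episode in range(3):
    kin = sample_link_lengths(nominal)      # fresh kinematics every episode
    obs = env.reset(kin)
    for t in range(10):
        action = universal_policy(theta, obs, kin)
        obs, reward = env.step(action)
```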
Related papers
- OMPO: A Unified Framework for RL under Policy and Dynamics Shifts [42.57662196581823]
Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge.
Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, or rely on specialized algorithms with task priors.
In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching.
arXiv Detail & Related papers (2024-05-29T13:36:36Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- BayRnTune: Adaptive Bayesian Domain Randomization via Strategic Fine-tuning [30.753772054098526]
Domain randomization (DR) entails training a policy with randomized dynamics.
BayRnTune aims to significantly accelerate learning by fine-tuning from a previously learned policy.
arXiv Detail & Related papers (2023-10-16T17:32:23Z)
- Robust Visual Sim-to-Real Transfer for Robotic Manipulation [79.66851068682779]
Learning visuomotor policies in simulation is much safer and cheaper than in the real world.
However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots.
One common approach to bridge the visual sim-to-real domain gap is domain randomization (DR).
arXiv Detail & Related papers (2023-07-28T05:47:24Z)
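Since the entry above centers on domain randomization for the visual gap, here is a minimal observation-level sketch of the idea; the jitter ranges and the numpy-only stand-in for a rendered camera frame are assumptions for illustration, not the paper's pipeline (which randomizes the simulator's rendering itself).

```python
# Minimal sketch of visual domain randomization: perturb appearance-level
# properties of each observation image so a policy cannot overfit to one
# visual configuration. All jitter ranges below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def randomize_image(img):
    """Apply random brightness, per-channel color tint, and sensor noise."""
    img = img.astype(np.float32)
    img *= rng.uniform(0.6, 1.4)                  # global brightness
    img *= rng.uniform(0.8, 1.2, size=(1, 1, 3))  # per-channel color tint
    img += rng.normal(scale=8.0, size=img.shape)  # sensor noise
    return np.clip(img, 0, 255).astype(np.uint8)

frame = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # dummy camera frame
augmented = randomize_image(frame)
```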
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Cross-Domain Policy Adaptation via Value-Guided Data Filtering [57.62692881606099]
Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning.
We present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets.
arXiv Detail & Related papers (2023-05-28T04:08:40Z)
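A heavily simplified sketch of the value-guided filtering idea just described: source-domain transitions are shared only when their value targets sit close to estimates derived from target-domain data. The linear value model and the fixed 25th-percentile rule are simplifying assumptions; VGDF proper uses ensembles of learned value functions.

```python
# Simplified sketch of value-guided data filtering across domains.
# value() stands in for a value function trained on target-domain data;
# the percentile acceptance rule is an illustrative simplification.
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.99

def value(w, s):
    return w @ s  # stand-in for a learned value network

w = rng.normal(size=4)
# Source-domain transitions: (state, reward, next_state)
source = [(rng.normal(size=4), rng.normal(), rng.normal(size=4)) for _ in range(256)]

# Gap between each source transition's TD target and the target-domain
# value estimate of its current state.
gaps = np.array([abs(r + gamma * value(w, s2) - value(w, s)) for s, r, s2 in source])

# Share only the transitions whose paired value targets are closest (top 25%).
keep = gaps <= np.percentile(gaps, 25)
shared = [t for t, k in zip(source, keep) if k]
print(f"sharing {len(shared)} of {len(source)} source transitions")
```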
- Nonprehensile Riemannian Motion Predictive Control [57.295751294224765]
We introduce a novel Real-to-Sim reward analysis technique to reliably imagine and predict the outcome of taking possible actions for a real robotic platform.
We produce a closed-loop controller to reactively push objects in a continuous action space.
We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
arXiv Detail & Related papers (2021-11-15T18:50:04Z)
- Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
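The worst-case Bellman backup behind robust value iteration is easiest to see in tabular form: a max over actions wrapped around a min over an uncertainty set of dynamics models. The toy three-state MDP and two-model uncertainty set below are illustrative assumptions; the paper computes the value function on a compact continuous state domain.

```python
# Minimal tabular sketch of robust value iteration: back up values with a
# max over actions and a min over candidate transition models, so the
# resulting policy hedges against dynamics variation. The MDP is a toy.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

def random_model():
    P = rng.random((n_states, n_actions, n_states))
    return P / P.sum(axis=-1, keepdims=True)  # normalize to valid transitions

models = [random_model() for _ in range(2)]   # uncertainty set of dynamics
R = rng.random((n_states, n_actions))

V = np.zeros(n_states)
for _ in range(200):
    # Q[m, s, a]: action values under each candidate model m.
    Q = np.stack([R + gamma * P @ V for P in models])
    V = Q.min(axis=0).max(axis=-1)            # worst-case model, best action
print("robust values:", V)
```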
- Data-efficient Domain Randomization with Bayesian Optimization [34.854609756970305]
When learning policies for robot control, the required real-world data is typically prohibitively expensive to acquire.
BayRn is a black-box sim-to-real algorithm that solves tasks efficiently by adapting the domain parameter distribution.
Our results show that BayRn is able to perform sim-to-real transfer, while significantly reducing the required prior knowledge.
arXiv Detail & Related papers (2020-03-05T07:48:31Z)
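The loop described above alternates between training in simulation under candidate domain parameters and scoring the result with a few expensive target-domain rollouts. In this sketch the Bayesian-optimization proposal step is replaced by plain random search for brevity, and every environment hook is a stand-in; BayRn proper fits a Gaussian process to the (parameter, return) pairs to pick the next candidate.

```python
# Minimal sketch of a BayRn-style loop: propose domain parameters, train in
# simulation under them, evaluate a handful of target-domain rollouts, and
# keep the best candidate. Random search stands in for the GP-based proposal.
import numpy as np

rng = np.random.default_rng(0)

def train_in_sim(dr_params):
    # Stand-in for policy training under the given randomization parameters.
    return dr_params  # pretend the "policy" is summarized by its parameters

def target_return(policy):
    # Stand-in for a few expensive target-domain rollouts; this toy
    # objective peaks at mass ~ 1.2 and friction ~ 0.7.
    mass, friction = policy
    return -((mass - 1.2) ** 2 + (friction - 0.7) ** 2) + rng.normal(scale=0.01)

history = []
for it in range(20):
    candidate = rng.uniform([0.5, 0.2], [2.0, 1.0])  # (mass, friction) bounds
    policy = train_in_sim(candidate)
    history.append((candidate, target_return(policy)))

best_params, best_ret = max(history, key=lambda h: h[1])
print("best domain parameters:", best_params, "return:", best_ret)
```

The same loop structure also underlies BayRnTune above, which additionally warm-starts each training round from the previously learned policy.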