Policy Transfer via Kinematic Domain Randomization and Adaptation
- URL: http://arxiv.org/abs/2011.01891v3
- Date: Thu, 1 Apr 2021 19:47:39 GMT
- Title: Policy Transfer via Kinematic Domain Randomization and Adaptation
- Authors: Ioannis Exarchos, Yifeng Jiang, Wenhao Yu, C. Karen Liu
- Abstract summary: We investigate the impact of randomized parameter selection on policy transferability across different types of domain discrepancies.
We introduce a new domain adaptation algorithm that utilizes variation of simulated kinematic parameters.
We showcase our findings on a simulated quadruped robot in five different target environments.
- Score: 22.038635244802798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transferring reinforcement learning policies trained in physics simulation to
real hardware remains a challenge, known as the "sim-to-real" gap. Domain
randomization is a simple yet effective technique to address dynamics
discrepancies across source and target domains, but its success generally
depends on heuristics and trial-and-error. In this work we investigate the
impact of randomized parameter selection on policy transferability across
different types of domain discrepancies. Contrary to common practice in which
kinematic parameters are carefully measured while dynamic parameters are
randomized, we found that virtually randomizing kinematic parameters (e.g.,
link lengths) during training in simulation generally outperforms dynamic
randomization. Based on this finding, we introduce a new domain adaptation
algorithm that utilizes variation of simulated kinematic parameters. Our
algorithm, Multi-Policy Bayesian Optimization, trains an ensemble of universal
policies conditioned on virtual kinematic parameters and efficiently adapts to
the target environment using a limited number of target domain rollouts. We
showcase our findings on a simulated quadruped robot in five different target
environments covering different aspects of domain discrepancies.
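The two ingredients described above, per-episode kinematic randomization and a universal policy conditioned on the sampled kinematic parameters, can be illustrated with a minimal sketch. All names and numbers below (the dummy environment, the 8-dimensional observation, the 20% link-length spread) are illustrative assumptions rather than the paper's implementation; Multi-Policy Bayesian Optimization would then treat the kinematic input of such policies as a search variable and tune it with a small budget of target-domain rollouts.

```python
# Minimal sketch of kinematic domain randomization with a "universal" policy
# conditioned on the sampled (virtual) kinematic parameters.
# KinRandEnv, the dimensions, and the link-length spread are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def sample_link_lengths(nominal, spread=0.2):
    # Virtually randomize kinematics: scale each nominal link length by a
    # factor drawn uniformly from [1 - spread, 1 + spread].
    return nominal * rng.uniform(1.0 - spread, 1.0 + spread, size=nominal.shape)

class KinRandEnv:
    """Stand-in for a simulator whose link lengths can be reset per episode."""
    def __init__(self, nominal_links):
        self.links = nominal_links.copy()

    def reset(self, links):
        self.links = links
        return np.zeros(8)                    # dummy observation

    def step(self, action):
        obs = rng.normal(size=8)              # dummy dynamics
        reward = -float(np.sum(action ** 2))  # dummy reward
        return obs, reward

def universal_policy(theta, obs, kin_params):
    # The policy sees the virtual kinematic parameters alongside the state,
    # so one set of weights covers the whole randomized family of robots.
    return np.tanh(theta @ np.concatenate([obs, kin_params]))

nominal = np.array([0.25, 0.25, 0.3, 0.3])  # hypothetical quadruped link lengths
theta = rng.normal(scale=0.1, size=(4, 8 + nominal.size))
env = KinRandEnv(nominal)

for episode in range(3):
    kin = sample_link_lengths(nominal)      # fresh kinematics every episode
    obs = env.reset(kin)
    for t in range(10):
        action = universal_policy(theta, obs, kin)
        obs, reward = env.step(action)
```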
Related papers
- OMPO: A Unified Framework for RL under Policy and Dynamics Shifts [42.57662196581823]
Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge.
Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, or rely on specialized algorithms with task priors.
In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching.
arXiv Detail & Related papers (2024-05-29T13:36:36Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- BayRnTune: Adaptive Bayesian Domain Randomization via Strategic Fine-tuning [30.753772054098526]
Domain randomization (DR) entails training a policy with randomized dynamics.
BayRnTune aims to significantly accelerate learning by fine-tuning from a previously learned policy.
arXiv Detail & Related papers (2023-10-16T17:32:23Z)
- Robust Visual Sim-to-Real Transfer for Robotic Manipulation [79.66851068682779]
Learning visuomotor policies in simulation is much safer and cheaper than in the real world.
However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots.
One common approach to bridge the visual sim-to-real domain gap is domain randomization (DR).
arXiv Detail & Related papers (2023-07-28T05:47:24Z)
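Since the entry above centers on domain randomization for the visual gap, here is a minimal observation-level sketch of the idea; the jitter ranges and the numpy-only stand-in for a rendered camera frame are assumptions for illustration, not the paper's pipeline (which randomizes the simulator's rendering itself).

```python
# Minimal sketch of visual domain randomization: perturb appearance-level
# properties of each observation image so a policy cannot overfit to one
# visual configuration. All jitter ranges below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def randomize_image(img):
    """Apply random brightness, per-channel color tint, and sensor noise."""
    img = img.astype(np.float32)
    img *= rng.uniform(0.6, 1.4)                  # global brightness
    img *= rng.uniform(0.8, 1.2, size=(1, 1, 3))  # per-channel color tint
    img += rng.normal(scale=8.0, size=img.shape)  # sensor noise
    return np.clip(img, 0, 255).astype(np.uint8)

frame = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)  # dummy camera frame
augmented = randomize_image(frame)
```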
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Cross-Domain Policy Adaptation via Value-Guided Data Filtering [57.62692881606099]
Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning.
We present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets.
arXiv Detail & Related papers (2023-05-28T04:08:40Z)
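A heavily simplified sketch of the value-guided filtering idea just described: source-domain transitions are shared only when their value targets sit close to estimates derived from target-domain data. The linear value model and the fixed 25th-percentile rule are simplifying assumptions; VGDF proper uses ensembles of learned value functions.

```python
# Simplified sketch of value-guided data filtering across domains.
# value() stands in for a value function trained on target-domain data;
# the percentile acceptance rule is an illustrative simplification.
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.99

def value(w, s):
    return w @ s  # stand-in for a learned value network

w = rng.normal(size=4)
# Source-domain transitions: (state, reward, next_state)
source = [(rng.normal(size=4), rng.normal(), rng.normal(size=4)) for _ in range(256)]

# Gap between each source transition's TD target and the target-domain
# value estimate of its current state.
gaps = np.array([abs(r + gamma * value(w, s2) - value(w, s)) for s, r, s2 in source])

# Share only the transitions whose paired value targets are closest (top 25%).
keep = gaps <= np.percentile(gaps, 25)
shared = [t for t, k in zip(source, keep) if k]
print(f"sharing {len(shared)} of {len(source)} source transitions")
```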
- Nonprehensile Riemannian Motion Predictive Control [57.295751294224765]
We introduce a novel Real-to-Sim reward analysis technique to reliably imagine and predict the outcome of taking possible actions for a real robotic platform.
We produce a closed-loop controller to reactively push objects in a continuous action space.
We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
arXiv Detail & Related papers (2021-11-15T18:50:04Z)
- Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm.
arXiv Detail & Related papers (2021-05-25T19:48:35Z)
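The worst-case Bellman backup behind robust value iteration is easiest to see in tabular form: a max over actions wrapped around a min over an uncertainty set of dynamics models. The toy three-state MDP and two-model uncertainty set below are illustrative assumptions; the paper computes the value function on a compact continuous state domain.

```python
# Minimal tabular sketch of robust value iteration: back up values with a
# max over actions and a min over candidate transition models, so the
# resulting policy hedges against dynamics variation. The MDP is a toy.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

def random_model():
    P = rng.random((n_states, n_actions, n_states))
    return P / P.sum(axis=-1, keepdims=True)  # normalize to valid transitions

models = [random_model() for _ in range(2)]   # uncertainty set of dynamics
R = rng.random((n_states, n_actions))

V = np.zeros(n_states)
for _ in range(200):
    # Q[m, s, a]: action values under each candidate model m.
    Q = np.stack([R + gamma * P @ V for P in models])
    V = Q.min(axis=0).max(axis=-1)            # worst-case model, best action
print("robust values:", V)
```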
- Data-efficient Domain Randomization with Bayesian Optimization [34.854609756970305]
When learning policies for robot control, the required real-world data is typically prohibitively expensive to acquire.
BayRn is a black-box sim-to-real algorithm that solves tasks efficiently by adapting the domain parameter distribution.
Our results show that BayRn is able to perform sim-to-real transfer, while significantly reducing the required prior knowledge.
arXiv Detail & Related papers (2020-03-05T07:48:31Z)
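The loop described above alternates between training in simulation under candidate domain parameters and scoring the result with a few expensive target-domain rollouts. In this sketch the Bayesian-optimization proposal step is replaced by plain random search for brevity, and every environment hook is a stand-in; BayRn proper fits a Gaussian process to the (parameter, return) pairs to pick the next candidate.

```python
# Minimal sketch of a BayRn-style loop: propose domain parameters, train in
# simulation under them, evaluate a handful of target-domain rollouts, and
# keep the best candidate. Random search stands in for the GP-based proposal.
import numpy as np

rng = np.random.default_rng(0)

def train_in_sim(dr_params):
    # Stand-in for policy training under the given randomization parameters.
    return dr_params  # pretend the "policy" is summarized by its parameters

def target_return(policy):
    # Stand-in for a few expensive target-domain rollouts; this toy
    # objective peaks at mass ~ 1.2 and friction ~ 0.7.
    mass, friction = policy
    return -((mass - 1.2) ** 2 + (friction - 0.7) ** 2) + rng.normal(scale=0.01)

history = []
for it in range(20):
    candidate = rng.uniform([0.5, 0.2], [2.0, 1.0])  # (mass, friction) bounds
    policy = train_in_sim(candidate)
    history.append((candidate, target_return(policy)))

best_params, best_ret = max(history, key=lambda h: h[1])
print("best domain parameters:", best_params, "return:", best_ret)
```

The same loop structure also underlies BayRnTune above, which additionally warm-starts each training round from the previously learned policy.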