Validate on Sim, Detect on Real -- Model Selection for Domain
Randomization
- URL: http://arxiv.org/abs/2111.00765v1
- Date: Mon, 1 Nov 2021 08:34:59 GMT
- Title: Validate on Sim, Detect on Real -- Model Selection for Domain
Randomization
- Authors: Gal Leibovich, Guy Jacob, Shadi Endrawis, Gal Novik, Aviv Tamar
- Abstract summary: A practical approach to learning robot skills, often termed sim2real, is to train control policies in simulation and then deploy them on a real robot.
We show that our score - VSDR - can significantly improve the accuracy of policy ranking without requiring additional real world data.
- Score: 17.461103383630853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A practical approach to learning robot skills, often termed sim2real, is to
train control policies in simulation and then deploy them on a real robot.
Popular techniques to improve the sim2real transfer build on domain
randomization (DR): Training the policy on a diverse set of randomly generated
domains with the hope of better generalization to the real world. Due to the
large number of hyper-parameters in both the policy learning and DR algorithms,
one often ends up with a large number of trained models, where choosing the
best model among them demands costly evaluation on the real robot. In this work
we ask: Can we rank the policies without running them in the real world? Our
main idea is that a predefined set of real world data can be used to evaluate
all policies, using out-of-distribution detection (OOD) techniques. In a sense,
this approach can be seen as a "unit test" to evaluate policies before any real
world execution. However, we find that by itself, the OOD score can be
inaccurate and very sensitive to the particular OOD method. Our main
contribution is a simple-yet-effective policy score that combines OOD with an
evaluation in simulation. We show that our score - VSDR - can significantly
improve the accuracy of policy ranking without requiring additional real world
data. We evaluate the effectiveness of VSDR on sim2real transfer in a robotic
grasping task with image inputs. We extensively evaluate different DR
parameters and OOD methods, and show that VSDR improves policy selection across
the board. More importantly, our method achieves significantly better ranking,
and uses significantly less data compared to baselines.
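The abstract describes VSDR as a score combining an OOD measure on a fixed real-world dataset with an evaluation in simulation, without specifying the combination rule. The sketch below is a minimal, hypothetical illustration of that idea: each candidate policy gets a normalized simulation return and a normalized OOD score, and policies are ranked by the difference. The function name, the normalization, and the subtraction rule are all assumptions, not the paper's actual method.

```python
import numpy as np

def vsdr_rank(policies, sim_returns, ood_scores):
    """Rank candidate policies by a combined score (hypothetical scheme):
    high return in simulation and low out-of-distribution score on the
    fixed real-world dataset. `sim_returns` and `ood_scores` are dicts
    mapping policy name -> float; higher OOD score means the real data
    looks more out-of-distribution to that policy's model."""
    def normalize(d):
        # Min-max normalize to [0, 1] so the two signals are comparable.
        vals = np.array(list(d.values()), dtype=float)
        lo, hi = vals.min(), vals.max()
        span = hi - lo if hi > lo else 1.0
        return {k: (v - lo) / span for k, v in d.items()}

    r = normalize(sim_returns)   # higher is better
    o = normalize(ood_scores)    # higher is worse
    combined = {p: r[p] - o[p] for p in policies}
    return sorted(policies, key=lambda p: combined[p], reverse=True)
```

For example, a policy with the best simulation return but a very high OOD score on real data can be out-ranked by a slightly weaker policy whose real-world observations look in-distribution.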
Related papers
- Evaluating Real-World Robot Manipulation Policies in Simulation [91.55267186958892]
Control and visual disparities between real and simulated environments are key challenges for reliable simulated evaluation.
We propose approaches for mitigating these gaps without needing to craft full-fidelity digital twins of real-world environments.
We create SIMPLER, a collection of simulated environments for manipulation policy evaluation on common real robot setups.
arXiv Detail & Related papers (2024-05-09T17:30:16Z)
- Robust Visual Sim-to-Real Transfer for Robotic Manipulation [79.66851068682779]
Learning visuomotor policies in simulation is much safer and cheaper than in the real world.
However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots.
One common approach to bridging the visual sim-to-real domain gap is domain randomization (DR).
arXiv Detail & Related papers (2023-07-28T05:47:24Z)
- Verifying Learning-Based Robotic Navigation Systems [61.01217374879221]
We show how modern verification engines can be used for effective model selection.
Specifically, we use verification to detect and rule out policies that may demonstrate suboptimal behavior.
Our work is the first to demonstrate the use of verification backends for recognizing suboptimal DRL policies in real-world robots.
arXiv Detail & Related papers (2022-05-26T17:56:43Z)
- Off Environment Evaluation Using Convex Risk Minimization [0.0]
We propose a convex risk minimization algorithm to estimate the model mismatch between the simulator and the target domain.
We show that this estimator can be used along with the simulator to evaluate the performance of an RL agent in the target domain.
arXiv Detail & Related papers (2021-12-21T21:31:54Z)
- Distilled Domain Randomization [23.178141671320436]
We propose to combine reinforcement learning from randomized physics simulations with policy distillation.
Our algorithm, called Distilled Domain Randomization (DiDoR), distills so-called teacher policies, which are experts on domains.
This way, DiDoR learns controllers which transfer directly from simulation to reality, without requiring data from the target domain.
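The distillation step described above can be sketched as a student policy matching the action distributions of several domain-expert teachers. The KL-based loss and the logit representation below are assumptions for illustration; the actual DiDoR objective may differ.

```python
import numpy as np

def distill_loss(student_logits, teacher_logits_per_domain):
    """KL(teacher || student) averaged over randomized domains -- a
    minimal sketch of distilling domain-expert teachers into one
    student policy (loss form is an assumption, not DiDoR's exact one).
    Logits are over a discrete action set."""
    def softmax(x):
        e = np.exp(x - x.max())  # subtract max for numerical stability
        return e / e.sum()

    s = softmax(student_logits)
    total = 0.0
    for t_logits in teacher_logits_per_domain:
        t = softmax(t_logits)
        total += np.sum(t * (np.log(t) - np.log(s)))
    return total / len(teacher_logits_per_domain)
```

Minimizing this loss over states sampled from the randomized domains pushes the student toward a single controller that imitates every teacher, which is what allows direct transfer without target-domain data.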
arXiv Detail & Related papers (2021-12-06T16:35:08Z)
- Supervised Off-Policy Ranking [145.3039527243585]
Off-policy evaluation (OPE) leverages data generated by other policies to evaluate a target policy.
We propose supervised off-policy ranking that learns a policy scoring model by correctly ranking training policies with known performance.
Our method outperforms strong baseline OPE methods in terms of both rank correlation and performance gap between the truly best and the best of the ranked top three policies.
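The idea of training a scoring model to correctly order policies with known performance can be sketched with a pairwise ranking loss. The linear scorer, the feature representation, and the hinge margin below are hypothetical simplifications, not the paper's architecture.

```python
import numpy as np

def pairwise_hinge_grad(w, feats, perf, margin=1.0):
    """Gradient of a pairwise hinge ranking loss for a linear policy
    scorer (a simplified sketch of supervised off-policy ranking).
    feats: (n, d) feature matrix, one row per training policy.
    perf:  (n,) known true performance of each training policy.
    A pair (i, j) with perf[i] > perf[j] incurs loss unless the scorer
    separates them by at least `margin`."""
    scores = feats @ w
    grad = np.zeros_like(w)
    n = len(perf)
    for i in range(n):
        for j in range(n):
            if perf[i] > perf[j] and scores[i] - scores[j] < margin:
                # Push the better policy's score up, the worse one's down.
                grad -= feats[i] - feats[j]
    return grad
```

Taking steps `w -= lr * grad` trains the scorer so that, at deployment, unseen candidate policies can be ranked by `feats_new @ w` without running them.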
arXiv Detail & Related papers (2021-07-03T07:01:23Z)
- A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Robotic Arm Control and Task Training through Deep Reinforcement Learning [6.249276977046449]
We show that Trust Region Policy Optimization and Deep Q-Network with Normalized Advantage Functions perform better than Deep Deterministic Policy Gradient and Vanilla Policy Gradient.
Real-world experiments show that our policies, if correctly trained in simulation, can be transferred and executed in a real environment with almost no changes.
arXiv Detail & Related papers (2020-05-06T07:34:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.