Off Environment Evaluation Using Convex Risk Minimization
- URL: http://arxiv.org/abs/2112.11532v1
- Date: Tue, 21 Dec 2021 21:31:54 GMT
- Title: Off Environment Evaluation Using Convex Risk Minimization
- Authors: Pulkit Katdare, Shuijing Liu and Katherine Driggs-Campbell
- Abstract summary: We propose a convex risk minimization algorithm to estimate the model mismatch between the simulator and the target domain.
We show that this estimator can be used along with the simulator to evaluate the performance of an RL agent in the target domain.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Applying reinforcement learning (RL) methods on robots typically involves
training a policy in simulation and deploying it on a robot in the real world.
Because of the model mismatch between the real world and the simulator, RL
agents deployed in this manner tend to perform suboptimally. To tackle this
problem, researchers have developed robust policy learning algorithms that rely
on synthetic noise disturbances. However, such methods do not guarantee
performance in the target environment. We propose a convex risk minimization
algorithm to estimate the model mismatch between the simulator and the target
domain using trajectory data from both environments. We show that this
estimator can be used along with the simulator to evaluate the performance of an
RL agent in the target domain, effectively bridging the gap between these two
environments. We also show that the convergence rate of our estimator is of the
order of $n^{-1/4}$, where $n$ is the number of training samples. In simulation,
we demonstrate how our method effectively approximates and evaluates the
performance of a range of policies on the Gridworld, Cartpole, and Reacher
environments. We also show that our method is able to estimate the performance
of a 7-DOF robotic arm using the simulator and remotely collected data from the
robot in the real world.
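As a rough illustration of the idea, the sketch below estimates the simulator-to-real model mismatch from transition data collected in both environments with a convex risk minimizer, then reweights simulator rollouts to approximate a policy's return in the target environment. It is a minimal sketch, not the authors' implementation: it substitutes a logistic-regression classifier (whose empirical risk is convex) for the paper's exact objective, and the feature choices, clipping threshold, and hyperparameters are illustrative assumptions.

```python
# Hedged sketch: off-environment policy evaluation via a convex density-ratio estimator.
# Not the paper's exact objective; a logistic classifier stands in as the convex risk
# minimizer that estimates the mismatch weight w(s, a, s') ~ p_real / p_sim.
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_mismatch_ratio(sim_transitions, real_transitions):
    """Fit a convex (logistic) classifier separating simulator from real
    transitions; its odds estimate the density ratio p_real / p_sim."""
    X = np.vstack([sim_transitions, real_transitions])
    y = np.concatenate([np.zeros(len(sim_transitions)),    # 0 = simulator
                        np.ones(len(real_transitions))])   # 1 = real robot
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    prior = len(real_transitions) / len(sim_transitions)   # class-imbalance correction

    def ratio(transitions):
        p = clf.predict_proba(np.atleast_2d(transitions))[:, 1]
        return (p / (1.0 - p)) / prior

    return ratio


def off_environment_value(sim_rollouts, ratio, gamma=0.99, w_max=50.0):
    """Per-decision importance-weighted estimate of the policy's return in the
    target environment, computed entirely from simulator rollouts.

    sim_rollouts: list of (transitions, rewards), where `transitions` is an
    array of (s, a, s') feature vectors and `rewards` the per-step rewards.
    """
    returns = []
    for transitions, rewards in sim_rollouts:
        w = np.clip(np.cumprod(ratio(transitions)), 0.0, w_max)  # truncate for variance
        discounts = gamma ** np.arange(len(rewards))
        returns.append(np.sum(w * discounts * np.asarray(rewards)))
    return float(np.mean(returns))
```

The logistic loss is only one convenient convex surrogate; the paper's estimator and its $n^{-1/4}$ convergence rate are derived for its own convex objective, which this sketch does not reproduce.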
Related papers
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as overall more intuitive human-like handling.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
- Evaluating Real-World Robot Manipulation Policies in Simulation [91.55267186958892]
Control and visual disparities between real and simulated environments are key challenges for reliable simulated evaluation.
We propose approaches for mitigating these gaps without needing to craft full-fidelity digital twins of real-world environments.
We create SIMPLER, a collection of simulated environments for manipulation policy evaluation on common real robot setups.
arXiv Detail & Related papers (2024-05-09T17:30:16Z)
- Marginalized Importance Sampling for Off-Environment Policy Evaluation [13.824507564510503]
Reinforcement Learning (RL) methods are typically sample-inefficient, making it challenging to train and deploy RL policies on real-world robots.
This paper proposes a new approach to evaluate the real-world performance of agent policies prior to deploying them in the real world.
Our approach incorporates a simulator along with real-world offline data to evaluate the performance of any policy.
arXiv Detail & Related papers (2023-09-04T20:52:04Z)
- Robust Visual Sim-to-Real Transfer for Robotic Manipulation [79.66851068682779]
Learning visuomotor policies in simulation is much safer and cheaper than in the real world.
However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots.
One common approach to bridging the visual sim-to-real domain gap is domain randomization (DR).
arXiv Detail & Related papers (2023-07-28T05:47:24Z)
- Obstacle Avoidance for Robotic Manipulator in Joint Space via Improved Proximal Policy Optimization [6.067589886362815]
In this paper, we train a deep neural network via an improved Proximal Policy Optimization (PPO) algorithm to map from task space to joint space for a 6-DoF manipulator.
Since training such a task in real-robot is time-consuming and strenuous, we develop a simulation environment to train the model.
Experimental results showed that using our method, the robot was capable of tracking a single target or reaching multiple targets in unstructured environments.
arXiv Detail & Related papers (2022-10-03T10:21:57Z)
- Nonprehensile Riemannian Motion Predictive Control [57.295751294224765]
We introduce a novel Real-to-Sim reward analysis technique to reliably imagine and predict the outcome of taking possible actions for a real robotic platform.
We produce a closed-loop controller to reactively push objects in a continuous action space.
We observe that RMPC is robust in cluttered as well as occluded environments and outperforms the baselines.
arXiv Detail & Related papers (2021-11-15T18:50:04Z)
- Accelerated Sim-to-Real Deep Reinforcement Learning: Learning Collision Avoidance from Human Player [5.960346570280513]
This paper presents a sensor-level mapless collision avoidance algorithm for use in mobile robots.
An efficient training strategy is proposed to allow a robot to learn from both human experience data and self-exploratory data.
A game format simulation framework is designed to allow the human player to tele-operate the mobile robot to a goal.
arXiv Detail & Related papers (2021-02-21T23:27:34Z)
- A User's Guide to Calibrating Robotics Simulators [54.85241102329546]
This paper proposes a set of benchmarks and a framework for the study of various algorithms aimed to transfer models and policies learnt in simulation to the real world.
We conduct experiments on a wide range of well known simulated environments to characterize and offer insights into the performance of different algorithms.
Our analysis can be useful for practitioners working in this area and can help make informed choices about the behavior and main properties of sim-to-real algorithms.
arXiv Detail & Related papers (2020-11-17T22:24:26Z)
- Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning [75.56839075060819]
Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state.
Reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle.
In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline.
arXiv Detail & Related papers (2020-05-21T19:47:05Z)
- Sim-to-Real Transfer with Incremental Environment Complexity for Reinforcement Learning of Depth-Based Robot Navigation [1.290382979353427]
A Soft Actor-Critic (SAC) training strategy using incremental environment complexity is proposed to drastically reduce the need for additional training in the real world.
The application addressed is depth-based mapless navigation, where a mobile robot should reach a given waypoint in a cluttered environment with no prior mapping information.
arXiv Detail & Related papers (2020-04-30T10:47:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.