Marginalized Importance Sampling for Off-Environment Policy Evaluation
- URL: http://arxiv.org/abs/2309.01807v2
- Date: Wed, 4 Oct 2023 20:17:46 GMT
- Title: Marginalized Importance Sampling for Off-Environment Policy Evaluation
- Authors: Pulkit Katdare, Nan Jiang and Katherine Driggs-Campbell
- Abstract summary: Reinforcement Learning (RL) methods are typically sample-inefficient, making it challenging to train and deploy RL policies on real-world robots.
This paper proposes a new approach for evaluating the real-world performance of agent policies prior to deployment.
Our approach incorporates a simulator along with real-world offline data to evaluate the performance of any policy.
- Score: 13.824507564510503
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Reinforcement Learning (RL) methods are typically sample-inefficient, making
it challenging to train and deploy RL policies on real-world robots. Even a
robust policy trained in simulation requires a real-world deployment to assess
its performance. This paper proposes a new approach to evaluate the
real-world performance of agent policies prior to deploying them in the real
world. Our approach incorporates a simulator along with real-world offline data
to evaluate the performance of any policy using the framework of Marginalized
Importance Sampling (MIS). Existing MIS methods face two challenges: (1) large
density ratios that deviate from a reasonable range and (2) indirect
supervision, where the ratio needs to be inferred indirectly, thus exacerbating
estimation error. Our approach addresses these challenges by introducing the
target policy's occupancy in the simulator as an intermediate variable and
learning the density ratio as the product of two terms that can be learned
separately. The first term is learned with direct supervision and the second
term has a small magnitude, thus making it computationally efficient. We
analyze the sample complexity as well as the error propagation of our two-step
procedure. Furthermore, we empirically evaluate our approach on Sim2Sim
environments such as Cartpole, Reacher, and Half-Cheetah. Our results show that
our method generalizes well across a variety of Sim2Sim gaps, target policies,
and offline data-collection policies. We also demonstrate the performance of
our algorithm on a Sim2Real task of validating the performance of a 7 DoF
robotic arm using offline data along with the Gazebo simulator.
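To make the two-step estimator concrete, here is a sketch in our own notation (a reading of the abstract; the symbols and equations below are not taken from the paper itself): let d^pi_real and d^pi_sim denote the target policy's state-action occupancy in the real environment and in the simulator, and let d^D denote the distribution of the real-world offline data D. Marginalized Importance Sampling evaluates the policy by reweighting the offline rewards,

\[
\hat{J}(\pi) \;\propto\; \frac{1}{|D|} \sum_{(s,a,r)\in D} w(s,a)\, r,
\qquad
w(s,a) \;=\; \frac{d^{\pi}_{\mathrm{real}}(s,a)}{d^{D}(s,a)},
\]

and the factorization through the simulator occupancy described in the abstract plausibly corresponds to

\[
w(s,a) \;=\; \frac{d^{\pi}_{\mathrm{sim}}(s,a)}{d^{D}(s,a)} \;\times\; \frac{d^{\pi}_{\mathrm{real}}(s,a)}{d^{\pi}_{\mathrm{sim}}(s,a)},
\]

where the first factor can be fit by ordinary density-ratio estimation between simulator rollouts of the target policy and the offline data (direct supervision), while the second factor only needs to account for the simulator-to-real mismatch, which is expected to be small when the simulator is accurate.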
Related papers
- Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations.
We define the task as identifying a Nash equilibrium from a preference-only offline dataset in general-sum games.
Our findings underscore the multifaceted approach required for MARLHF, paving the way for effective preference-based multi-agent systems.
arXiv Detail & Related papers (2024-09-01T13:14:41Z)
- Evaluating Real-World Robot Manipulation Policies in Simulation [91.55267186958892]
Control and visual disparities between real and simulated environments are key challenges for reliable simulated evaluation.
We propose approaches for mitigating these gaps without needing to craft full-fidelity digital twins of real-world environments.
We create SIMPLER, a collection of simulated environments for manipulation policy evaluation on common real robot setups.
arXiv Detail & Related papers (2024-05-09T17:30:16Z)
- Robust Visual Sim-to-Real Transfer for Robotic Manipulation [79.66851068682779]
Learning visuomotor policies in simulation is much safer and cheaper than in the real world.
However, due to discrepancies between the simulated and real data, simulator-trained policies often fail when transferred to real robots.
One common approach to bridging the visual sim-to-real domain gap is domain randomization (DR).
arXiv Detail & Related papers (2023-07-28T05:47:24Z)
- Bridging the Reality Gap of Reinforcement Learning based Traffic Signal Control using Domain Randomization and Meta Learning [0.7614628596146599]
We present a comprehensive analysis of potential simulation parameters that contribute to this reality gap.
We then examine two promising strategies that can bridge this gap: Domain Randomization (DR) and Model-Agnostic Meta-Learning (MAML).
Our experimental results show that both DR and MAML outperform a state-of-the-art RL algorithm.
arXiv Detail & Related papers (2023-07-21T05:17:21Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates the estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
- One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data.
Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation.
We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z)
- Uncertainty Aware System Identification with Universal Policies [45.44896435487879]
Sim2real transfer is concerned with transferring policies trained in simulation to potentially noisy real-world environments.
We propose Uncertainty-aware policy search (UncAPS), where we use a Universal Policy Network (UPN) to store simulation-trained task-specific policies.
We then employ robust Bayesian optimisation to craft robust policies for the given environment by combining relevant UPN policies in a DR-like fashion.
arXiv Detail & Related papers (2022-02-11T18:27:23Z)
- Off Environment Evaluation Using Convex Risk Minimization [0.0]
We propose a convex risk minimization algorithm to estimate the model mismatch between the simulator and the target domain.
We show that this estimator can be used along with the simulator to evaluate the performance of an RL agent in the target domain.
arXiv Detail & Related papers (2021-12-21T21:31:54Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Sim-to-Real Transfer with Incremental Environment Complexity for Reinforcement Learning of Depth-Based Robot Navigation [1.290382979353427]
A Soft Actor-Critic (SAC) training strategy using incremental environment complexity is proposed to drastically reduce the need for additional training in the real world.
The application addressed is depth-based mapless navigation, where a mobile robot should reach a given waypoint in a cluttered environment with no prior mapping information.
arXiv Detail & Related papers (2020-04-30T10:47:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.