Uncertainty Aware System Identification with Universal Policies
- URL: http://arxiv.org/abs/2202.05844v1
- Date: Fri, 11 Feb 2022 18:27:23 GMT
- Title: Uncertainty Aware System Identification with Universal Policies
- Authors: Buddhika Laknath Semage, Thommen George Karimpanal, Santu Rana and Svetha Venkatesh
- Abstract summary: Sim2real transfer is concerned with transferring policies trained in simulation to potentially noisy real world environments.
We propose Uncertainty-aware policy search (UncAPS), where we use a Universal Policy Network (UPN) to store simulation-trained task-specific policies.
We then employ robust Bayesian optimisation to craft robust policies for the given environment by combining relevant UPN policies in a DR-like fashion.
- Score: 45.44896435487879
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sim2real transfer is primarily concerned with transferring policies trained
in simulation to potentially noisy real-world environments. A common problem
associated with sim2real transfer is estimating the real-world environmental
parameters on which to ground the simulated environment. Although existing methods
such as Domain Randomisation (DR) can produce robust policies by sampling from
a distribution of parameters during training, there is no established method
for identifying the parameters of the corresponding distribution for a given
real-world setting. In this work, we propose Uncertainty-aware policy search
(UncAPS), where we use a Universal Policy Network (UPN) to store
simulation-trained task-specific policies across the full range of
environmental parameters, and then employ robust Bayesian optimisation to
craft robust policies for the given environment by combining relevant UPN
policies in a DR-like fashion. Such policy-driven grounding is expected to be
more efficient as it estimates only the task-relevant set of parameters.
Further, we account for the estimation uncertainties in the
search process to produce policies that are robust against both aleatoric and
epistemic uncertainties. We empirically evaluate our approach in a range of
noisy, continuous control environments, and show its improved performance
compared to competing baselines.
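
To make the search loop concrete, here is a minimal, self-contained sketch of an UncAPS-style pipeline under stated assumptions: a stand-in UPN that maps an environment parameter to its simulation-trained policy, a noisy real-environment rollout, and a small Gaussian-process Bayesian optimisation whose final recommendation penalises remaining uncertainty, in the spirit of the robust search described above. All names (upn_policy, real_env_return) and the one-dimensional setup are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: 1-D environment parameter, scalar "policies".
import numpy as np

rng = np.random.default_rng(0)

def upn_policy(env_param):
    """Stand-in for a UPN lookup: return the simulation-trained policy
    stored for the given environment parameter (here, the parameter
    itself acts as the policy)."""
    return env_param

def real_env_return(policy):
    """Stand-in for a rollout on the real environment: the unknown true
    parameter is 0.6 and returns are noisy (aleatoric uncertainty)."""
    return -(policy - 0.6) ** 2 + 0.05 * rng.standard_normal()

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of parameters."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length_scale) ** 2)

def gp_posterior(X, y, Xs, noise=0.05):
    """GP posterior mean and standard deviation at candidate points Xs."""
    K = rbf(X, X) + noise**2 * np.eye(len(X))
    Ks, Kss = rbf(X, Xs), rbf(Xs, Xs)
    K_inv = np.linalg.inv(K)
    mu = Ks.T @ K_inv @ y
    var = np.diag(Kss - Ks.T @ K_inv @ Ks)
    return mu, np.sqrt(np.maximum(var, 1e-12))

candidates = np.linspace(0.0, 1.0, 201)
X = list(rng.uniform(0.0, 1.0, 3))            # initial random probes
y = [real_env_return(upn_policy(x)) for x in X]

for _ in range(15):
    mu, sd = gp_posterior(np.array(X), np.array(y), candidates)
    x_next = candidates[np.argmax(mu + sd)]   # UCB: explore uncertain regions
    X.append(x_next)
    y.append(real_env_return(upn_policy(x_next)))

# Robust recommendation: penalise remaining (epistemic) uncertainty so the
# selected policy is good even under estimation error.
mu, sd = gp_posterior(np.array(X), np.array(y), candidates)
best_param = candidates[np.argmax(mu - sd)]
print(f"estimated environment parameter: {best_param:.3f}")  # close to 0.6
```

With these stand-ins the printed estimate lands near the true parameter 0.6; in the paper's setting the same loop would select and combine UPN policies over a multi-dimensional parameter space rather than a scalar.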
Related papers
- Certifiably Robust Policies for Uncertain Parametric Environments [57.2416302384766] (2024-08-06)
We propose a framework based on parametric Markov decision processes (MDPs) with unknown distributions over parameters.
We learn and analyse interval MDPs (IMDPs) for a set of unknown sample environments induced by parameters.
We show that our approach produces tight bounds on a policy's performance with high confidence.
- Evaluating Real-World Robot Manipulation Policies in Simulation [91.55267186958892] (2024-05-09)
Control and visual disparities between real and simulated environments are key challenges for reliable simulated evaluation.
We propose approaches for mitigating these gaps without needing to craft full-fidelity digital twins of real-world environments.
We create SIMPLER, a collection of simulated environments for manipulation policy evaluation on common real robot setups.
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765] (2023-07-20)
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
- $K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic Control [0.6906005491572401] (2023-06-07)
We propose a novel $K$-nearest neighbor reparametric procedure for estimating the performance of a policy from historical data.
Our analysis allows for the sampling of entire episodes, as is common practice in most applications.
Compared to other OPE methods, our algorithm does not require optimization, can be efficiently implemented via tree-based nearest neighbor search and parallelization, and does not explicitly assume a parametric model for the environment's dynamics. A toy sketch of this idea appears after this list.
- Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification [22.241676350331968] (2022-11-07)
This study focuses on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values.
The aim is to optimize the worst-case performance over the uncertainty parameter set so as to guarantee performance in the corresponding real-world environment (a toy max-min sketch follows after this list).
Experiments in multi-joint dynamics with contact (MuJoCo) environments show that the proposed method exhibited a worst-case performance superior to several baseline approaches.
- Fast Model-based Policy Search for Universal Policy Networks [45.44896435487879] (2022-02-11)
Adapting an agent's behaviour to new environments has been one of the primary focus areas of physics-based reinforcement learning.
We propose a Gaussian Process-based prior learned in simulation, that captures the likely performance of a policy when transferred to a previously unseen environment.
We integrate this prior with a Bayesian optimisation-based policy search process to improve the efficiency of identifying the most appropriate policy from the universal policy network.
- Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs [71.47895794305883] (2021-05-31)
We study the problem of Safe Policy Improvement (SPI) under constraints in the offline Reinforcement Learning setting.
We present an SPI algorithm for this RL setting that takes into account the user's preferences for handling the trade-offs between different reward signals.
- Implicit Distributional Reinforcement Learning [61.166030238490634] (2020-07-13)
We propose the implicit distributional actor-critic (IDAC), built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
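
As a companion to the $K$-Nearest-Neighbor Resampling entry above, the following toy sketch shows the flavour of tree-based nearest-neighbour off-policy evaluation: the value of a target policy is estimated by averaging the logged rewards of the behaviour data closest to what the target policy would have done. The data, policies, and reward function here are synthetic stand-ins, not the paper's procedure.

```python
# Toy K-nearest-neighbour off-policy evaluation (illustrative only).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)

# Logged data from a behaviour policy: states, actions, observed rewards.
states = rng.uniform(-1.0, 1.0, size=(5000, 1))
actions = rng.uniform(-1.0, 1.0, size=(5000, 1))
rewards = -(actions - 0.5 * states) ** 2   # true reward, unknown to the estimator

def target_policy(s):
    """The policy to evaluate (here, near-optimal by construction)."""
    return 0.5 * s

# Tree-based nearest-neighbour search over logged (state, action) pairs:
# no optimisation required, and queries are easy to parallelise.
tree = cKDTree(np.hstack([states, actions]))
eval_states = rng.uniform(-1.0, 1.0, size=(200, 1))
queries = np.hstack([eval_states, target_policy(eval_states)])
_, idx = tree.query(queries, k=10)          # 10 nearest logged transitions
value_estimate = rewards[idx].mean()        # average their observed rewards
print(f"estimated target-policy value: {value_estimate:.4f}")  # close to 0
```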
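And for the Max-Min Off-Policy Actor-Critic entry, a toy sketch of the max-min principle itself: score each candidate policy by its worst-case return over a finite uncertainty set of environment parameters, then pick the policy with the best worst case. The dynamics and parameter set are illustrative stand-ins.

```python
# Toy max-min policy selection over an uncertainty set (illustrative only).
import numpy as np

uncertainty_set = [0.4, 0.6, 0.8]   # possible values of an unknown env parameter

def simulated_return(theta, env_param):
    """Stand-in for a rollout: return peaks when the policy parameter
    matches the environment parameter."""
    return -(theta - env_param) ** 2

thetas = np.linspace(0.0, 1.0, 101)
# Score each candidate policy by its worst case over the uncertainty set.
worst_case = [min(simulated_return(t, p) for p in uncertainty_set) for t in thetas]
best_theta = thetas[int(np.argmax(worst_case))]
print(f"max-min policy parameter: {best_theta:.2f}")  # 0.60, the robust choice
```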
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.