Efficient Embedding of Semantic Similarity in Control Policies via
Entangled Bisimulation
- URL: http://arxiv.org/abs/2201.12300v1
- Date: Fri, 28 Jan 2022 18:06:06 GMT
- Title: Efficient Embedding of Semantic Similarity in Control Policies via
Entangled Bisimulation
- Authors: Martin Bertran, Walter Talbott, Nitish Srivastava, Joshua Susskind
- Abstract summary: Learning generalizable policies from visual input in the presence of visual distractions is a challenging problem in reinforcement learning.
We propose entangled bisimulation, a bisimulation metric that allows the specification of the distance function between states.
We show how entangled bisimulation can meaningfully improve over previous methods on the Distracting Control Suite (DCS).
- Score: 3.5092955099876266
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning generalizable policies from visual input in the presence of visual
distractions is a challenging problem in reinforcement learning. Recently,
there has been renewed interest in bisimulation metrics as a tool to address
this issue; these metrics can be used to learn representations that are, in
principle, invariant to irrelevant distractions by measuring behavioural
similarity between states. An accurate, unbiased, and scalable estimation of
these metrics has proved elusive in continuous state and action scenarios. We
propose entangled bisimulation, a bisimulation metric that allows the
specification of the distance function between states, and can be estimated
without bias in continuous state and action spaces. We show how entangled
bisimulation can meaningfully improve over previous methods on the Distracting
Control Suite (DCS), even when added on top of data augmentation techniques.
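To make the core idea concrete, the sketch below computes an on-policy bisimulation metric by fixed-point iteration on a small, invented tabular MDP with deterministic transitions. This is a minimal illustration only: the state, reward, and transition tables are assumptions, and the paper's entangled bisimulation additionally handles continuous state and action spaces and a user-specified state distance, which this toy example does not implement.

```python
import numpy as np

# Toy tabular MDP (invented for illustration): 4 states, 2 actions,
# deterministic rewards and transitions.
n_states, n_actions = 4, 2
rewards = np.array([[0.0, 1.0],
                    [0.0, 1.0],   # state 1 behaves exactly like state 0
                    [0.5, 0.2],
                    [0.9, 0.1]])
next_state = np.array([[2, 3],
                       [2, 3],    # same successors as state 0
                       [0, 1],
                       [1, 0]])
gamma = 0.9

# Fixed-point iteration for the bisimulation metric:
#   d(s, t) = max_a ( |r(s,a) - r(t,a)| + gamma * d(s'_a, t'_a) )
# With deterministic transitions, the Wasserstein term reduces to the
# distance between the two successor states.
d = np.zeros((n_states, n_states))
for _ in range(200):
    new_d = np.zeros_like(d)
    for s in range(n_states):
        for t in range(n_states):
            gaps = [abs(rewards[s, a] - rewards[t, a])
                    + gamma * d[next_state[s, a], next_state[t, a]]
                    for a in range(n_actions)]
            new_d[s, t] = max(gaps)
    if np.max(np.abs(new_d - d)) < 1e-8:
        d = new_d
        break
    d = new_d

print(d[0, 1])      # behaviorally identical states get distance 0
print(d[0, 2] > 0)  # behaviorally distinct states get positive distance
```

States 0 and 1 have identical rewards and successors, so the metric assigns them distance zero; a representation trained to match this metric would map them to the same embedding, which is the sense in which bisimulation-based representations discard irrelevant (e.g. purely visual) distinctions.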
Related papers
- Learning Action-based Representations Using Invariance [18.1941237781348]
We introduce action-bisimulation encoding, which learns a multi-step controllability metric that discounts distant state features that are irrelevant for control.
We introduce action-bisimulation encoding, which learns a multi-step controllability metric that discounts distant state features that are relevant for control.
We demonstrate that action-bisimulation pretraining on reward-free, uniformly random data improves sample efficiency in several environments.
arXiv Detail & Related papers (2024-03-25T02:17:54Z)
- Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive? [93.10694819127608]
We propose a unified evaluation pipeline for forecasting methods with real-world perception inputs.
Our in-depth study uncovers a substantial performance gap when transitioning from curated to perception-based data.
arXiv Detail & Related papers (2023-06-15T17:03:14Z)
- Conditional Feature Importance for Mixed Data [1.6114012813668934]
We develop a conditional predictive impact (CPI) framework with knockoff sampling.
We show that our proposed workflow controls type I error, achieves high power and is in line with results given by other conditional FI measures.
Our findings highlight the necessity of developing statistically adequate, specialized methods for mixed data.
arXiv Detail & Related papers (2022-10-06T16:52:38Z)
- Accounting for the Sequential Nature of States to Learn Features for Reinforcement Learning [2.0646127669654826]
We investigate the properties of data that cause popular representation learning approaches to fail.
In particular, we find that in environments where states do not significantly overlap, variational autoencoders (VAEs) fail to learn useful features.
We demonstrate this failure in a simple gridworld domain, and then provide a solution in the form of metric learning.
arXiv Detail & Related papers (2022-05-12T10:20:43Z)
- Towards Robust Bisimulation Metric Learning [3.42658286826597]
Bisimulation metrics offer one solution to the representation learning problem.
We generalize value function approximation bounds for on-policy bisimulation metrics to non-optimal policies.
We find that these issues stem from an underconstrained dynamics model and an unstable dependence of the embedding norm on the reward signal.
arXiv Detail & Related papers (2021-10-27T00:32:07Z)
- Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders [62.54431888432302]
We study an OPE problem in an infinite-horizon, ergodic Markov decision process with unobserved confounders.
We show how, given only a latent variable model for states and actions, policy value can be identified from off-policy data.
arXiv Detail & Related papers (2020-07-27T22:19:01Z)
- Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z)
- Learning Invariant Representations for Reinforcement Learning without Reconstruction [98.33235415273562]
We study how representation learning can accelerate reinforcement learning from rich observations, such as images, without relying either on domain knowledge or pixel-reconstruction.
Bisimulation metrics quantify behavioral similarity between states in continuous MDPs.
We demonstrate the effectiveness of our method at disregarding task-irrelevant information using modified visual MuJoCo tasks.
arXiv Detail & Related papers (2020-06-18T17:59:35Z)
- Towards Certified Robustness of Distance Metric Learning [53.96113074344632]
We advocate imposing an adversarial margin in the input space so as to improve the generalization and robustness of metric learning algorithms.
We show that the enlarged margin is beneficial to the generalization ability by using the theoretical technique of algorithmic robustness.
arXiv Detail & Related papers (2020-06-10T16:51:53Z)
- Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning [70.01650994156797]
Off-policy evaluation of sequential decision policies from observational data is necessary in batch reinforcement learning settings such as education and healthcare.
We develop an approach that estimates bounds on the value of a given policy.
We prove convergence to the sharp bounds as we collect more confounded data.
arXiv Detail & Related papers (2020-02-11T16:18:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.