Towards Robust Bisimulation Metric Learning
- URL: http://arxiv.org/abs/2110.14096v1
- Date: Wed, 27 Oct 2021 00:32:07 GMT
- Title: Towards Robust Bisimulation Metric Learning
- Authors: Mete Kemertas, Tristan Aumentado-Armstrong
- Abstract summary: Bisimulation metrics offer one solution to representation learning problem.
We generalize value function approximation bounds for on-policy bisimulation metrics to non-optimal policies.
We find that these issues stem from an underconstrained dynamics model and an unstable dependence of the embedding norm on the reward signal.
- Score: 3.42658286826597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learned representations in deep reinforcement learning (DRL) have to extract
task-relevant information from complex observations, balancing between
robustness to distraction and informativeness to the policy. Such stable and
rich representations, often learned via modern function approximation
techniques, can enable practical application of the policy improvement theorem,
even in high-dimensional continuous state-action spaces. Bisimulation metrics
offer one solution to this representation learning problem, by collapsing
functionally similar states together in representation space, which promotes
invariance to noise and distractors. In this work, we generalize value function
approximation bounds for on-policy bisimulation metrics to non-optimal policies
and approximate environment dynamics. Our theoretical results help us identify
embedding pathologies that may occur in practical use. In particular, we find
that these issues stem from an underconstrained dynamics model and an unstable
dependence of the embedding norm on the reward signal in environments with
sparse rewards. Further, we propose a set of practical remedies: (i) a norm
constraint on the representation space, and (ii) an extension of prior
approaches with intrinsic rewards and latent space regularization. Finally, we
provide evidence that the resulting method is not only more robust to sparse
reward functions, but also able to solve challenging continuous control tasks
with observational distractions, where prior methods fail.
Related papers
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, convergence (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z) - Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data [17.991833729722288]
We propose a novel policy learning algorithm, PESsimistic CAusal Learning (PESCAL)
Our key observation is that, by incorporating auxiliary variables that mediate the effect of actions on system dynamics, it is sufficient to learn a lower bound of the mediator distribution function, instead of the Q-function.
We provide theoretical guarantees for the algorithms we propose, and demonstrate their efficacy through simulations, as well as real-world experiments utilizing offline datasets from a leading ride-hailing platform.
arXiv Detail & Related papers (2024-03-18T14:51:19Z) - Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z) - Distributionally Robust Model-based Reinforcement Learning with Large
State Spaces [55.14361269378122]
Three major challenges in reinforcement learning are the complex dynamical systems with large state spaces, the costly data acquisition processes, and the deviation of real-world dynamics from the training environment deployment.
We study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets.
We propose a model-based approach that utilizes Gaussian Processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics.
arXiv Detail & Related papers (2023-09-05T13:42:11Z) - Conditional Kernel Imitation Learning for Continuous State Environments [9.750698192309978]
We introduce a novel conditional kernel density estimation-based imitation learning framework.
We show consistently superior empirical performance over many state-of-the-art IL algorithms.
arXiv Detail & Related papers (2023-08-24T05:26:42Z) - Policy Gradient Methods in the Presence of Symmetries and State
Abstractions [46.66541516203923]
Reinforcement learning (RL) on high-dimensional and complex problems relies on abstraction for improved efficiency and generalization.
We study abstraction in the continuous-control setting, and extend the definition of Markov decision process (MDP) homomorphisms to the setting of continuous state and action spaces.
We propose a family of actor-critic algorithms that are able to learn the policy and the MDP homomorphism map simultaneously.
arXiv Detail & Related papers (2023-05-09T17:59:10Z) - Maximum Causal Entropy Inverse Constrained Reinforcement Learning [3.409089945290584]
We propose a novel method that utilizes the principle of maximum causal entropy to learn constraints and an optimal policy.
We evaluate the effectiveness of the learned policy by assessing the reward received and the number of constraint violations.
Our method has been shown to outperform state-of-the-art approaches across a variety of tasks and environments.
arXiv Detail & Related papers (2023-05-04T14:18:19Z) - Verified Probabilistic Policies for Deep Reinforcement Learning [6.85316573653194]
We tackle the problem of verifying probabilistic policies for deep reinforcement learning.
We propose an abstraction approach, based on interval Markov decision processes, that yields guarantees on a policy's execution.
We present techniques to build and solve these models using abstract interpretation, mixed-integer linear programming, entropy-based refinement and probabilistic model checking.
arXiv Detail & Related papers (2022-01-10T23:55:04Z) - Scalable Bayesian Inverse Reinforcement Learning [93.27920030279586]
We introduce Approximate Variational Reward Imitation Learning (AVRIL)
Our method addresses the ill-posed nature of the inverse reinforcement learning problem.
Applying our method to real medical data alongside classic control simulations, we demonstrate Bayesian reward inference in environments beyond the scope of current methods.
arXiv Detail & Related papers (2021-02-12T12:32:02Z) - Strictly Batch Imitation Learning by Energy-based Distribution Matching [104.33286163090179]
Consider learning a policy purely on the basis of demonstrated behavior -- that is, with no access to reinforcement signals, no knowledge of transition dynamics, and no further interaction with the environment.
One solution is simply to retrofit existing algorithms for apprenticeship learning to work in the offline setting.
But such an approach leans heavily on off-policy evaluation or offline model estimation, and can be indirect and inefficient.
We argue that a good solution should be able to explicitly parameterize a policy, implicitly learn from rollout dynamics, and operate in an entirely offline fashion.
arXiv Detail & Related papers (2020-06-25T03:27:59Z) - Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.