Contrastive Behavioral Similarity Embeddings for Generalization in
Reinforcement Learning
- URL: http://arxiv.org/abs/2101.05265v2
- Date: Thu, 18 Mar 2021 13:58:01 GMT
- Title: Contrastive Behavioral Similarity Embeddings for Generalization in
Reinforcement Learning
- Authors: Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G.
Bellemare
- Abstract summary: We introduce a theoretically motivated policy similarity metric (PSM) for measuring behavioral similarity between states.
PSM assigns high similarity to states for which the optimal policies in those states as well as in future states are similar.
We present a contrastive representation learning procedure to embed any state similarity metric, which we instantiate with PSM to obtain policy similarity embeddings.
- Score: 41.85795493411269
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning methods trained on few environments rarely learn
policies that generalize to unseen environments. To improve generalization, we
incorporate the inherent sequential structure in reinforcement learning into
the representation learning process. This approach is orthogonal to recent
approaches, which rarely exploit this structure explicitly. Specifically, we
introduce a theoretically motivated policy similarity metric (PSM) for
measuring behavioral similarity between states. PSM assigns high similarity to
states for which the optimal policies in those states as well as in future
states are similar. We also present a contrastive representation learning
procedure to embed any state similarity metric, which we instantiate with PSM
to obtain policy similarity embeddings (PSEs). We demonstrate that PSEs improve
generalization on diverse benchmarks, including LQR with spurious correlations,
a jumping task from pixels, and Distracting DM Control Suite.
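The contrastive procedure described above can be made concrete with a short sketch. The snippet below is a minimal, simplified illustration assuming PyTorch: it takes a precomputed PSM distance matrix between states of two training environments and uses it to define soft positive targets for a contrastive loss over state embeddings. The function and variable names (pse_contrastive_loss, z_x, z_y, psm, beta) are illustrative, and this soft-target loss is a simplification rather than the paper's exact objective.

```python
# A minimal sketch, assuming PyTorch and a precomputed PSM matrix; names such as
# pse_contrastive_loss are illustrative, not taken from the authors' code.
import torch
import torch.nn.functional as F

def pse_contrastive_loss(z_x, z_y, psm, beta=0.5, temperature=0.1):
    """Pull together embeddings of behaviorally similar states.

    z_x: (n, d) embeddings of states from training environment X
    z_y: (m, d) embeddings of states from training environment Y
    psm: (n, m) policy similarity metric (a distance: smaller = more similar)
    """
    z_x = F.normalize(z_x, dim=1)
    z_y = F.normalize(z_y, dim=1)
    logits = z_x @ z_y.t() / temperature      # cosine similarities between embeddings
    targets = F.softmax(-psm / beta, dim=1)   # PSM-derived soft positive targets
    # Cross-entropy between embedding similarities and PSM-derived targets:
    # states the metric deems behaviorally close are treated as positives.
    return -(targets * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

# Toy usage with random embeddings standing in for an encoder's outputs.
z_x = torch.randn(8, 32, requires_grad=True)
z_y = torch.randn(8, 32, requires_grad=True)
psm = torch.rand(8, 8)
pse_contrastive_loss(z_x, z_y, psm).backward()
```

Because the loss only consumes a distance matrix, the same recipe can embed any state similarity metric; instantiating it with PSM is what yields policy similarity embeddings.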
Related papers
- Learning in complex action spaces without policy gradients [8.81420331399616]
We show that QMLE can be applied to complex action spaces with a controllable computational cost that is comparable to that of policy gradient methods.
QMLE demonstrates strong performance on the DeepMind Control Suite, even when compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-10-08T19:49:34Z)
- Learning Optimal Deterministic Policies with Stochastic Policy Gradients [62.81324245896716]
Policy gradient (PG) methods are successful approaches to deal with continuous reinforcement learning (RL) problems.
In common practice, stochastic (hyper)policies are learned only to deploy their deterministic version.
We show how to tune the exploration level used for learning to optimize the trade-off between the sample complexity and the performance of the deployed deterministic policy.
arXiv Detail & Related papers (2024-05-03T16:45:15Z)
- AgentMixer: Multi-Agent Correlated Policy Factorization [39.041191852287525]
We introduce strategy modification to provide a mechanism for agents to correlate their policies.
We present a novel framework, AgentMixer, which constructs the joint fully observable policy as a non-linear combination of individual partially observable policies.
We show that AgentMixer converges to an ε-approximate Correlated Equilibrium.
arXiv Detail & Related papers (2024-01-16T15:32:41Z)
- Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Imitation Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z)
- Multi-Similarity Contrastive Learning [4.297070083645049]
We propose a novel multi-similarity contrastive loss (MSCon) that learns generalizable embeddings by jointly utilizing supervision from multiple metrics of similarity.
Our method automatically learns contrastive similarity weightings based on the uncertainty in the corresponding similarity.
We show empirically that networks trained with MSCon outperform state-of-the-art baselines on in-domain and out-of-domain settings.
arXiv Detail & Related papers (2023-07-06T01:26:01Z)
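The multi-similarity entry above mentions learning contrastive similarity weightings from uncertainty. A minimal sketch of one way to do this is shown below, assuming PyTorch and a Kendall-style uncertainty weighting, which is an assumption here rather than the paper's exact scheme; the class name UncertaintyWeightedLoss is illustrative.

```python
# A minimal sketch of uncertainty-weighted combination of per-metric losses.
# The weighting scheme is an assumption, not necessarily MSCon's exact method.
import torch
import torch.nn as nn

class UncertaintyWeightedLoss(nn.Module):
    """Weights per-similarity-metric losses by learned log-variances."""

    def __init__(self, num_metrics: int):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_metrics))

    def forward(self, losses: torch.Tensor) -> torch.Tensor:
        # losses: (num_metrics,) one contrastive loss per similarity metric.
        precision = torch.exp(-self.log_vars)   # higher uncertainty -> lower weight
        return (precision * losses + self.log_vars).sum()

# Toy usage; in training, these would be contrastive losses from each metric head.
weighting = UncertaintyWeightedLoss(num_metrics=3)
per_metric_losses = torch.tensor([0.8, 1.2, 0.5])
weighting(per_metric_losses).backward()
```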
- Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper learns diverse policies from the history of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
arXiv Detail & Related papers (2023-02-28T11:58:39Z)
- Learning Generalizable Representations for Reinforcement Learning via Adaptive Meta-learner of Behavioral Similarities [43.327357653393015]
We propose a novel meta-learner-based framework that learns representations capturing behavioral similarities for reinforcement learning.
We empirically demonstrate that our proposed framework outperforms state-of-the-art baselines on several benchmarks.
arXiv Detail & Related papers (2022-12-26T11:11:23Z)
- Continuous MDP Homomorphisms and Homomorphic Policy Gradient [51.25171126424949]
We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces.
We propose an actor-critic algorithm that is able to learn the policy and the MDP homomorphism map simultaneously.
arXiv Detail & Related papers (2022-09-15T15:26:49Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
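The last entry above studies combining a basis of policies via generalized policy updates. A minimal sketch of generalized policy improvement (GPI), the standard mechanism for combining such a basis, is given below; it assumes per-policy action values for the current state are already available, and the function name gpi_action is illustrative rather than from the paper.

```python
# A minimal sketch of generalized policy improvement over a basis of policies.
import numpy as np

def gpi_action(q_values: np.ndarray) -> int:
    """q_values: (num_policies, num_actions) action values for one state.

    GPI acts greedily with respect to the best value any base policy attains,
    so the combined behavior is at least as good as each base policy alone.
    """
    return int(q_values.max(axis=0).argmax())

# Two base policies, three actions: policy 0 prefers action 0, policy 1 prefers action 2.
print(gpi_action(np.array([[1.0, 0.2, 0.1],
                           [0.3, 0.4, 0.9]])))  # -> 0: the highest value any policy attains
```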
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.