Plannable Approximations to MDP Homomorphisms: Equivariance under Actions
- URL: http://arxiv.org/abs/2002.11963v1
- Date: Thu, 27 Feb 2020 08:29:10 GMT
- Title: Plannable Approximations to MDP Homomorphisms: Equivariance under Actions
- Authors: Elise van der Pol, Thomas Kipf, Frans A. Oliehoek, Max Welling
- Abstract summary: We introduce a contrastive loss function that enforces action equivariance on the learned representations.
We prove that when our loss is zero, we have a homomorphism of a deterministic Markov Decision Process.
We show experimentally that for deterministic MDPs, the optimal policy in the abstract MDP can be successfully lifted to the original MDP.
- Score: 72.30921397899684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work exploits action equivariance for representation learning in
reinforcement learning. Equivariance under actions states that transitions in
the input space are mirrored by equivalent transitions in latent space, while
the map and transition functions should also commute. We introduce a
contrastive loss function that enforces action equivariance on the learned
representations. We prove that when our loss is zero, we have a homomorphism of
a deterministic Markov Decision Process (MDP). Learning equivariant maps leads
to structured latent spaces, allowing us to build a model on which we plan
through value iteration. We show experimentally that for deterministic MDPs,
the optimal policy in the abstract MDP can be successfully lifted to the
original MDP. Moreover, the approach easily adapts to changes in the goal
states. Empirically, we show that in such MDPs, we obtain better
representations in fewer epochs compared to representation learning approaches
using reconstructions, while generalizing better to new goals than model-free
approaches.
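The abstract's two technical claims, a contrastive loss that enforces action equivariance and planning by value iteration on the learned abstract MDP, can be sketched as follows. This is a hedged, simplified reconstruction; all function names, the hinge form, and the single-negative setup are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def equivariance_loss(encode, transit, s, a, s_next, s_neg, hinge=1.0):
    """Sketch of a contrastive action-equivariance loss: the latent
    transition applied to the encoded state should land on the encoding
    of the true next state (the commutation condition), while a hinge
    term keeps a negative sample at least `hinge` away to prevent
    collapsed representations."""
    z, z_next, z_neg = encode(s), encode(s_next), encode(s_neg)
    z_pred = z + transit(z, a)                 # predicted latent transition
    positive = np.sum((z_pred - z_next) ** 2)  # equivariance (commutation) term
    d_neg = np.sum((z_pred - z_neg) ** 2)      # distance to the negative sample
    negative = max(0.0, hinge - d_neg)         # contrastive hinge term
    return positive + negative

def value_iteration(P, R, gamma=0.9, iters=100):
    """Tabular value iteration on a deterministic abstract MDP:
    P[s, a] is the index of the next abstract state, R[s, a] the reward."""
    V = np.zeros(P.shape[0])
    for _ in range(iters):
        V = np.max(R + gamma * V[P], axis=1)
    return V
```

When the contrastive loss is exactly zero, the encoder commutes with the dynamics, which is the homomorphism condition the paper proves; value iteration on the abstract model then produces a policy that can be lifted back to the original MDP.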
Related papers
- PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE is a self-supervised learning framework that enhances global feature representation of point cloud mask autoencoders.
We show that PseudoNeg-MAE achieves state-of-the-art performance on the ModelNet40 and ScanObjectNN datasets.
arXiv Detail & Related papers (2024-09-24T07:57:21Z) - Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations.
We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z) - Continuous MDP Homomorphisms and Homomorphic Policy Gradient [51.25171126424949]
We extend the definition of MDP homomorphisms to encompass continuous actions in continuous state spaces.
We propose an actor-critic algorithm that is able to learn the policy and the MDP homomorphism map simultaneously.
arXiv Detail & Related papers (2022-09-15T15:26:49Z) - Using Forwards-Backwards Models to Approximate MDP Homomorphisms [11.020094184644789]
We propose a novel approach to constructing homomorphisms in discrete action spaces.
We use a learnt model of environment dynamics to infer which state-action pairs lead to the same state.
In MinAtar, we report an almost 4x improvement over a value-based off-policy baseline in the low sample limit.
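The mechanism this entry describes, grouping state-action pairs that a learnt forward model sends to the same successor, might be sketched like this. The `forward_model` callable and the rounding-based clustering are illustrative assumptions, not the paper's algorithm:

```python
from collections import defaultdict

def group_by_predicted_successor(forward_model, states, actions, decimals=2):
    """Cluster (state, action) pairs whose learnt forward model predicts
    the same (rounded) next state; each cluster approximates one
    equivalence class of an MDP homomorphism in a discrete action space."""
    groups = defaultdict(list)
    for s in states:
        for a in actions:
            # round the predicted successor so near-identical predictions
            # fall into the same bucket
            key = tuple(round(x, decimals) for x in forward_model(s, a))
            groups[key].append((s, a))
    return dict(groups)
```

The rounding tolerance stands in for whatever similarity test the learnt dynamics model supports; with exact tabular dynamics it reduces to grouping by the true successor state.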
arXiv Detail & Related papers (2022-09-14T00:38:12Z) - PAC Generalization via Invariant Representations [41.02828564338047]
We consider the notion of $\epsilon$-approximate invariance in a finite sample setting.
Inspired by PAC learning, we obtain finite-sample out-of-distribution generalization guarantees.
Our results show bounds that do not scale with the ambient dimension when intervention sites are restricted to a constant-size subset of in-degree-bounded nodes.
arXiv Detail & Related papers (2022-05-30T15:50:14Z) - Meta Learning MDPs with Linear Transition Models [22.508479528847634]
We study meta-learning in Markov Decision Processes (MDP) with linear transition models in the undiscounted episodic setting.
We propose BUC-MatrixRL, a version of the UC-Matrix RL algorithm, and show it can meaningfully leverage a set of sampled training tasks.
We prove that, compared to learning the tasks in isolation, BUC-MatrixRL provides significant improvements in transfer regret for high-bias, low-variance task distributions.
arXiv Detail & Related papers (2022-01-21T14:57:03Z) - Expert-Guided Symmetry Detection in Markov Decision Processes [0.0]
We propose a paradigm that aims to detect transformations of the state-action space under which the MDP dynamics are invariant.
The results show that the model distributional shift is reduced when the dataset is augmented with the data obtained by using the detected symmetries.
arXiv Detail & Related papers (2021-11-19T16:12:30Z) - Self-Supervised Learning Disentangled Group Representation as Feature [82.07737719232972]
We show that existing Self-Supervised Learning (SSL) only disentangles simple augmentation features such as rotation and colorization.
We propose an iterative SSL algorithm: Iterative Partition-based Invariant Risk Minimization (IP-IRM)
We prove that IP-IRM converges to a fully disentangled representation and show its effectiveness on various benchmarks.
arXiv Detail & Related papers (2021-10-28T16:12:33Z) - A Reinforcement Learning Approach for Sequential Spatial Transformer Networks [6.585049648605185]
We formulate the task as a Markov Decision Process (MDP) and use RL to solve this sequential decision-making problem.
In our method, we are not bound to the differentiability of the sampling modules.
We design multiple experiments to verify the effectiveness of our method using cluttered MNIST and Fashion-MNIST datasets.
arXiv Detail & Related papers (2021-06-27T17:41:17Z) - Prototypical Contrastive Learning of Unsupervised Representations [171.3046900127166]
Prototypical Contrastive Learning (PCL) is an unsupervised representation learning method.
PCL implicitly encodes semantic structures of the data into the learned embedding space.
PCL outperforms state-of-the-art instance-wise contrastive learning methods on multiple benchmarks.
arXiv Detail & Related papers (2020-05-11T09:53:36Z)
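The prototype-based contrastive idea in the entry above can be sketched as a softmax over embedding-to-prototype similarities. This is a hedged, single-embedding simplification; the function name, fixed temperature, and single prototype set are assumptions rather than the paper's full method:

```python
import numpy as np

def proto_nce(z, prototypes, assigned, temperature=0.1):
    """Simplified prototypical contrastive loss for one embedding:
    a cross-entropy that pulls `z` toward its assigned cluster prototype
    and pushes it away from the other prototypes."""
    z = z / np.linalg.norm(z)
    protos = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = protos @ z / temperature      # cosine similarities / temperature
    logits = logits - logits.max()         # shift for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[assigned]            # negative log-likelihood of the
                                           # assigned prototype
```

Minimizing this over embeddings and cluster assignments is what implicitly folds the data's semantic (cluster) structure into the embedding space.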
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.