Learning Interpretable Policies in Hindsight-Observable POMDPs through
Partially Supervised Reinforcement Learning
- URL: http://arxiv.org/abs/2402.09290v1
- Date: Wed, 14 Feb 2024 16:23:23 GMT
- Title: Learning Interpretable Policies in Hindsight-Observable POMDPs through
Partially Supervised Reinforcement Learning
- Authors: Michael Lanier, Ying Xu, Nathan Jacobs, Chongjie Zhang, Yevgeniy
Vorobeychik
- Abstract summary: We introduce the Partially Supervised Reinforcement Learning (PSRL) framework.
At the heart of PSRL is the fusion of both supervised and unsupervised learning.
We show that PSRL offers a potent balance, enhancing model interpretability while preserving, and often significantly outperforming, the performance benchmarks set by traditional methods.
- Score: 57.67629402360924
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep reinforcement learning has demonstrated remarkable achievements across
diverse domains such as video games, robotic control, autonomous driving, and
drug discovery. Common methodologies in partially-observable domains largely
lean on end-to-end learning from high-dimensional observations, such as images,
without explicitly reasoning about true state. We suggest an alternative
direction, introducing the Partially Supervised Reinforcement Learning (PSRL)
framework. At the heart of PSRL is the fusion of both supervised and
unsupervised learning. The approach leverages a state estimator to distill
semantic state information from high-dimensional observations via supervision;
this true state is often fully observable at training time (the
hindsight-observable setting). This yields more interpretable policies that
compose state predictions with control. In parallel, the approach captures an
unsupervised latent representation. These two representations, the semantic
state and the latent state, are then fused and used as inputs to a policy
network. This
juxtaposition offers practitioners a flexible and dynamic spectrum: from
emphasizing supervised state information to integrating richer latent
insights. Extensive experimental results indicate that by merging these dual
representations, PSRL offers a potent balance, enhancing model interpretability
while preserving, and often significantly outperforming, the performance
benchmarks set by traditional methods in terms of reward and convergence speed.
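As a concrete illustration of the fusion described in the abstract, the sketch below combines a supervised state estimator with an unsupervised latent encoder and feeds the concatenated result to a policy head. This is a minimal PyTorch sketch under assumed interfaces (MLP modules, a discrete action space, an MSE state-supervision term); the module names, dimensions, and supervision weight are illustrative and not taken from the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PSRLPolicy(nn.Module):
    """Illustrative PSRL-style policy: fuses a supervised semantic state
    estimate with an unsupervised latent representation."""

    def __init__(self, obs_dim, state_dim, latent_dim, n_actions):
        super().__init__()
        # Supervised state estimator: predicts the semantic (true) state from the observation.
        self.state_estimator = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, state_dim)
        )
        # Unsupervised encoder: produces a latent representation of the same observation.
        self.latent_encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim)
        )
        # Policy head consumes the fused (semantic state, latent) vector.
        self.policy_head = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 256), nn.ReLU(), nn.Linear(256, n_actions)
        )

    def forward(self, obs):
        state_pred = self.state_estimator(obs)           # interpretable semantic state estimate
        latent = self.latent_encoder(obs)                # unsupervised latent features
        fused = torch.cat([state_pred, latent], dim=-1)  # fuse the two representations
        return self.policy_head(fused), state_pred


def psrl_loss(rl_loss, state_pred, true_state, sup_weight=1.0):
    # The supervised term is only available because the true state is
    # observable in hindsight at training time.
    return rl_loss + sup_weight * F.mse_loss(state_pred, true_state)
```

Varying `sup_weight` corresponds to the spectrum mentioned above, from relying mostly on supervised state information to relying mostly on the latent representation.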
Related papers
- iQRL -- Implicitly Quantized Representations for Sample-efficient Reinforcement Learning [24.684363928059113]
We propose an efficient representation learning method using only a self-supervised latent-state consistency loss.
We achieve high performance and prevent representation collapse by quantizing the latent representation.
Our method, named iQRL (implicitly Quantized Reinforcement Learning), is straightforward and compatible with any model-free RL algorithm.
arXiv Detail & Related papers (2024-06-04T18:15:44Z)
- Harnessing Discrete Representations For Continual Reinforcement Learning [8.61539229796467]
We investigate the advantages of representing observations as vectors of categorical values within the context of reinforcement learning.
We find that, when compared to traditional continuous representations, world models learned over discrete representations accurately model more of the world with less capacity.
arXiv Detail & Related papers (2023-12-02T18:55:26Z)
- TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning [73.53576440536682]
We introduce TACO: Temporal Action-driven Contrastive Learning, a powerful temporal contrastive learning approach.
TACO simultaneously learns a state and an action representation by optimizing the mutual information between representations of current states paired with action sequences and representations of the corresponding future states.
For online RL, TACO achieves a 40% performance boost after one million environment interaction steps.
arXiv Detail & Related papers (2023-06-22T22:21:53Z)
- Leveraging Fully Observable Policies for Learning under Partial Observability [14.918197552051929]
We propose a method for partially observable reinforcement learning that uses a fully observable policy during offline training to improve online performance.
Our approach can leverage the fully observable policy for exploration and for the parts of the domain that are fully observable, while still learning under partial observability.
A successful policy transfer to a physical robot in a manipulation task from pixels shows our approach's practicality in learning interesting policies under partial observability.
arXiv Detail & Related papers (2022-11-03T16:57:45Z)
- Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning an imagined state with the real state returned by the environment, VCR applies a $Q$-value head to both states and obtains two distributions of action values.
It has been demonstrated that our methods achieve new state-of-the-art performance for search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z)
- Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
- Co$^2$L: Contrastive Continual Learning [69.46643497220586]
Recent breakthroughs in self-supervised learning show that such algorithms learn visual representations that can be transferred better to unseen tasks.
We propose a rehearsal-based continual learning algorithm that focuses on continually learning and maintaining transferable representations.
arXiv Detail & Related papers (2021-06-28T06:14:38Z)
- Which Mutual-Information Representation Learning Objectives are Sufficient for Control? [80.2534918595143]
Mutual information provides an appealing formalism for learning representations of data.
This paper formalizes the sufficiency of a state representation for learning and representing the optimal policy.
Surprisingly, we find that two commonly used mutual-information objectives can yield insufficient representations given mild and common assumptions on the structure of the MDP.
arXiv Detail & Related papers (2021-06-14T10:12:34Z)
- Towards Learning Controllable Representations of Physical Systems [9.088303226909279]
Learned representations of dynamical systems reduce dimensionality, potentially supporting downstream reinforcement learning (RL).
We consider the relationship between the true state and the corresponding representations, proposing that ideally each representation corresponds to a unique state.
Metrics based on this correspondence are shown to predict reinforcement learning performance in a simulated peg-in-hole task when comparing variants of autoencoder-based representations.
arXiv Detail & Related papers (2020-11-16T17:15:57Z)
- An Improved Semi-Supervised VAE for Learning Disentangled Representations [29.38345769998613]
We introduce another source of supervision that we denote as label replacement.
During training, we replace the inferred representation associated with a data point with its ground-truth representation whenever it is available.
Our extension is theoretically inspired by our proposed general framework of semi-supervised disentanglement learning.
arXiv Detail & Related papers (2020-06-12T20:47:41Z)
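The label-replacement idea in the last entry above is closely related to PSRL's use of hindsight state supervision: an inferred representation is swapped for the ground-truth one whenever the latter is available. A minimal sketch of that substitution step, with an assumed tensor layout and function name (not taken from either paper):

```python
import torch

def replace_with_labels(inferred_z, true_z, has_label):
    """Swap inferred representations for ground-truth ones where available.

    inferred_z: (batch, dim) representations produced by the encoder
    true_z:     (batch, dim) ground-truth representations (values ignored where unknown)
    has_label:  (batch,) boolean mask marking supervised data points
    """
    mask = has_label.unsqueeze(-1).to(inferred_z.dtype)
    # Supervised points use the ground-truth representation; the rest keep the inferred one.
    return mask * true_z + (1.0 - mask) * inferred_z
```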