Learning Symbolic Representations for Reinforcement Learning of
Non-Markovian Behavior
- URL: http://arxiv.org/abs/2301.02952v1
- Date: Sun, 8 Jan 2023 00:47:19 GMT
- Title: Learning Symbolic Representations for Reinforcement Learning of
Non-Markovian Behavior
- Authors: Phillip J.K. Christoffersen, Andrew C. Li, Rodrigo Toro Icarte, Sheila
A. McIlraith
- Abstract summary: We show how to automatically discover useful state abstractions that support learning automata over the state-action history.
The result is an end-to-end algorithm that can learn optimal policies with significantly fewer environment samples than state-of-the-art RL.
- Score: 23.20013012953065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many real-world reinforcement learning (RL) problems necessitate learning
complex, temporally extended behavior that may only receive a reward signal when
the behavior is completed. If the reward-worthy behavior is known, it can be
specified in terms of a non-Markovian reward function - a function that depends
on aspects of the state-action history, rather than just the current state and
action. Such reward functions yield sparse rewards, necessitating an inordinate
number of experiences to find a policy that captures the reward-worthy pattern
of behavior. Recent work has leveraged Knowledge Representation (KR) to provide
a symbolic abstraction of aspects of the state that summarize reward-relevant
properties of the state-action history and support learning a Markovian
decomposition of the problem in terms of an automaton over the KR. Providing
such a decomposition has been shown to vastly improve learning rates,
especially when coupled with algorithms that exploit automaton structure.
Nevertheless, such techniques rely on a priori knowledge of the KR. In this
work, we explore how to automatically discover useful state abstractions that
support learning automata over the state-action history. The result is an
end-to-end algorithm that can learn optimal policies with significantly fewer
environment samples than state-of-the-art RL on simple non-Markovian domains.
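The abstraction-plus-automaton decomposition described above corresponds to the reward machine formalism: an automaton whose transitions fire on symbolic propositions and emit reward, so that a history-dependent reward becomes Markovian once the automaton state is tracked alongside the environment state. The following minimal Python sketch illustrates that idea only; the "coffee"/"office" delivery task, the proposition names, and the class interface are illustrative assumptions, not the paper's implementation.

```python
# Minimal reward machine sketch: an automaton over symbolic propositions
# ("coffee", "office" are hypothetical labels). Reward arrives only when the
# full temporally extended behavior is completed, but it is Markovian in the
# augmented state (env_state, rm.u).

class RewardMachine:
    def __init__(self):
        # delta[(automaton state, proposition)] = (next automaton state, reward)
        self.delta = {
            ("u0", "coffee"): ("u1", 0.0),    # picked up the coffee
            ("u1", "office"): ("u_acc", 1.0), # delivered it: task complete
        }
        self.u = "u0"

    def step(self, true_props):
        """Advance on the propositions made true by the last environment transition."""
        reward = 0.0
        for p in true_props:
            if (self.u, p) in self.delta:
                self.u, reward = self.delta[(self.u, p)]
        return reward

rm = RewardMachine()
print(rm.step({"coffee"}))  # 0.0, but the automaton advances to u1
print(rm.step({"office"}))  # 1.0, reward-worthy behavior completed
```

In the setting of this abstract the propositions are not given a priori: the contribution is discovering such state abstractions automatically from the state-action history, so that an automaton like the one above can be learned rather than hand-specified, and algorithms that exploit automaton structure can then be applied.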
Related papers
- Automated Feature Selection for Inverse Reinforcement Learning [7.278033100480175]
Inverse reinforcement learning (IRL) is an imitation learning approach to learning reward functions from expert demonstrations.
We propose a method that employs basis functions to form a candidate set of features.
We demonstrate the approach's effectiveness by recovering reward functions that capture expert policies.
arXiv Detail & Related papers (2024-03-22T10:05:21Z)
- Contrastive Example-Based Control [163.6482792040079]
We propose a method for offline, example-based control that learns an implicit model of multi-step transitions, rather than a reward function.
Across a range of state-based and image-based offline control tasks, our method outperforms baselines that use learned reward functions.
arXiv Detail & Related papers (2023-07-24T19:43:22Z)
- Noisy Symbolic Abstractions for Deep RL: A case study with Reward Machines [23.15484341058261]
We investigate how to generate policies via RL when reward functions are specified in a symbolic language captured by Reward Machines.
We formulate the problem of policy learning in Reward Machines with noisy symbolic abstractions.
arXiv Detail & Related papers (2022-11-20T08:13:48Z)
- Value-Consistent Representation Learning for Data-Efficient Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning the imagined future state predicted by a learned model with the real state returned by the environment, VCR applies a $Q$-value head on both states and obtains two distributions of action values.
It has been demonstrated that our methods achieve new state-of-the-art performance for search-free RL algorithms.
arXiv Detail & Related papers (2022-06-25T03:02:25Z)
- Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble [8.857776147129464]
Recovering reward function from expert demonstrations is a fundamental problem in reinforcement learning.
We present a dynamics-agnostic discriminator-ensemble reward learning method capable of learning both state-action and state-only reward functions.
arXiv Detail & Related papers (2022-06-01T05:16:39Z)
- Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning [96.72185761508668]
Imitation with Planning at Test-time (IMPLANT) is a new meta-algorithm for imitation learning.
We demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments.
arXiv Detail & Related papers (2022-04-07T17:16:52Z)
- Learning Long-Term Reward Redistribution via Randomized Return Decomposition [18.47810850195995]
We consider the problem formulation of episodic reinforcement learning with trajectory feedback.
It refers to an extreme delay of reward signals, in which the agent can only obtain one reward signal at the end of each trajectory.
We propose a novel reward redistribution algorithm, randomized return decomposition (RRD), to learn a proxy reward function for episodic reinforcement learning (a minimal sketch of this idea appears after this list).
arXiv Detail & Related papers (2021-11-26T13:23:36Z)
- Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z)
- MURAL: Meta-Learning Uncertainty-Aware Rewards for Outcome-Driven Reinforcement Learning [65.52675802289775]
We show that an uncertainty-aware classifier can solve challenging reinforcement learning problems.
We propose a novel method for computing the normalized maximum likelihood (NML) distribution.
We show that the resulting algorithm has a number of intriguing connections to both count-based exploration methods and prior algorithms for learning reward functions.
arXiv Detail & Related papers (2021-07-15T08:19:57Z)
- Learning Markov State Abstractions for Deep Reinforcement Learning [17.34529517221924]
We introduce a novel set of conditions and prove that they are sufficient for learning a Markov abstract state representation.
We then describe a practical training procedure that combines inverse model estimation and temporal contrastive learning.
Our approach learns representations that capture the underlying structure of the domain and lead to improved sample efficiency.
arXiv Detail & Related papers (2021-06-08T14:12:36Z)
- PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning [102.36450942613091]
We propose an inverse reinforcement learning algorithm, called inverse temporal difference learning (ITD).
We show how to seamlessly integrate ITD with learning from online environment interactions, arriving at a novel algorithm for reinforcement learning with demonstrations, called $\Psi\Phi$-learning.
arXiv Detail & Related papers (2021-02-24T21:12:09Z)
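As referenced in the randomized return decomposition (RRD) entry above, trajectory-level feedback can be turned into a per-step proxy reward by regressing the episodic return onto a scaled sum of proxy rewards over a randomly subsampled set of steps. The sketch below illustrates only that core idea; the network architecture, tensor shapes, subset size, and function names are illustrative assumptions rather than the authors' code.

```python
# Sketch of a randomized-return-decomposition-style loss: learn a per-step
# proxy reward whose scaled sum over a random subset of steps predicts the
# episodic return observed at the end of the trajectory.
import torch
import torch.nn as nn

class ProxyReward(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        # per-step proxy reward r_hat(s, a), shape (T,)
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def rrd_loss(model, obs, act, episodic_return, subset_size=16):
    """obs: (T, obs_dim), act: (T, act_dim), episodic_return: scalar tensor."""
    T = obs.shape[0]
    idx = torch.randperm(T)[:subset_size]                        # random subset of steps
    est = (T / subset_size) * model(obs[idx], act[idx]).sum()    # scaled subset sum estimates the return
    return (episodic_return - est) ** 2                          # least-squares return decomposition
```

Subsampling keeps the per-update cost independent of trajectory length, while the T/subset_size scaling keeps the subset sum an unbiased estimate of the fully decomposed return.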