Jointly-Learned State-Action Embedding for Efficient Reinforcement
Learning
- URL: http://arxiv.org/abs/2010.04444v4
- Date: Fri, 20 Aug 2021 10:20:11 GMT
- Title: Jointly-Learned State-Action Embedding for Efficient Reinforcement
Learning
- Authors: Paul J. Pritz and Liang Ma and Kin K. Leung
- Abstract summary: We propose a new approach for learning embeddings for states and actions that combines aspects of model-free and model-based reinforcement learning.
We show that our approach significantly outperforms state-of-the-art models in both discrete/continuous domains with large state/action spaces.
- Score: 8.342863878589332
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While reinforcement learning has achieved considerable successes in recent
years, state-of-the-art models are often still limited by the size of state and
action spaces. Model-free reinforcement learning approaches use some form of
state representations and the latest work has explored embedding techniques for
actions, both with the aim of achieving better generalization and
applicability. However, these approaches consider only states or actions,
ignoring the interaction between them when generating embedded representations.
In this work, we establish the theoretical foundations for the validity of
training a reinforcement learning agent using embedded states and actions. We
then propose a new approach for jointly learning embeddings for states and
actions that combines aspects of model-free and model-based reinforcement
learning, which can be applied in both discrete and continuous domains.
Specifically, we use a model of the environment to obtain embeddings for states
and actions and present a generic architecture that leverages these to learn a
policy. In this way, the embedded representations obtained via our approach
enable better generalization over both states and actions by capturing
similarities in the embedding spaces. Evaluations of our approach on several
gaming, robotic control, and recommender systems show it significantly
outperforms state-of-the-art models in both discrete/continuous domains with
large state/action spaces, thus confirming its efficacy.
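No code accompanies this listing; purely as an illustration of the general idea, the following PyTorch sketch jointly trains a state encoder and an action embedding through a latent transition model and places a small policy head on top of the shared state embedding. All module names, dimensions, losses, and the dummy data are assumptions made for illustration, not the authors' architecture.

# Illustrative sketch: jointly learned state and action embeddings trained
# through a latent transition model, with a policy acting on the embedding.
# Dimensions, losses, and data are placeholders, not the authors' implementation.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, EMB_S, EMB_A = 8, 16, 32, 8

state_enc = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, EMB_S))
action_emb = nn.Embedding(N_ACTIONS, EMB_A)   # embedding table for discrete actions
trans_model = nn.Sequential(                  # model-based part: predict next state embedding
    nn.Linear(EMB_S + EMB_A, 64), nn.ReLU(), nn.Linear(64, EMB_S))
policy = nn.Sequential(                       # model-free part: act on the state embedding
    nn.Linear(EMB_S, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

params = (list(state_enc.parameters()) + list(action_emb.parameters())
          + list(trans_model.parameters()) + list(policy.parameters()))
opt = torch.optim.Adam(params, lr=3e-4)

# Dummy batch of transitions (s, a, s'); in practice these come from interaction data.
s = torch.randn(128, STATE_DIM)
a = torch.randint(0, N_ACTIONS, (128,))
s_next = torch.randn(128, STATE_DIM)

# Transition loss: the embedded (state, action) pair should predict the next state
# embedding, which is what couples the two embedding spaces during training.
phi_s, phi_s_next = state_enc(s), state_enc(s_next)
pred_next = trans_model(torch.cat([phi_s, action_emb(a)], dim=-1))
embed_loss = nn.functional.mse_loss(pred_next, phi_s_next.detach())

# Policy loss placeholder (a behaviour-cloning-style objective stands in for whatever
# RL objective is actually used); it back-propagates into the same state embedding.
policy_loss = nn.functional.cross_entropy(policy(phi_s), a)

opt.zero_grad()
(embed_loss + policy_loss).backward()
opt.step()
print(f"embedding loss {embed_loss.item():.3f}, policy loss {policy_loss.item():.3f}")

The sketch omits details such as mapping a continuous action embedding back to a concrete action; the point is only that the state and action representations are shaped by a shared environment model rather than learned in isolation.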
Related papers
- Learning Interpretable Policies in Hindsight-Observable POMDPs through
Partially Supervised Reinforcement Learning [57.67629402360924]
We introduce the Partially Supervised Reinforcement Learning (PSRL) framework.
At the heart of PSRL is the fusion of both supervised and unsupervised learning.
We show that PSRL strikes an effective balance, enhancing model interpretability while matching, and often significantly exceeding, the performance benchmarks set by traditional methods.
arXiv Detail & Related papers (2024-02-14T16:23:23Z) - Representation Learning for Continuous Action Spaces is Beneficial for
Efficient Policy Learning [64.14557731665577]
Deep reinforcement learning (DRL) breaks through the bottlenecks of traditional reinforcement learning (RL).
In this paper, we propose an efficient policy learning method in latent state and action spaces.
The effectiveness of the proposed method is demonstrated in MountainCar, CarRacing, and Cheetah experiments.
arXiv Detail & Related papers (2022-11-23T19:09:37Z) - Leveraging Demonstrations with Latent Space Priors [90.56502305574665]
We propose to leverage demonstration datasets by combining skill learning and sequence modeling.
We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning.
Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments.
arXiv Detail & Related papers (2022-10-26T13:08:46Z) - Causal Dynamics Learning for Task-Independent State Abstraction [61.707048209272884]
We introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL)
CDL learns a theoretically proven causal dynamics model that removes unnecessary dependencies between state variables and the action.
A state abstraction can then be derived from the learned dynamics.
arXiv Detail & Related papers (2022-06-27T17:02:53Z) - State Representation Learning for Goal-Conditioned Reinforcement
Learning [9.162936410696407]
This paper presents a novel state representation for reward-free Markov decision processes.
The idea is to learn, in a self-supervised manner, an embedding space in which the distance between pairs of embedded states corresponds to the minimum number of actions needed to transition between them (a minimal, illustrative sketch of this idea appears after this list).
We show how this representation can be leveraged to learn goal-conditioned policies.
arXiv Detail & Related papers (2022-05-04T09:20:09Z) - Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon
Reasoning [120.38381203153159]
Reinforcement learning can train policies that effectively perform complex tasks.
For long-horizon tasks, the performance of these methods degrades with horizon, often necessitating reasoning over and composing lower-level skills.
We propose Value Function Spaces: a simple approach that produces such a representation by using the value functions corresponding to each lower-level skill.
arXiv Detail & Related papers (2021-11-04T22:46:16Z) - Learning Markov State Abstractions for Deep Reinforcement Learning [17.34529517221924]
We introduce a novel set of conditions and prove that they are sufficient for learning a Markov abstract state representation.
We then describe a practical training procedure that combines inverse model estimation and temporal contrastive learning.
Our approach learns representations that capture the underlying structure of the domain and lead to improved sample efficiency.
arXiv Detail & Related papers (2021-06-08T14:12:36Z) - Metrics and continuity in reinforcement learning [34.10996560464196]
We introduce a unified formalism for defining topologies through the lens of metrics.
We establish a hierarchy amongst these metrics and demonstrate their theoretical implications on the Markov Decision Process.
We complement our theoretical results with empirical evaluations showcasing the differences between the metrics considered.
arXiv Detail & Related papers (2021-02-02T14:30:41Z) - Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations
using Generative Models [18.195406135434503]
We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials.
arXiv Detail & Related papers (2020-11-02T20:32:05Z) - State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
arXiv Detail & Related papers (2020-04-07T17:57:20Z) - Universal Value Density Estimation for Imitation Learning and
Goal-Conditioned Reinforcement Learning [5.406386303264086]
In both imitation learning and goal-conditioned reinforcement learning, effective solutions require the agent to reliably reach a specified state.
This work introduces an approach which utilizes recent advances in density estimation to effectively learn to reach a given state.
As our first contribution, we use this approach for goal-conditioned reinforcement learning and show that it is efficient and does not suffer from hindsight bias.
As our second contribution, we extend the approach to imitation learning and show that it achieves state-of-the-art demonstration sample efficiency on standard benchmark tasks.
arXiv Detail & Related papers (2020-02-15T23:46:29Z)
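As a concrete illustration of the self-supervised state representation summarized above under "State Representation Learning for Goal-Conditioned Reinforcement Learning", the following sketch trains an encoder so that the distance between two embedded states regresses onto the number of actions separating them along a trajectory. The encoder, the dummy trajectory, and the use of the index gap as a proxy for the minimum action count are assumptions made for illustration, not that paper's implementation.

# Illustrative sketch: learn an embedding in which the distance between two
# embedded states approximates the number of actions needed to move between them.
# Encoder, data, and hyperparameters are placeholders for illustration.
import torch
import torch.nn as nn

STATE_DIM, EMB_DIM = 8, 16
encoder = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, EMB_DIM))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# A dummy trajectory of states; in practice these come from self-supervised rollouts.
trajectory = torch.randn(200, STATE_DIM)

for step in range(100):
    # Sample state pairs and use the index gap |i - j| as a crude upper-bound
    # proxy for the number of actions separating them along the trajectory.
    i = torch.randint(0, len(trajectory), (64,))
    j = torch.randint(0, len(trajectory), (64,))
    target = (i - j).abs().float()

    dist = torch.norm(encoder(trajectory[i]) - encoder(trajectory[j]), dim=-1)
    loss = nn.functional.mse_loss(dist, target)

    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final distance-regression loss {loss.item():.3f}")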