Jointly-Learned State-Action Embedding for Efficient Reinforcement Learning
- URL: http://arxiv.org/abs/2010.04444v4
- Date: Fri, 20 Aug 2021 10:20:11 GMT
- Title: Jointly-Learned State-Action Embedding for Efficient Reinforcement Learning
- Authors: Paul J. Pritz and Liang Ma and Kin K. Leung
- Abstract summary: We propose a new approach for learning embeddings for states and actions that combines aspects of model-free and model-based reinforcement learning.
We show that our approach significantly outperforms state-of-the-art models in both discrete/continuous domains with large state/action spaces.
- Score: 8.342863878589332
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While reinforcement learning has achieved considerable successes in recent
years, state-of-the-art models are often still limited by the size of state and
action spaces. Model-free reinforcement learning approaches use some form of
state representations and the latest work has explored embedding techniques for
actions, both with the aim of achieving better generalization and
applicability. However, these approaches consider only states or actions,
ignoring the interaction between them when generating embedded representations.
In this work, we establish the theoretical foundations for the validity of
training a reinforcement learning agent using embedded states and actions. We
then propose a new approach for jointly learning embeddings for states and
actions that combines aspects of model-free and model-based reinforcement
learning, which can be applied in both discrete and continuous domains.
Specifically, we use a model of the environment to obtain embeddings for states
and actions and present a generic architecture that leverages these to learn a
policy. In this way, the embedded representations obtained via our approach
enable better generalization over both states and actions by capturing
similarities in the embedding spaces. Evaluations of our approach on several
gaming, robotic control, and recommender-system tasks show that it
significantly outperforms state-of-the-art models in both discrete and
continuous domains with large state and action spaces, thus confirming its
efficacy.
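As a toy illustration of the core idea (not the authors' architecture; all names, sizes, and the toy dynamics below are invented for the example), the sketch jointly trains embedding tables for discrete states and actions through a one-step model of the environment: the next state is predicted from the interacting state and action embeddings, and the gradient of the prediction loss flows into both tables at once.

```python
import numpy as np

# Hypothetical toy setup: 6 discrete states, 4 discrete actions.
rng = np.random.default_rng(0)
n_states, n_actions, dim = 6, 4, 8
S = rng.normal(scale=0.5, size=(n_states, dim))   # state embedding table
A = rng.normal(scale=0.5, size=(n_actions, dim))  # action embedding table
W = rng.normal(scale=0.5, size=(dim, n_states))   # dynamics head: embedding -> next state

# Toy deterministic environment: next state = (s + a) mod n_states.
data = [(s, a, (s + a) % n_states)
        for s in range(n_states) for a in range(n_actions)]

lr, losses = 0.1, []
for _ in range(500):
    total = 0.0
    for s, a, s_next in data:
        z = S[s] * A[a]                       # state-action interaction
        logits = z @ W
        p = np.exp(logits - logits.max())
        p /= p.sum()
        total += -np.log(p[s_next])           # cross-entropy vs. observed next state
        g = p.copy()
        g[s_next] -= 1.0                      # dLoss/dlogits
        gz = W @ g                            # gradient w.r.t. the joint embedding
        gS, gA = gz * A[a], gz * S[s]
        W -= lr * np.outer(z, g)              # all three parts learn jointly
        S[s] -= lr * gS
        A[a] -= lr * gA
    losses.append(total / len(data))
```

Because the same embeddings feed both the dynamics model and, in the full method, the policy, states and actions with similar effects end up close together in embedding space, which is what enables the generalization the abstract describes.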
Related papers
- ACT-JEPA: Joint-Embedding Predictive Architecture Improves Policy Representation Learning [90.41852663775086]
ACT-JEPA is a novel architecture that integrates imitation learning and self-supervised learning.
We train a policy to predict action sequences and abstract observation sequences.
Our experiments show that ACT-JEPA improves the quality of representations by learning temporal environment dynamics.
arXiv Detail & Related papers (2025-01-24T16:41:41Z)
- Towards Modality Generalization: A Benchmark and Prospective Analysis [56.84045461854789]
This paper introduces Modality Generalization (MG), which focuses on enabling models to generalize to unseen modalities.
We propose a comprehensive benchmark featuring multi-modal algorithms and adapt existing methods that focus on generalization.
Our work provides a foundation for advancing robust and adaptable multi-modal models, enabling them to handle unseen modalities in realistic scenarios.
arXiv Detail & Related papers (2024-12-24T08:38:35Z)
- Learning Interpretable Policies in Hindsight-Observable POMDPs through Partially Supervised Reinforcement Learning [57.67629402360924]
We introduce the Partially Supervised Reinforcement Learning (PSRL) framework.
At the heart of PSRL is the fusion of both supervised and unsupervised learning.
We show that PSRL offers a potent balance, enhancing model interpretability while preserving, and often significantly outperforming, the performance benchmarks set by traditional methods.
arXiv Detail & Related papers (2024-02-14T16:23:23Z)
- Representation Learning for Continuous Action Spaces is Beneficial for Efficient Policy Learning [64.14557731665577]
Deep reinforcement learning (DRL) breaks through the bottlenecks of traditional reinforcement learning (RL).
In this paper, we propose an efficient policy learning method in latent state and action spaces.
The effectiveness of the proposed method is demonstrated by MountainCar, CarRacing, and Cheetah experiments.
arXiv Detail & Related papers (2022-11-23T19:09:37Z)
- Causal Dynamics Learning for Task-Independent State Abstraction [61.707048209272884]
We introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL).
CDL learns a theoretically grounded causal dynamics model that removes unnecessary dependencies between state variables and the action.
A state abstraction can then be derived from the learned dynamics.
arXiv Detail & Related papers (2022-06-27T17:02:53Z)
- State Representation Learning for Goal-Conditioned Reinforcement Learning [9.162936410696407]
This paper presents a novel state representation for reward-free Markov decision processes.
The idea is to learn, in a self-supervised manner, an embedding space where distances between pairs of embedded states correspond to the minimum number of actions needed to transition between them.
We show how this representation can be leveraged to learn goal-conditioned policies.
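The distance-preserving objective can be illustrated with a small multidimensional-scaling-style sketch. Assumptions made here for brevity: a 5-state chain MDP whose minimum action counts are known up front, whereas the paper learns these distances self-supervised from transitions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5                                          # states of a chain MDP (actions: left/right)
# Minimum number of actions to move between states i and j on the chain.
d = np.abs(np.subtract.outer(np.arange(n), np.arange(n))).astype(float)

Z = rng.normal(size=(n, 2))                    # 2-D state embeddings

def stress(Z):
    """Sum of squared mismatches between embedding and action distances."""
    return sum((np.linalg.norm(Z[i] - Z[j]) - d[i, j]) ** 2
               for i in range(n) for j in range(i + 1, n))

s_before = stress(Z)
lr = 0.01
for _ in range(2000):
    for i in range(n):
        for j in range(i + 1, n):
            diff = Z[i] - Z[j]
            dist = np.linalg.norm(diff)
            g = 2 * (dist - d[i, j]) * diff / (dist + 1e-9)
            Z[i] -= lr * g                     # pull/push the pair toward distance d[i, j]
            Z[j] += lr * g
s_after = stress(Z)
```

After fitting, Euclidean distance in embedding space approximates "number of actions to get there", which is exactly the quantity a goal-conditioned policy wants to decrease.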
arXiv Detail & Related papers (2022-05-04T09:20:09Z)
- Learning Markov State Abstractions for Deep Reinforcement Learning [17.34529517221924]
We introduce a novel set of conditions and prove that they are sufficient for learning a Markov abstract state representation.
We then describe a practical training procedure that combines inverse model estimation and temporal contrastive learning.
Our approach learns representations that capture the underlying structure of the domain and lead to improved sample efficiency.
arXiv Detail & Related papers (2021-06-08T14:12:36Z)
- Metrics and continuity in reinforcement learning [34.10996560464196]
We introduce a unified formalism for defining topologies through the lens of metrics.
We establish a hierarchy amongst these metrics and demonstrate their theoretical implications on the Markov Decision Process.
We complement our theoretical results with empirical evaluations showcasing the differences between the metrics considered.
arXiv Detail & Related papers (2021-02-02T14:30:41Z)
- Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models [18.195406135434503]
We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials.
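The approach builds on potential-based reward shaping. The sketch below shows the classic state-only form (the paper's potentials also depend on the action and are represented by normalizing flows or GANs); the potential table and trajectory are made up for illustration. The key property is that the shaping terms telescope, so the discounted return shifts only by a constant and optimal policies are preserved.

```python
gamma = 0.9
phi = {0: 0.0, 1: 2.0, 2: 5.0, 3: 10.0}       # made-up potential over states

states  = [0, 1, 2, 3]                         # trajectory s_0 .. s_T
rewards = [1.0, -0.5, 2.0]                     # environment reward r_t per transition

def discounted_return(rs):
    return sum(gamma ** t * r for t, r in enumerate(rs))

# Shaped reward: r'_t = r_t + gamma * phi(s_{t+1}) - phi(s_t)
shaped = [r + gamma * phi[s2] - phi[s1]
          for r, s1, s2 in zip(rewards, states, states[1:])]

# The shaping terms telescope: the return shifts by exactly
# gamma^T * phi(s_T) - phi(s_0), independent of the intermediate rewards.
delta = discounted_return(shaped) - discounted_return(rewards)
expected = gamma ** len(rewards) * phi[states[-1]] - phi[states[0]]
```

A high potential near the demonstrated region makes transitions toward it look immediately rewarding, which is how the shaping steers exploration without changing what is optimal.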
arXiv Detail & Related papers (2020-11-02T20:32:05Z)
- State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
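The pipeline can be sketched in a toy 1-D grid world (all details here are invented for illustration): first fit an inverse dynamics model on the agent's own (s, a, s') transitions, then use it to fill in the actions missing from a state-only demonstration.

```python
# Toy 1-D grid world with 10 positions and two actions.
ACTIONS = [-1, +1]

def step(s, a):
    return max(0, min(9, s + a))

# 1) Fit a (tabular) inverse dynamics model from the agent's own transitions.
#    In practice these come from interaction; here we enumerate them for brevity.
inverse_model = {}
for s in range(10):
    for a in ACTIONS:
        s2 = step(s, a)
        if s2 != s:
            inverse_model[(s, s2)] = a

# 2) A state-only demonstration: positions visited, but no actions recorded.
demo_states = [0, 1, 2, 3, 4, 5]

# 3) Predict the missing actions, yielding (s, a) pairs usable for imitation.
inferred_actions = [inverse_model[(s1, s2)]
                    for s1, s2 in zip(demo_states, demo_states[1:])]
# inferred_actions == [+1, +1, +1, +1, +1]
```

Once the demonstration is relabeled this way, standard state-action imitation methods apply unchanged, which is why the method can match them without ever observing expert actions.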
arXiv Detail & Related papers (2020-04-07T17:57:20Z)
- Universal Value Density Estimation for Imitation Learning and Goal-Conditioned Reinforcement Learning [5.406386303264086]
In both imitation learning and goal-conditioned reinforcement learning, effective solutions require the agent to reliably reach a specified state.
This work introduces an approach which utilizes recent advances in density estimation to effectively learn to reach a given state.
As our first contribution, we use this approach for goal-conditioned reinforcement learning and show that it is efficient and does not suffer from hindsight bias.
As our second contribution, we extend the approach to imitation learning and show that it achieves state-of-the-art demonstration sample-efficiency on standard benchmark tasks.
arXiv Detail & Related papers (2020-02-15T23:46:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.