Jointly-Learned State-Action Embedding for Efficient Reinforcement
Learning
- URL: http://arxiv.org/abs/2010.04444v4
- Date: Fri, 20 Aug 2021 10:20:11 GMT
- Title: Jointly-Learned State-Action Embedding for Efficient Reinforcement
Learning
- Authors: Paul J. Pritz and Liang Ma and Kin K. Leung
- Abstract summary: We propose a new approach for learning embeddings for states and actions that combines aspects of model-free and model-based reinforcement learning.
We show that our approach significantly outperforms state-of-the-art models in both discrete/continuous domains with large state/action spaces.
- Score: 8.342863878589332
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While reinforcement learning has achieved considerable successes in recent
years, state-of-the-art models are often still limited by the size of state and
action spaces. Model-free reinforcement learning approaches use some form of
state representations and the latest work has explored embedding techniques for
actions, both with the aim of achieving better generalization and
applicability. However, these approaches consider only states or actions,
ignoring the interaction between them when generating embedded representations.
In this work, we establish the theoretical foundations for the validity of
training a reinforcement learning agent using embedded states and actions. We
then propose a new approach for jointly learning embeddings for states and
actions that combines aspects of model-free and model-based reinforcement
learning, which can be applied in both discrete and continuous domains.
Specifically, we use a model of the environment to obtain embeddings for states
and actions and present a generic architecture that leverages these to learn a
policy. In this way, the embedded representations obtained via our approach
enable better generalization over both states and actions by capturing
similarities in the embedding spaces. Evaluations of our approach on several
gaming, robotic control, and recommender systems show it significantly
outperforms state-of-the-art models in both discrete/continuous domains with
large state/action spaces, thus confirming its efficacy.
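No code accompanies this listing; purely as an illustration of the general idea, the following PyTorch sketch jointly trains a state encoder and an action embedding through a latent transition model and places a small policy head on top of the shared state embedding. All module names, dimensions, losses, and the dummy data are assumptions made for illustration, not the authors' architecture.

# Illustrative sketch: jointly learned state and action embeddings trained
# through a latent transition model, with a policy acting on the embedding.
# Dimensions, losses, and data are placeholders, not the authors' implementation.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, EMB_S, EMB_A = 8, 16, 32, 8

state_enc = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, EMB_S))
action_emb = nn.Embedding(N_ACTIONS, EMB_A)   # embedding table for discrete actions
trans_model = nn.Sequential(                  # model-based part: predict next state embedding
    nn.Linear(EMB_S + EMB_A, 64), nn.ReLU(), nn.Linear(64, EMB_S))
policy = nn.Sequential(                       # model-free part: act on the state embedding
    nn.Linear(EMB_S, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

params = (list(state_enc.parameters()) + list(action_emb.parameters())
          + list(trans_model.parameters()) + list(policy.parameters()))
opt = torch.optim.Adam(params, lr=3e-4)

# Dummy batch of transitions (s, a, s'); in practice these come from interaction data.
s = torch.randn(128, STATE_DIM)
a = torch.randint(0, N_ACTIONS, (128,))
s_next = torch.randn(128, STATE_DIM)

# Transition loss: the embedded (state, action) pair should predict the next state
# embedding, which is what couples the two embedding spaces during training.
phi_s, phi_s_next = state_enc(s), state_enc(s_next)
pred_next = trans_model(torch.cat([phi_s, action_emb(a)], dim=-1))
embed_loss = nn.functional.mse_loss(pred_next, phi_s_next.detach())

# Policy loss placeholder (a behaviour-cloning-style objective stands in for whatever
# RL objective is actually used); it back-propagates into the same state embedding.
policy_loss = nn.functional.cross_entropy(policy(phi_s), a)

opt.zero_grad()
(embed_loss + policy_loss).backward()
opt.step()
print(f"embedding loss {embed_loss.item():.3f}, policy loss {policy_loss.item():.3f}")

The sketch omits details such as mapping a continuous action embedding back to a concrete action; the point is only that the state and action representations are shaped by a shared environment model rather than learned in isolation.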
Related papers
- Learning Interpretable Policies in Hindsight-Observable POMDPs through
Partially Supervised Reinforcement Learning [57.67629402360924]
We introduce the Partially Supervised Reinforcement Learning (PSRL) framework.
At the heart of PSRL is the fusion of both supervised and unsupervised learning.
We show that PSRL strikes an effective balance, enhancing model interpretability while matching, and often significantly exceeding, the performance benchmarks set by traditional methods.
arXiv Detail & Related papers (2024-02-14T16:23:23Z) - Representation Learning for Continuous Action Spaces is Beneficial for
Efficient Policy Learning [64.14557731665577]
Deep reinforcement learning (DRL) breaks through the bottlenecks of traditional reinforcement learning (RL).
In this paper, we propose an efficient policy learning method in latent state and action spaces.
The effectiveness of the proposed method is demonstrated in MountainCar, CarRacing, and Cheetah experiments.
arXiv Detail & Related papers (2022-11-23T19:09:37Z) - Leveraging Demonstrations with Latent Space Priors [90.56502305574665]
We propose to leverage demonstration datasets by combining skill learning and sequence modeling.
We show how to acquire such priors from state-only motion capture demonstrations and explore several methods for integrating them into policy learning.
Our experimental results confirm that latent space priors provide significant gains in learning speed and final performance in a set of challenging sparse-reward environments.
arXiv Detail & Related papers (2022-10-26T13:08:46Z) - Causal Dynamics Learning for Task-Independent State Abstraction [61.707048209272884]
We introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL)
CDL learns a theoretically proven causal dynamics model that removes unnecessary dependencies between state variables and the action.
A state abstraction can then be derived from the learned dynamics.
arXiv Detail & Related papers (2022-06-27T17:02:53Z) - State Representation Learning for Goal-Conditioned Reinforcement
Learning [9.162936410696407]
This paper presents a novel state representation for reward-free Markov decision processes.
The idea is to learn, in a self-supervised manner, an embedding space in which the distance between pairs of embedded states corresponds to the minimum number of actions needed to transition between them (a minimal, illustrative sketch of this idea appears after this list).
We show how this representation can be leveraged to learn goal-conditioned policies.
arXiv Detail & Related papers (2022-05-04T09:20:09Z) - Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon
Reasoning [120.38381203153159]
Reinforcement learning can train policies that effectively perform complex tasks.
For long-horizon tasks, the performance of these methods degrades with horizon, often necessitating reasoning over and composing lower-level skills.
We propose Value Function Spaces: a simple approach that produces such a representation by using the value functions corresponding to each lower-level skill.
arXiv Detail & Related papers (2021-11-04T22:46:16Z) - Learning Markov State Abstractions for Deep Reinforcement Learning [17.34529517221924]
We introduce a novel set of conditions and prove that they are sufficient for learning a Markov abstract state representation.
We then describe a practical training procedure that combines inverse model estimation and temporal contrastive learning.
Our approach learns representations that capture the underlying structure of the domain and lead to improved sample efficiency.
arXiv Detail & Related papers (2021-06-08T14:12:36Z) - Metrics and continuity in reinforcement learning [34.10996560464196]
We introduce a unified formalism for defining topologies through the lens of metrics.
We establish a hierarchy amongst these metrics and demonstrate their theoretical implications on the Markov Decision Process.
We complement our theoretical results with empirical evaluations showcasing the differences between the metrics considered.
arXiv Detail & Related papers (2021-02-02T14:30:41Z) - Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations
using Generative Models [18.195406135434503]
We propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential.
We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first.
In particular, we examine both normalizing flows and Generative Adversarial Networks to represent these potentials.
arXiv Detail & Related papers (2020-11-02T20:32:05Z) - State-Only Imitation Learning for Dexterous Manipulation [63.03621861920732]
In this paper, we explore state-only imitation learning.
We train an inverse dynamics model and use it to predict actions for state-only demonstrations.
Our method performs on par with state-action approaches and considerably outperforms RL alone.
arXiv Detail & Related papers (2020-04-07T17:57:20Z) - Universal Value Density Estimation for Imitation Learning and
Goal-Conditioned Reinforcement Learning [5.406386303264086]
In both imitation learning and goal-conditioned reinforcement learning, effective solutions require the agent to reliably reach a specified state.
This work introduces an approach which utilizes recent advances in density estimation to effectively learn to reach a given state.
As our first contribution, we use this approach for goal-conditioned reinforcement learning and show that it is efficient and does not suffer from hindsight bias.
As our second contribution, we extend the approach to imitation learning and show that it achieves state-of-the-art demonstration sample efficiency on standard benchmark tasks.
arXiv Detail & Related papers (2020-02-15T23:46:29Z)
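As a concrete illustration of the self-supervised state representation summarized above under "State Representation Learning for Goal-Conditioned Reinforcement Learning", the following sketch trains an encoder so that the distance between two embedded states regresses onto the number of actions separating them along a trajectory. The encoder, the dummy trajectory, and the use of the index gap as a proxy for the minimum action count are assumptions made for illustration, not that paper's implementation.

# Illustrative sketch: learn an embedding in which the distance between two
# embedded states approximates the number of actions needed to move between them.
# Encoder, data, and hyperparameters are placeholders for illustration.
import torch
import torch.nn as nn

STATE_DIM, EMB_DIM = 8, 16
encoder = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, EMB_DIM))
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# A dummy trajectory of states; in practice these come from self-supervised rollouts.
trajectory = torch.randn(200, STATE_DIM)

for step in range(100):
    # Sample state pairs and use the index gap |i - j| as a crude upper-bound
    # proxy for the number of actions separating them along the trajectory.
    i = torch.randint(0, len(trajectory), (64,))
    j = torch.randint(0, len(trajectory), (64,))
    target = (i - j).abs().float()

    dist = torch.norm(encoder(trajectory[i]) - encoder(trajectory[j]), dim=-1)
    loss = nn.functional.mse_loss(dist, target)

    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final distance-regression loss {loss.item():.3f}")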