Representation Learning for Continuous Action Spaces is Beneficial for
Efficient Policy Learning
- URL: http://arxiv.org/abs/2211.13257v1
- Date: Wed, 23 Nov 2022 19:09:37 GMT
- Title: Representation Learning for Continuous Action Spaces is Beneficial for
Efficient Policy Learning
- Authors: Tingting Zhao, Ying Wang, Wei Sun, Yarui Chen, Gang Niu, Masashi
Sugiyama
- Abstract summary: Deep reinforcement learning (DRL) breaks through the bottlenecks of traditional reinforcement learning (RL)
In this paper, we propose an efficient policy learning method in latent state and action spaces.
The effectiveness of the proposed method is demonstrated by MountainCar, CarRacing, and Cheetah experiments.
- Score: 64.14557731665577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (DRL) breaks through the bottlenecks of
traditional reinforcement learning (RL) with the help of the perception
capability of deep learning and has been widely applied to real-world
problems. Model-free RL, as a class of efficient DRL methods, learns state
representations simultaneously with the policy in an end-to-end manner when
facing large-scale continuous state and action spaces. However, training such
a large policy model requires a large number of trajectory samples and a long
training time. Moreover, the learned policy often fails to generalize to
large-scale action spaces, especially continuous ones. To address this issue,
in this paper we propose an efficient policy learning method that operates in
latent state and action spaces. More specifically, we extend the idea of state
representations to action representations for better policy generalization
capability. Meanwhile, we divide the whole learning task into learning the
large-scale representation models in an unsupervised manner and learning the
small-scale policy model in the RL manner. The small policy model facilitates
policy learning, while the large representation models preserve generalization
and expressiveness. Finally, the effectiveness of the proposed method is
demonstrated by MountainCar, CarRacing, and Cheetah experiments.
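
The two-stage scheme described above (large representation models trained without the RL objective, plus a small policy that acts purely in latent spaces) can be pictured with a short sketch. This is a minimal illustration of the idea rather than the authors' implementation; the module names (StateEncoder, ActionDecoder, LatentPolicy), layer sizes, and the Tanh output squashing are assumptions.

```python
# A minimal PyTorch sketch of policy learning in latent state and action
# spaces. Architectures, sizes, and names are illustrative assumptions.
import torch
import torch.nn as nn


class StateEncoder(nn.Module):
    """Large representation model: raw observation -> compact latent state."""

    def __init__(self, obs_dim, latent_state_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_state_dim),
        )

    def forward(self, obs):
        return self.net(obs)


class ActionDecoder(nn.Module):
    """Large representation model: latent action -> raw continuous action."""

    def __init__(self, latent_action_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_action_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # squash actions to [-1, 1]
        )

    def forward(self, latent_action):
        return self.net(latent_action)


class LatentPolicy(nn.Module):
    """Small policy model trained with RL: latent state -> latent action."""

    def __init__(self, latent_state_dim, latent_action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_state_dim, 64), nn.ReLU(),
            nn.Linear(64, latent_action_dim),
        )

    def forward(self, latent_state):
        return self.net(latent_state)


def act(obs, encoder, policy, decoder):
    """Acting: encode the observation, choose a latent action, decode it."""
    with torch.no_grad():
        latent_state = encoder(obs)
        latent_action = policy(latent_state)
        return decoder(latent_action)
```

In this setup the large encoder and decoder would be pretrained on collected transitions (for example with reconstruction objectives), after which only the small LatentPolicy is updated by the RL algorithm, which is what keeps the policy model cheap to train without giving up the expressiveness of the large representation models.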
Related papers
- Discovering Behavioral Modes in Deep Reinforcement Learning Policies
Using Trajectory Clustering in Latent Space [0.0]
We introduce a new approach for investigating the behavior modes of DRL policies.
Specifically, we use Pairwise Controlled Manifold Approximation Projection (PaCMAP) for dimensionality reduction and TRACLUS for trajectory clustering.
Our methodology helps identify diverse behavior patterns and suboptimal choices by the policy, thus allowing for targeted improvements.
arXiv Detail & Related papers (2024-02-20T11:50:50Z) - Improving Generalization in Reinforcement Learning Training Regimes for
Social Robot Navigation [5.475804640008192]
We propose a method to improve the generalization performance of RL social navigation methods using curriculum learning.
Our results show that using curriculum learning during training achieves better generalization performance than previous training methods.
arXiv Detail & Related papers (2023-08-29T00:00:18Z) - Diffusion Policies as an Expressive Policy Class for Offline
Reinforcement Learning [70.20191211010847]
Offline reinforcement learning (RL) aims to learn an optimal policy using a previously collected static dataset.
We introduce Diffusion Q-learning (Diffusion-QL) that utilizes a conditional diffusion model to represent the policy.
We show that our method can achieve state-of-the-art performance on the majority of the D4RL benchmark tasks.
arXiv Detail & Related papers (2022-08-12T09:54:11Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms (a minimal sketch of the two-policy hand-off appears after this list).
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Exploratory State Representation Learning [63.942632088208505]
We propose a new approach called XSRL (eXploratory State Representation Learning) to solve the problems of exploration and SRL in parallel.
On one hand, it jointly learns compact state representations and a state transition estimator which is used to remove unexploitable information from the representations.
On the other hand, it continuously trains an inverse model, and adds to the prediction error of this model a $k$-step learning progress bonus to form the objective of a discovery policy.
arXiv Detail & Related papers (2021-09-28T10:11:07Z) - What Matters In On-Policy Reinforcement Learning? A Large-Scale
Empirical Study [50.79125250286453]
On-policy reinforcement learning (RL) has been successfully applied to many different continuous control tasks.
But state-of-the-art implementations take numerous low- and high-level design decisions that strongly affect the performance of the resulting agents.
These choices are usually not extensively discussed in the literature, leading to discrepancy between published descriptions of algorithms and their implementations.
We implement more than 50 such choices in a unified on-policy RL framework, allowing us to investigate their impact in a large-scale empirical study.
arXiv Detail & Related papers (2020-06-10T17:59:03Z) - Efficient Deep Reinforcement Learning via Adaptive Policy Transfer [50.51637231309424]
A Policy Transfer Framework (PTF) is proposed to accelerate Reinforcement Learning (RL).
Our framework learns when and which source policy is the best to reuse for the target policy and when to terminate it.
Experimental results show it significantly accelerates the learning process and surpasses state-of-the-art policy transfer methods.
arXiv Detail & Related papers (2020-02-19T07:30:57Z)
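
The two-policy scheme named in the Jump-Start Reinforcement Learning (JSRL) entry above can be pictured with a short sketch: a guide policy (obtained, e.g., from offline data or demonstrations) controls the first steps of each episode, then the learning policy takes over, and the hand-off point is moved earlier as the learner improves. This is only an illustrative sketch under an assumed minimal environment interface, not the authors' implementation.

```python
# Hypothetical sketch of the jump-start rollout: the minimal env interface
# (reset() -> obs, step(a) -> (obs, reward, done)) and the annealing rule
# are illustrative assumptions, not the paper's actual code.
def jsrl_rollout(env, guide_policy, explore_policy, switch_step):
    """Collect one episode, handing control from the guide to the explorer."""
    obs = env.reset()
    trajectory, done, t = [], False, 0
    while not done:
        # The guide acts for the first `switch_step` steps, then the learner.
        policy = guide_policy if t < switch_step else explore_policy
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward, next_obs, done))
        obs, t = next_obs, t + 1
    return trajectory


def anneal_switch_step(switch_step, eval_return, target_return, step_size=10):
    """Hand control to the learner earlier once it meets a return target."""
    if eval_return >= target_return:
        return max(0, switch_step - step_size)
    return switch_step
```

The transitions gathered this way would then feed whatever RL update rule trains the exploration policy; the annealing rule above is likewise only one plausible choice.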