Wasserstein Unsupervised Reinforcement Learning
- URL: http://arxiv.org/abs/2110.07940v1
- Date: Fri, 15 Oct 2021 08:41:51 GMT
- Title: Wasserstein Unsupervised Reinforcement Learning
- Authors: Shuncheng He, Yuhang Jiang, Hongchang Zhang, Jianzhun Shao, Xiangyang
Ji
- Abstract summary: Unsupervised reinforcement learning aims to train agents to learn a handful of policies or skills in environments without external reward.
These pre-trained policies can accelerate learning when endowed with external reward, and can also be used as primitive options in hierarchical reinforcement learning.
We propose a new framework, Wasserstein unsupervised reinforcement learning (WURL), in which we directly maximize the distance between state distributions induced by different policies.
- Score: 29.895142928565228
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Unsupervised reinforcement learning aims to train agents to learn a handful
of policies or skills in environments without external reward. These
pre-trained policies can accelerate learning when endowed with external reward,
and can also be used as primitive options in hierarchical reinforcement
learning. Conventional approaches to unsupervised skill discovery feed a latent
variable to the agent and exert its influence (empowerment) on the agent's
behavior through mutual information (MI) maximization. However, the policies
learned by MI-based methods cannot sufficiently explore the state space, even
though they can be successfully distinguished from one another. We therefore
propose a new framework, Wasserstein unsupervised reinforcement learning (WURL),
in which we directly maximize the distance between the state distributions
induced by different policies. Additionally, we overcome the difficulties of
simultaneously training N (N > 2) policies and of amortizing the overall reward
over individual steps. Experiments show that
policies learned by our approach outperform MI-based methods on the metric of
Wasserstein distance while maintaining high discriminability. Furthermore,
agents trained by WURL can sufficiently explore the state space in mazes and
MuJoCo tasks, and the pre-trained policies can be applied to downstream tasks via
hierarchical learning.
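A minimal sketch of the core idea, under stated assumptions: the abstract does not include code, so the snippet below only illustrates how one might estimate Wasserstein distances between the state distributions of N policies from sampled states and use the distance to the nearest other policy as a diversity pseudo-reward. Helper names such as `sliced_wasserstein` and `pseudo_rewards` are hypothetical, and the sliced (random-projection) estimator is a common approximation rather than the authors' exact reward-amortization scheme.

```python
# Illustrative sketch only; not the authors' implementation of WURL.
import numpy as np

def sliced_wasserstein(x, y, n_projections=64, seed=0):
    """Approximate the W1 distance between two empirical state distributions.

    x, y: arrays of shape (n_samples, state_dim) with equal n_samples.
    States are projected onto random unit directions; for equal-sized samples,
    the 1-D W1 distance equals the mean absolute difference of sorted projections.
    """
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_projections, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    xp = np.sort(x @ dirs.T, axis=0)  # shape (n_samples, n_projections)
    yp = np.sort(y @ dirs.T, axis=0)
    return float(np.abs(xp - yp).mean())

def pseudo_rewards(state_batches):
    """For each policy i, reward = W1 distance to the nearest other policy's states."""
    n = len(state_batches)
    dist = np.full((n, n), np.inf)  # inf on the diagonal so it is ignored by min()
    for i in range(n):
        for j in range(i + 1, n):
            d = sliced_wasserstein(state_batches[i], state_batches[j])
            dist[i, j] = dist[j, i] = d
    return dist.min(axis=1)  # larger value = this policy's states are more distinct

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three toy "policies" whose visited states cluster around different centers.
    batches = [rng.normal(loc=c, size=(256, 2))
               for c in ([0.0, 0.0], [3.0, 0.0], [0.0, 3.0])]
    print(pseudo_rewards(batches))
```

In this toy usage, a policy whose visited states lie far from every other policy's states receives a larger pseudo-reward, which is the intuition behind maximizing inter-policy Wasserstein distance instead of a mutual-information objective.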
Related papers
- MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback
and Dynamic Distance Constraint [40.3872201560003]
Hierarchical reinforcement learning (HRL) uses a hierarchical framework that divides tasks into subgoals and completes them sequentially.
Current methods struggle to find suitable subgoals for ensuring a stable learning process.
We propose a general hierarchical reinforcement learning framework incorporating human feedback and dynamic distance constraints.
arXiv Detail & Related papers (2024-02-22T03:11:09Z)
- Safety Correction from Baseline: Towards the Risk-aware Policy in Robotics via Dual-agent Reinforcement Learning [64.11013095004786]
We propose a dual-agent safe reinforcement learning strategy consisting of a baseline and a safe agent.
Such a decoupled framework enables high flexibility, data efficiency and risk-awareness for RL-based control.
The proposed method outperforms the state-of-the-art safe RL algorithms on difficult robot locomotion and manipulation tasks.
arXiv Detail & Related papers (2022-12-14T03:11:25Z)
- Flexible Attention-Based Multi-Policy Fusion for Efficient Deep Reinforcement Learning [78.31888150539258]
Reinforcement learning (RL) agents have long sought to approach the efficiency of human learning.
Prior studies in RL have incorporated external knowledge policies to help agents improve sample efficiency.
We present Knowledge-Grounded RL (KGRL), an RL paradigm fusing multiple knowledge policies and aiming for human-like efficiency and flexibility.
arXiv Detail & Related papers (2022-10-07T17:56:57Z)
- A State-Distribution Matching Approach to Non-Episodic Reinforcement Learning [61.406020873047794]
A major hurdle to real-world application arises from the development of algorithms in an episodic setting.
We propose a new method, MEDAL, that trains the backward policy to match the state distribution in the provided demonstrations.
Our experiments show that MEDAL matches or outperforms prior methods on three sparse-reward continuous control tasks.
arXiv Detail & Related papers (2022-05-11T00:06:29Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- Goal-Conditioned Reinforcement Learning with Imagined Subgoals [89.67840168694259]
We propose to incorporate imagined subgoals into policy learning to facilitate learning of complex tasks.
Imagined subgoals are predicted by a separate high-level policy, which is trained simultaneously with the policy and its critic.
We evaluate our approach on complex robotic navigation and manipulation tasks and show that it outperforms existing methods by a large margin.
arXiv Detail & Related papers (2021-07-01T15:30:59Z)
- Coverage as a Principle for Discovering Transferable Behavior in Reinforcement Learning [16.12658895065585]
We argue that representation alone is not enough for efficient transfer in challenging domains and explore how to transfer knowledge through behavior.
The behavior of pre-trained policies may be used for solving the task at hand (exploitation) or for collecting useful data to solve the problem (exploration).
arXiv Detail & Related papers (2021-02-24T16:51:02Z)
- Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.