Wasserstein Diversity-Enriched Regularizer for Hierarchical
Reinforcement Learning
- URL: http://arxiv.org/abs/2308.00989v1
- Date: Wed, 2 Aug 2023 07:45:24 GMT
- Title: Wasserstein Diversity-Enriched Regularizer for Hierarchical
Reinforcement Learning
- Authors: Haorui Li, Jiaqi Liang, Linjing Li, and Daniel Zeng
- Abstract summary: We propose a novel task-agnostic regularizer called the Wasserstein Diversity-Enriched Regularizer (WDER)
The proposed WDER can be easily incorporated into the loss function of existing methods to boost their performance further.
- Score: 2.4236602474594635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hierarchical reinforcement learning composes subpolicies in different
hierarchies to accomplish complex tasks. Automated subpolicy discovery, which
does not depend on domain knowledge, is a promising approach to generating
subpolicies. However, the degradation problem is a challenge that existing
methods can hardly deal with, because they either do not account for diversity
or rely on weak regularizers.
task-agnostic regularizer called the Wasserstein Diversity-Enriched Regularizer
(WDER), which enlarges the diversity of subpolicies by maximizing the
Wasserstein distances among action distributions. The proposed WDER can be
easily incorporated into the loss function of existing methods to boost their
performance further. Experimental results demonstrate that our WDER improves
performance and sample efficiency in comparison with prior work without
modifying hyperparameters, which indicates the applicability and robustness of
the WDER.
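The core idea lends itself to a compact illustration. Below is a minimal sketch, not the authors' reference implementation, of how a Wasserstein diversity term could be folded into an existing policy loss. It assumes each subpolicy outputs a diagonal Gaussian action distribution (which admits a closed-form 2-Wasserstein distance); the names `wasserstein_diversity` and `beta`, and the way the terms are combined, are illustrative assumptions.

```python
# Hedged sketch of a Wasserstein diversity regularizer over K subpolicies.
# Assumption: each subpolicy emits a diagonal Gaussian over actions.
import torch


def w2_diag_gaussian(mu1, std1, mu2, std2):
    """Closed-form squared 2-Wasserstein distance between the diagonal
    Gaussians N(mu1, diag(std1^2)) and N(mu2, diag(std2^2))."""
    return ((mu1 - mu2) ** 2).sum(-1) + ((std1 - std2) ** 2).sum(-1)


def wasserstein_diversity(mus, stds):
    """Mean pairwise W2^2 among K subpolicies.

    mus, stds: tensors of shape (K, action_dim) holding the action
    distribution parameters of the K subpolicies for a given state.
    """
    K = mus.shape[0]
    dists = [
        w2_diag_gaussian(mus[i], stds[i], mus[j], stds[j])
        for i in range(K) for j in range(i + 1, K)
    ]
    return torch.stack(dists).mean()


def total_loss(policy_loss, mus, stds, beta=0.01):
    # Maximizing diversity corresponds to subtracting the regularizer
    # from the base loss being minimized; beta is a placeholder weight.
    return policy_loss - beta * wasserstein_diversity(mus, stds)
```

Using the closed form for Gaussians keeps the regularizer differentiable and cheap to evaluate, since no sampling or optimal-transport solver is required; for non-Gaussian policies a sample-based or entropic approximation of the Wasserstein distance would be needed instead.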
Related papers
- Hierarchical Orchestra of Policies [1.6574413179773757]
HOP dynamically forms a hierarchy of policies based on a similarity metric between the current observations and previously encountered observations in successful tasks.
HOP does not require task labelling, allowing for robust adaptation in environments where boundaries between tasks are ambiguous.
Our experiments, conducted across multiple tasks in a procedurally generated suite of environments, demonstrate that HOP significantly outperforms baseline methods in retaining knowledge across tasks.
arXiv Detail & Related papers (2024-11-05T11:13:09Z) - Accelerating Task Generalisation with Multi-Level Hierarchical Options [1.6574413179773757]
Fracture Cluster Options (FraCOs) is a hierarchical reinforcement learning method that achieves state-of-the-art performance on difficult generalisation tasks.
We evaluate FraCOs against state-of-the-art deep reinforcement learning algorithms in several complex procedurally generated environments.
arXiv Detail & Related papers (2024-11-05T11:00:09Z) - HarmoDT: Harmony Multi-Task Decision Transformer for Offline Reinforcement Learning [72.25707314772254]
We introduce the Harmony Multi-Task Decision Transformer (HarmoDT), a novel solution designed to identify an optimal harmony subspace of parameters for each task.
The upper level of this framework is dedicated to learning a task-specific mask that delineates the harmony subspace, while the inner level focuses on updating parameters to enhance the overall performance of the unified policy.
arXiv Detail & Related papers (2024-05-28T11:41:41Z) - Promoting Generalization for Exact Solvers via Adversarial Instance
Augmentation [62.738582127114704]
Adar is a framework for understanding and improving the generalization of both imitation-learning-based (IL-based) and reinforcement-learning-based (RL-based) exact solvers.
arXiv Detail & Related papers (2023-10-22T03:15:36Z) - Semantically Aligned Task Decomposition in Multi-Agent Reinforcement
Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA)
SAMA prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation as well as self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z) - Source-free Domain Adaptation Requires Penalized Diversity [60.04618512479438]
Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data.
In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor.
We propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors.
arXiv Detail & Related papers (2023-04-06T00:20:19Z) - Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z) - Wasserstein Distance guided Adversarial Imitation Learning with Reward
Shape Exploration [21.870750931559915]
We propose a new algorithm named Wasserstein Distance guided Adversarial Imitation Learning (WDAIL) for promoting the performance of imitation learning (IL).
The experiment results show that the learning procedure remains remarkably stable, and achieves significant performance in the complex continuous control tasks of MuJoCo.
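As a rough illustration of the general recipe behind Wasserstein-distance-guided adversarial imitation learning (the specific reward shapes explored in WDAIL are not reproduced here), a critic trained with a Wasserstein-style objective can supply rewards to the imitating policy. The network size, the omission of a Lipschitz constraint, and the choice of the raw critic score as the reward are all assumptions of this sketch.

```python
# Hedged sketch: a critic scores expert vs. agent state-action pairs
# with a Wasserstein-style objective; its score is reused as a reward.
import torch
import torch.nn as nn


class Critic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def critic_loss(critic, expert_obs, expert_act, agent_obs, agent_act):
    # Minimizing this pushes expert scores up and agent scores down.
    # In practice a Lipschitz constraint (e.g. a gradient penalty)
    # would also be applied; it is omitted here for brevity.
    return critic(agent_obs, agent_act).mean() - critic(expert_obs, expert_act).mean()


def imitation_reward(critic, obs, act):
    # One possible reward shape: the raw (detached) critic score.
    with torch.no_grad():
        return critic(obs, act)
```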
arXiv Detail & Related papers (2020-06-05T15:10:00Z) - Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through the interactions over agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy, such that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable.
arXiv Detail & Related papers (2020-04-19T15:42:55Z)