A Unified Algorithm Framework for Unsupervised Discovery of Skills based
on Determinantal Point Process
- URL: http://arxiv.org/abs/2212.00211v3
- Date: Tue, 26 Sep 2023 14:44:41 GMT
- Title: A Unified Algorithm Framework for Unsupervised Discovery of Skills based
on Determinantal Point Process
- Authors: Jiayu Chen, Vaneet Aggarwal, Tian Lan
- Abstract summary: We show that diversity and coverage in unsupervised option discovery can indeed be unified under the same mathematical framework.
Our proposed algorithm, ODPP, has undergone extensive evaluation on challenging tasks created with MuJoCo and Atari.
- Score: 53.86223883060367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning rich skills under the option framework without supervision of
external rewards is at the frontier of reinforcement learning research.
Existing works mainly fall into two distinctive categories: variational option
discovery that maximizes the diversity of the options through a mutual
information loss (while ignoring coverage) and Laplacian-based methods that
focus on improving the coverage of options by increasing connectivity of the
state space (while ignoring diversity). In this paper, we show that diversity
and coverage in unsupervised option discovery can indeed be unified under the
same mathematical framework. To be specific, we explicitly quantify the
diversity and coverage of the learned options through a novel use of
Determinantal Point Process (DPP) and optimize these objectives to discover
options with both superior diversity and coverage. Our proposed algorithm,
ODPP, has undergone extensive evaluation on challenging tasks created with
MuJoCo and Atari. The results demonstrate that our algorithm outperforms
state-of-the-art baselines in both diversity- and coverage-driven categories.
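The core idea of the abstract — scoring a set of items by the determinant of a similarity kernel, so that diverse sets get higher scores — can be sketched in a few lines. This is an illustrative toy, not the authors' ODPP implementation: the `dpp_score` helper, the jitter value, and the example feature vectors are assumptions made for demonstration.

```python
import numpy as np

def dpp_score(features: np.ndarray) -> float:
    """Log-determinant of the DPP kernel L = F F^T for row-wise feature vectors.

    A larger value means the items (e.g. state embeddings visited by different
    options) are more diverse, since det(L) equals the squared volume spanned
    by the feature vectors.
    """
    L = features @ features.T  # Gram (similarity) kernel
    # Small jitter keeps the log-determinant finite for near-singular kernels.
    sign, logdet = np.linalg.slogdet(L + 1e-8 * np.eye(L.shape[0]))
    return logdet

# Near-orthogonal (diverse) features score higher than redundant ones.
diverse = np.array([[1.0, 0.0], [0.0, 1.0]])
redundant = np.array([[1.0, 0.0], [0.99, 0.01]])
print(dpp_score(diverse) > dpp_score(redundant))  # True
```

Optimizing an objective of this shape rewards options whose visited states span a large volume in feature space, which is how diversity and coverage can be expressed in one determinant-based quantity.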
Related papers
- Phasic Diversity Optimization for Population-Based Reinforcement Learning [10.15130620537703]
Phasic Diversity Optimization (PDO) algorithm separates reward and diversity training into distinct phases.
In the auxiliary phase, poorly performing agents that are diversified via determinants do not replace the better agents in the archive.
We introduce two implementations of PDO archive and conduct tests in the newly proposed adversarial dogfight and MuJoCo simulations.
arXiv Detail & Related papers (2024-03-17T06:41:09Z)
- Objectives Are All You Need: Solving Deceptive Problems Without Explicit
Diversity Maintenance [7.3153233408665495]
We present an approach with promise to solve deceptive domains without explicit diversity maintenance.
We use lexicase selection to optimize for these objectives as it has been shown to implicitly maintain population diversity.
We find that decomposing objectives into many objectives and optimizing them outperforms MAP-Elites on the deceptive domains that we explore.
arXiv Detail & Related papers (2023-11-04T00:09:48Z)
- Iteratively Learn Diverse Strategies with State Distance Information [18.509323383456707]
In complex reinforcement learning problems, policies with similar rewards may have substantially different behaviors.
We develop a novel diversity-driven RL algorithm, State-based Intrinsic-reward Policy Optimization (SIPO), with provable convergence properties.
arXiv Detail & Related papers (2023-10-23T02:41:34Z)
- Diversify Question Generation with Retrieval-Augmented Style Transfer [68.00794669873196]
We propose RAST, a framework for Retrieval-Augmented Style Transfer.
The objective is to utilize the style of diverse templates for question generation.
We develop a novel Reinforcement Learning (RL) based approach that maximizes a weighted combination of diversity reward and consistency reward.
arXiv Detail & Related papers (2023-10-23T02:27:31Z)
- Discovering Policies with DOMiNO: Diversity Optimization Maintaining
Near Optimality [26.69352834457256]
We formalize the problem as a Constrained Markov Decision Process.
The objective is to find diverse policies, measured by the distance between the state occupancies of the policies in the set.
We demonstrate that the method can discover diverse and meaningful behaviors in various domains.
arXiv Detail & Related papers (2022-05-26T17:40:52Z)
- Adversarial Option-Aware Hierarchical Imitation Learning [89.92994158193237]
We propose Option-GAIL, a novel method to learn skills at long horizon.
The key idea of Option-GAIL is to model the task hierarchy with options and train the policy via generative adversarial optimization.
Experiments show that Option-GAIL outperforms other counterparts consistently across a variety of tasks.
arXiv Detail & Related papers (2021-06-10T06:42:05Z)
- Discovering Diverse Nearly Optimal Policies with Successor Features [30.144946007098852]
In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, and robustness.
We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features.
arXiv Detail & Related papers (2021-06-01T17:56:13Z)
- Selection-Expansion: A Unifying Framework for Motion-Planning and
Diversity Search Algorithms [69.87173070473717]
We investigate the properties of two diversity search algorithms, the Novelty Search and the Goal Exploration Process algorithms.
The relation to MP algorithms reveals that the smoothness, or lack of smoothness of the mapping between the policy parameter space and the outcome space plays a key role in the search efficiency.
arXiv Detail & Related papers (2021-04-10T13:52:27Z)
- Cross-Domain Facial Expression Recognition: A Unified Evaluation
Benchmark and Adversarial Graph Learning [85.6386289476598]
We develop a novel adversarial graph representation adaptation (AGRA) framework for cross-domain holistic-local feature co-adaptation.
We conduct extensive and fair evaluations on several popular benchmarks and show that the proposed AGRA framework outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2020-08-03T15:00:31Z)
- Spectrum-Guided Adversarial Disparity Learning [52.293230153385124]
We propose a novel end-to-end knowledge directed adversarial learning framework.
It models the class-conditioned intra-class disparity using two competitive encoding distributions and learns purified latent codes by denoising the learned disparity.
The experiments on four HAR benchmark datasets demonstrate the robustness and generalization of our proposed methods over a set of state-of-the-art baselines.
arXiv Detail & Related papers (2020-07-14T05:46:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.