Discovering Diverse Nearly Optimal Policies with Successor Features
- URL: http://arxiv.org/abs/2106.00669v1
- Date: Tue, 1 Jun 2021 17:56:13 GMT
- Title: Discovering Diverse Nearly Optimal Policies with Successor Features
- Authors: Tom Zahavy, Brendan O'Donoghue, Andre Barreto, Volodymyr Mnih,
Sebastian Flennerhag and Satinder Singh
- Abstract summary: In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, and robustness.
We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features.
- Score: 30.144946007098852
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Finding different solutions to the same problem is a key aspect of
intelligence associated with creativity and adaptation to novel situations. In
reinforcement learning, a set of diverse policies can be useful for
exploration, transfer, hierarchy, and robustness. We propose Diverse Successive
Policies, a method for discovering policies that are diverse in the space of
Successor Features, while ensuring that they are near optimal. We formalize the
problem as a Constrained Markov Decision Process (CMDP) where the goal is to
find policies that maximize diversity, characterized by an intrinsic diversity
reward, while remaining near-optimal with respect to the extrinsic reward of
the MDP. We also analyze how recently proposed robustness and discrimination
rewards perform and find that they are sensitive to the initialization of the
procedure and may converge to sub-optimal solutions. To alleviate this, we
propose new explicit diversity rewards that aim to minimize the correlation
between the Successor Features of the policies in the set. We compare the
different diversity mechanisms in the DeepMind Control Suite and find that the
type of explicit diversity we propose is important for discovering distinct
behaviors, such as different locomotion patterns.
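As a rough illustration of the two ingredients described above, here is a minimal Python sketch: a diversity reward that penalizes correlation between a candidate policy's Successor Features and those already in the set, and a Lagrangian-style combination of intrinsic and extrinsic rewards (one standard way to optimize a CMDP). Function names and the cosine-similarity instantiation are illustrative assumptions, not the paper's exact definitions.

```python
import numpy as np

def sf_diversity_reward(psi_new, psi_set, eps=1e-8):
    """Intrinsic diversity reward: negative maximum cosine similarity between
    a candidate policy's Successor Features (SFs) and the SFs of policies
    already in the set. This is an assumed instantiation of "minimize the
    correlation between the Successor Features"; the paper's may differ."""
    if not psi_set:
        return 0.0
    sims = [
        float(psi_new @ psi) / (np.linalg.norm(psi_new) * np.linalg.norm(psi) + eps)
        for psi in psi_set
    ]
    return -max(sims)  # larger when the new SFs are less correlated with the set

def cmdp_lagrangian_reward(r_extrinsic, r_diversity, lam):
    """Lagrangian relaxation of the CMDP (a common, assumed mechanism):
    maximize diversity plus a multiplier times the extrinsic reward, with
    `lam` adapted so the policy stays near-optimal on the extrinsic task."""
    return r_diversity + lam * r_extrinsic
```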
Related papers
- Iteratively Learn Diverse Strategies with State Distance Information [18.509323383456707]
In complex reinforcement learning problems, policies with similar rewards may have substantially different behaviors.
We develop a novel diversity-driven RL algorithm, State-based Intrinsic-reward Policy Optimization (SIPO), with provable convergence properties.
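A minimal sketch of a state-distance intrinsic reward in the spirit of SIPO; the function name and the Euclidean metric are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def state_distance_bonus(state, reference_states, scale=1.0):
    # Intrinsic reward for visiting states far from those covered by
    # previously discovered policies; "far" is Euclidean distance to the
    # nearest reference state (an assumed choice).
    if len(reference_states) == 0:
        return 0.0
    dists = np.linalg.norm(np.asarray(reference_states) - np.asarray(state), axis=1)
    return scale * float(dists.min())
```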
arXiv Detail & Related papers (2023-10-23T02:41:34Z)
- Policy Diversity for Cooperative Agents [8.689289576285095]
Multi-agent reinforcement learning aims to find the optimal cooperative team policy for completing a task.
There may exist multiple different ways of cooperating, which are often precisely what domain experts need.
Unfortunately, there is a general lack of effective policy diversity approaches specifically designed for the multi-agent domain.
arXiv Detail & Related papers (2023-08-28T05:23:16Z)
- Policy Dispersion in Non-Markovian Environment [53.05904889617441]
This paper aims to learn diverse policies from the history of state-action pairs in a non-Markovian environment.
We first adopt a transformer-based method to learn policy embeddings.
Then, we stack the policy embeddings to construct a dispersion matrix to induce a set of diverse policies.
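A minimal sketch of the stacking step, assuming cosine dissimilarity between policy embeddings; the paper's actual dispersion objective may differ.

```python
import numpy as np

def dispersion_matrix(policy_embeddings):
    # Stack per-policy embeddings (e.g., from a transformer encoder over
    # state-action histories) and compute pairwise cosine dissimilarity.
    E = np.stack(policy_embeddings)                       # (n_policies, d)
    E = E / (np.linalg.norm(E, axis=1, keepdims=True) + 1e-8)
    return 1.0 - E @ E.T                                  # 0 = identical directions

def dispersion_score(policy_embeddings):
    # Scalar surrogate objective: mean off-diagonal dissimilarity (assumed).
    D = dispersion_matrix(policy_embeddings)
    n = D.shape[0]
    return float(D.sum() / (n * (n - 1))) if n > 1 else 0.0
```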
arXiv Detail & Related papers (2023-02-28T11:58:39Z)
- A Unified Algorithm Framework for Unsupervised Discovery of Skills based on Determinantal Point Process [53.86223883060367]
We show that diversity and coverage in unsupervised option discovery can indeed be unified under the same mathematical framework.
Our proposed algorithm, ODPP, has undergone extensive evaluation on challenging tasks created with MuJoCo and Atari.
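A minimal sketch of a DPP-style diversity measure, assuming an RBF kernel over option features; ODPP's actual kernel and objective may differ.

```python
import numpy as np

def dpp_diversity(option_features, bandwidth=1.0):
    # Log-determinant of an RBF kernel matrix over option/skill features:
    # a standard DPP quantity that grows as the set becomes more diverse.
    X = np.stack(option_features)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    L = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    _, logdet = np.linalg.slogdet(L + 1e-6 * np.eye(len(X)))
    return float(logdet)
```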
arXiv Detail & Related papers (2022-12-01T01:40:03Z)
- DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization [34.40615558867965]
We propose an on-policy algorithm that discovers multiple strategies for solving a given task.
Unlike prior work, it achieves this with a shared policy network trained over a single run.
Experimental results show that our method efficiently discovers diverse strategies in a wide variety of reinforcement learning tasks.
arXiv Detail & Related papers (2022-07-12T15:57:55Z)
- Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality [26.69352834457256]
We formalize the problem as a Constrained Markov Decision Process.
The objective is to find diverse policies, measured by the distance between the state occupancies of the policies in the set.
We demonstrate that the method can discover diverse and meaningful behaviors in various domains.
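A minimal sketch of an occupancy-based diversity measure, assuming occupancies are represented as probability vectors over discretized states and compared with an L2 metric; DOMiNO's actual distance may differ.

```python
import numpy as np

def occupancy_distance(d_new, occupancy_set):
    # Diversity of a candidate policy measured as the L2 distance from its
    # state-occupancy vector to the closest occupancy already in the set.
    if not occupancy_set:
        return 0.0
    return min(float(np.linalg.norm(d_new - d)) for d in occupancy_set)
```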
arXiv Detail & Related papers (2022-05-26T17:40:52Z)
- CAMEO: Curiosity Augmented Metropolis for Exploratory Optimal Policies [62.39667564455059]
We consider and study a distribution of optimal policies.
In experimental simulations we show that CAMEO indeed obtains policies that all solve classic control problems.
We further show that the different policies we sample present different risk profiles, corresponding to interesting practical applications in interpretability.
arXiv Detail & Related papers (2022-05-19T09:48:56Z)
- Selection-Expansion: A Unifying Framework for Motion-Planning and Diversity Search Algorithms [69.87173070473717]
We investigate the properties of two diversity search algorithms: Novelty Search and the Goal Exploration Process.
The relation to motion-planning (MP) algorithms reveals that the smoothness, or lack thereof, of the mapping between the policy parameter space and the outcome space plays a key role in search efficiency.
arXiv Detail & Related papers (2021-04-10T13:52:27Z)
- Novel Policy Seeking with Constrained Optimization [131.67409598529287]
We propose to rethink the problem of generating novel policies in reinforcement learning tasks.
We first introduce a new metric to evaluate the difference between policies and then design two practical novel policy generation methods.
The two proposed methods, the Constrained Task Novel Bisector (CTNB) and the Interior Policy Differentiation (IPD), are derived from the feasible direction method and the interior point method, respectively, both well known in the constrained optimization literature.
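A minimal sketch of the interior-point idea applied to a novelty constraint, assuming a log-barrier formulation; names and thresholds are illustrative, not IPD's actual construction.

```python
import numpy as np

def barrier_augmented_objective(task_return, novelty, novelty_threshold, t=10.0):
    # Interior-point-style treatment of a novelty constraint (assumed form):
    # a log barrier keeps iterates strictly inside the feasible region
    # novelty > novelty_threshold while the task return is maximized.
    slack = novelty - novelty_threshold
    if slack <= 0:
        return -np.inf  # constraint violated: outside the barrier's domain
    return task_return + (1.0 / t) * np.log(slack)
```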
arXiv Detail & Related papers (2020-05-21T14:39:14Z)
- Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named variational policy propagation (VPP) to learn a joint policy through interactions among agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy, such that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable.
arXiv Detail & Related papers (2020-04-19T15:42:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.