Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality
- URL: http://arxiv.org/abs/2205.13521v1
- Date: Thu, 26 May 2022 17:40:52 GMT
- Title: Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality
- Authors: Tom Zahavy, Yannick Schroecker, Feryal Behbahani, Kate Baumli, Sebastian Flennerhag, Shaobo Hou and Satinder Singh
- Abstract summary: We formalize the problem as a Constrained Markov Decision Process.
The objective is to find a set of diverse policies, with diversity measured by the distance between the state occupancies of the policies in the set.
We demonstrate that the method can discover diverse and meaningful behaviors in various domains.
- Score: 26.69352834457256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Finding different solutions to the same problem is a key aspect of
intelligence associated with creativity and adaptation to novel situations. In
reinforcement learning, a set of diverse policies can be useful for
exploration, transfer, hierarchy, and robustness. We propose DOMiNO, a method
for Diversity Optimization Maintaining Near Optimality. We formalize the
problem as a Constrained Markov Decision Process where the objective is to find
diverse policies, measured by the distance between the state occupancies of the
policies in the set, while remaining near-optimal with respect to the extrinsic
reward. We demonstrate that the method can discover diverse and meaningful
behaviors in various domains, such as different locomotion patterns in the
DeepMind Control Suite. We perform extensive analysis of our approach, compare
it with other multi-objective baselines, demonstrate that we can control both
the quality and the diversity of the set via interpretable hyperparameters, and
show that the discovered set is robust to perturbations.
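
To make the constrained formulation concrete, below is a minimal numpy sketch of the kind of objective the abstract describes: pairwise distance between state occupancies as the diversity measure, with each policy's value constrained to stay near optimal via Lagrange multipliers. The L2 distance, the threshold `alpha`, and the dual-descent update are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of a diversity objective under near-optimality constraints.
import numpy as np

def pairwise_occupancy_distance(occupancies):
    """Mean L2 distance between every pair of state-occupancy vectors."""
    n = len(occupancies)
    dists = [np.linalg.norm(occupancies[i] - occupancies[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists)) if dists else 0.0

def lagrangian(occupancies, values, v_star, lam, alpha=0.9):
    """Diversity plus multiplier-weighted slack in the constraints.

    Assumed constraint per policy i: values[i] >= alpha * v_star.
    """
    slack = values - alpha * v_star  # >= 0 when the constraint holds
    return pairwise_occupancy_distance(occupancies) + float(lam @ slack)

def update_multipliers(lam, values, v_star, alpha=0.9, lr=0.1):
    """Projected dual descent: grow a multiplier while its policy violates."""
    return np.maximum(lam - lr * (values - alpha * v_star), 0.0)

# Toy usage: three policies over a 4-state space.
occ = [np.array([.7, .1, .1, .1]), np.array([.1, .7, .1, .1]),
       np.array([.25, .25, .25, .25])]
vals, v_star, lam = np.array([9.5, 8.0, 9.9]), 10.0, np.zeros(3)
print(lagrangian(occ, vals, v_star, lam))
print(update_multipliers(lam, vals, v_star))  # only policy 2 is penalized
```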
Related papers
- Testing for Fault Diversity in Reinforcement Learning [13.133263651395865]
We argue that policy testing should not aim to find as many failures as possible (e.g., inputs that trigger similar car crashes) but rather to reveal faults in the model that are as informative and diverse as possible.
We show that QD optimisation, while conceptually simple and generally applicable, effectively finds more diverse faults in the decision model.
arXiv Detail & Related papers (2024-03-22T09:46:30Z)
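
As a hedged illustration of the quality-diversity idea in that summary, the sketch below keeps at most one failing input per behaviour cell, so the archive accumulates different kinds of faults rather than many near-identical crashes. The descriptor, severity score, and grid resolution are invented for the example.

```python
# QD-style archive: one elite fault per behaviour cell.
def cell(descriptor, resolution=10):
    """Discretize a behaviour descriptor (e.g. crash speed, position)."""
    return tuple(min(int(d * resolution), resolution - 1) for d in descriptor)

def qd_insert(archive, descriptor, severity, test_input):
    """Keep only the most severe failing input per cell."""
    key = cell(descriptor)
    if key not in archive or archive[key][0] < severity:
        archive[key] = (severity, test_input)

archive = {}
qd_insert(archive, (0.12, 0.80), severity=3.0, test_input="scenario-a")
qd_insert(archive, (0.11, 0.83), severity=1.0, test_input="scenario-b")  # same cell, weaker: ignored
qd_insert(archive, (0.90, 0.10), severity=0.5, test_input="scenario-c")  # new kind of fault: kept
print(len(archive), "distinct fault cells")  # 2
```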
- Iteratively Learn Diverse Strategies with State Distance Information [18.509323383456707]
In complex reinforcement learning problems, policies with similar rewards may have substantially different behaviors.
We develop a novel diversity-driven RL algorithm, State-based Intrinsic-reward Policy Optimization (SIPO), with provable convergence properties.
arXiv Detail & Related papers (2023-10-23T02:41:34Z)
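
A minimal sketch of the state-distance intuition behind SIPO, under our assumption (not the paper's exact scheme) that a new policy earns an intrinsic bonus for visiting states far from those covered by previously discovered policies:

```python
# State-distance intrinsic reward: bonus for straying from earlier policies.
import numpy as np

def state_distance_bonus(state, reference_states):
    """Distance from `state` to the nearest state visited by earlier policies."""
    if len(reference_states) == 0:
        return 0.0
    return float(np.min(np.linalg.norm(reference_states - state, axis=1)))

def shaped_reward(extrinsic, state, reference_states, beta=0.1):
    """Extrinsic reward plus a diversity bonus weighted by beta."""
    return extrinsic + beta * state_distance_bonus(state, reference_states)

# Toy 2-D states gathered from an already-trained policy.
prior = np.array([[0.0, 0.0], [1.0, 0.0]])
print(shaped_reward(1.0, np.array([0.0, 2.0]), prior))  # 1.0 + 0.1 * 2.0
```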
- Cross-Domain Policy Adaptation via Value-Guided Data Filtering [57.62692881606099]
Generalizing policies across different domains with dynamics mismatch poses a significant challenge in reinforcement learning.
We present the Value-Guided Data Filtering (VGDF) algorithm, which selectively shares transitions from the source domain based on the proximity of paired value targets.
arXiv Detail & Related papers (2023-05-28T04:08:40Z)
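
The following toy sketch illustrates value-guided filtering as the summary describes it: a source-domain transition is shared only when its value target stays close to the one computed under the target-domain critic. The one-step TD targets and the hard tolerance threshold are illustrative assumptions.

```python
# Filter source transitions by proximity of paired value targets.
def value_target(reward, next_value, gamma=0.99):
    """One-step TD target r + gamma * V(s')."""
    return reward + gamma * next_value

def keep_transition(src_target, tgt_target, tolerance=0.5):
    """Share a source transition only when the paired targets are close."""
    return abs(src_target - tgt_target) <= tolerance

src = value_target(reward=1.0, next_value=5.0)   # under the source critic
tgt = value_target(reward=1.0, next_value=5.3)   # same pair, target critic
print(keep_transition(src, tgt))  # True: the dynamics gap looks small here
```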
- A Unified Algorithm Framework for Unsupervised Discovery of Skills based on Determinantal Point Process [53.86223883060367]
We show that diversity and coverage in unsupervised option discovery can indeed be unified under the same mathematical framework.
Our proposed algorithm, ODPP, has undergone extensive evaluation on challenging tasks created with Mujoco and Atari.
arXiv Detail & Related papers (2022-12-01T01:40:03Z)
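
To ground the determinantal-point-process connection, here is a small sketch that scores a set of trajectory features by the log-determinant of a similarity kernel, which rises as the features spread apart. The RBF kernel and the feature vectors are assumptions for illustration.

```python
# DPP-style diversity score: log det of a similarity kernel.
import numpy as np

def rbf_kernel(features, bandwidth=1.0):
    """K[i, j] = exp(-||f_i - f_j||^2 / (2 * bandwidth^2))."""
    sq = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def dpp_diversity(features):
    """log det(K + eps * I): higher when feature vectors are mutually dissimilar."""
    K = rbf_kernel(np.asarray(features, dtype=float))
    return float(np.linalg.slogdet(K + 1e-6 * np.eye(len(K)))[1])

clumped = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
spread = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]
print(dpp_diversity(clumped) < dpp_diversity(spread))  # True
```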
- DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization [34.40615558867965]
We propose an on-policy algorithm that discovers multiple strategies for solving a given task.
Unlike prior work, it achieves this with a shared policy network trained over a single run.
Experimental results show that our method efficiently discovers diverse strategies in a wide variety of reinforcement learning tasks.
arXiv Detail & Related papers (2022-07-12T15:57:55Z)
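
A toy sketch of the shared-network idea in that summary: one weight matrix serves all strategies, and conditioning on a strategy index z changes the behaviour. The linear-softmax policy and one-hot strategy embedding are illustrative assumptions, not DGPO's architecture.

```python
# One shared policy network, multiple strategies selected by z.
import numpy as np

rng = np.random.default_rng(0)
N_STRATEGIES, STATE_DIM, N_ACTIONS = 3, 4, 2
W = rng.normal(size=(STATE_DIM + N_STRATEGIES, N_ACTIONS))  # shared weights

def policy(state, z):
    """Action distribution from the shared network, conditioned on strategy z."""
    one_hot = np.eye(N_STRATEGIES)[z]
    logits = np.concatenate([state, one_hot]) @ W
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

state = np.array([0.5, -0.2, 0.1, 0.0])
for z in range(N_STRATEGIES):
    print(z, policy(state, z))  # same weights, different behaviour per z
```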
- CAMEO: Curiosity Augmented Metropolis for Exploratory Optimal Policies [62.39667564455059]
We study a distribution over optimal policies.
In experimental simulations, we show that CAMEO indeed obtains policies that all solve classic control problems.
We further show that the different policies we sample present different risk profiles, corresponding to interesting practical applications in interpretability.
arXiv Detail & Related papers (2022-05-19T09:48:56Z)
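
A rough sketch of a Metropolis-style sampler over policy parameters, in the spirit of sampling from a distribution of optimal policies; the Boltzmann target exp(return / temperature) and the Gaussian proposal are our assumptions, not CAMEO's actual construction.

```python
# Metropolis sampling over policy parameters, targeting high returns.
import numpy as np

rng = np.random.default_rng(1)

def metropolis_step(params, evaluate, temperature=0.1, step=0.05):
    """Propose perturbed parameters; accept by the exponentiated return ratio."""
    proposal = params + step * rng.normal(size=params.shape)
    ratio = np.exp((evaluate(proposal) - evaluate(params)) / temperature)
    return proposal if rng.random() < ratio else params

# Toy 'return': peaked at params = [1, 1], so the chain drifts toward it.
evaluate = lambda p: -np.sum((p - 1.0) ** 2)
params = np.zeros(2)
for _ in range(2000):
    params = metropolis_step(params, evaluate)
print(params)  # near [1, 1], with spread controlled by the temperature
```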
- Discovering Diverse Nearly Optimal Policies with Successor Features [30.144946007098852]
In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, and robustness.
We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features.
arXiv Detail & Related papers (2021-06-01T17:56:13Z)
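
To illustrate diversity in successor-feature space, the sketch below represents each policy by a discounted sum of per-step features and scores a set by its minimum pairwise distance; the Monte-Carlo estimate is an illustrative simplification of how successor features are usually learned.

```python
# Diversity measured between policies' successor features.
import numpy as np

def successor_features(feature_trajectory, gamma=0.99):
    """Discounted sum of per-step feature vectors phi(s_t)."""
    discounts = gamma ** np.arange(len(feature_trajectory))
    return (discounts[:, None] * feature_trajectory).sum(axis=0)

def sf_diversity(sf_list):
    """Minimum pairwise L2 distance between policies' successor features."""
    n = len(sf_list)
    return min(np.linalg.norm(sf_list[i] - sf_list[j])
               for i in range(n) for j in range(i + 1, n))

traj_a = np.array([[1.0, 0.0]] * 50)  # policy A mostly triggers feature 0
traj_b = np.array([[0.0, 1.0]] * 50)  # policy B mostly triggers feature 1
print(sf_diversity([successor_features(traj_a), successor_features(traj_b)]))
```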
- An Analysis of Phenotypic Diversity in Multi-Solution Optimization [118.97353274202749]
We show that multiobjective optimization does not always produce much diversity, multimodal optimization produces higher-fitness solutions, and quality diversity is not sensitive to genetic neutrality.
An autoencoder is used to discover phenotypic features automatically, producing an even more diverse solution set with quality diversity.
arXiv Detail & Related papers (2021-05-10T10:39:03Z)
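
As a hedged stand-in for the autoencoder descriptors mentioned above: a linear autoencoder's optimum coincides with PCA, so the sketch below uses an SVD to produce low-dimensional phenotypic descriptors automatically. The data and the two-dimensional latent size are assumptions for illustration.

```python
# Automatic phenotypic descriptors via a linear autoencoder (PCA/SVD form).
import numpy as np

rng = np.random.default_rng(2)
phenotypes = rng.normal(size=(200, 8))          # raw behaviour measurements
centered = phenotypes - phenotypes.mean(axis=0)

# The optimal linear autoencoder spans the top principal components.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
descriptors = centered @ vt[:2].T               # learned 2-D descriptors
print(descriptors.shape)                        # (200, 2), ready for a QD archive
```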
- Discovering Diverse Solutions in Deep Reinforcement Learning [84.45686627019408]
Reinforcement learning algorithms are typically limited to learning a single solution to a specified task.
We propose an RL method that can learn infinitely many solutions by training a policy conditioned on a continuous or discrete low-dimensional latent variable.
arXiv Detail & Related papers (2021-03-12T04:54:31Z)
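
A minimal sketch of the latent-conditioned idea in that summary: a single network maps (state, z) to an action, and sampling the continuous latent z from its prior selects among an infinite family of behaviours. The two-layer tanh network is an assumed architecture, not the paper's.

```python
# One policy network, infinitely many solutions indexed by a latent z.
import numpy as np

rng = np.random.default_rng(3)
STATE_DIM, LATENT_DIM, HIDDEN, ACTION_DIM = 3, 2, 16, 1
W1 = rng.normal(scale=0.5, size=(STATE_DIM + LATENT_DIM, HIDDEN))
W2 = rng.normal(scale=0.5, size=(HIDDEN, ACTION_DIM))

def act(state, z):
    """Deterministic action from the latent-conditioned policy."""
    h = np.tanh(np.concatenate([state, z]) @ W1)
    return np.tanh(h @ W2)

state = np.array([0.1, -0.3, 0.7])
for _ in range(3):
    z = rng.normal(size=LATENT_DIM)   # each prior sample selects a solution
    print(z, act(state, z))
```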
- Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named Variational Policy Propagation (VPP) to learn a joint policy through the interactions over agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy, so that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable.
arXiv Detail & Related papers (2020-04-19T15:42:55Z)
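
A very rough sketch of the Markov Random Field view in that summary: if the joint policy factorizes into per-agent and pairwise potentials, joint actions can be drawn by Gibbs sampling. The chain topology, discrete actions, and random potentials below are illustrative assumptions, not VPP's variational layers.

```python
# Gibbs sampling of a joint action from an MRF-factorized policy.
import numpy as np

rng = np.random.default_rng(4)
N_AGENTS, N_ACTIONS = 4, 2
unary = rng.normal(size=(N_AGENTS, N_ACTIONS))                 # per-agent scores
pair = rng.normal(size=(N_AGENTS - 1, N_ACTIONS, N_ACTIONS))   # chain edges

def gibbs_sample(steps=50):
    """Resample each agent's action given its neighbours' current actions."""
    actions = rng.integers(N_ACTIONS, size=N_AGENTS)
    for _ in range(steps):
        for i in range(N_AGENTS):
            logits = unary[i].copy()
            if i > 0:                 # edge to the left neighbour
                logits += pair[i - 1][actions[i - 1]]
            if i < N_AGENTS - 1:      # edge to the right neighbour
                logits += pair[i][:, actions[i + 1]]
            probs = np.exp(logits - logits.max())
            actions[i] = rng.choice(N_ACTIONS, p=probs / probs.sum())
    return actions

print(gibbs_sample())  # one joint action drawn from the MRF policy
```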
This list is automatically generated from the titles and abstracts of the papers on this site.