Iteratively Learn Diverse Strategies with State Distance Information
- URL: http://arxiv.org/abs/2310.14509v1
- Date: Mon, 23 Oct 2023 02:41:34 GMT
- Title: Iteratively Learn Diverse Strategies with State Distance Information
- Authors: Wei Fu, Weihua Du, Jingwei Li, Sunli Chen, Jingzhao Zhang, Yi Wu
- Abstract summary: In complex reinforcement learning problems, policies with similar rewards may have substantially different behaviors.
We develop a novel diversity-driven RL algorithm, State-based Intrinsic-reward Policy Optimization (SIPO), with provable convergence properties.
- Score: 18.509323383456707
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In complex reinforcement learning (RL) problems, policies with similar
rewards may have substantially different behaviors. It remains a fundamental
challenge to optimize rewards while also discovering as many diverse strategies
as possible, which can be crucial in many practical applications. Our study
examines two design choices for tackling this challenge, i.e., diversity
measure and computation framework. First, we find that with existing diversity
measures, visually indistinguishable policies can still yield high diversity
scores. To accurately capture the behavioral difference, we propose to
incorporate the state-space distance information into the diversity measure. In
addition, we examine two common computation frameworks for this problem, i.e.,
population-based training (PBT) and iterative learning (ITR). We show that
although PBT is the precise problem formulation, ITR can achieve comparable
diversity scores with higher computation efficiency, leading to improved
solution quality in practice. Based on our analysis, we further combine ITR
with two tractable realizations of the state-distance-based diversity measures
and develop a novel diversity-driven RL algorithm, State-based Intrinsic-reward
Policy Optimization (SIPO), with provable convergence properties. We
empirically examine SIPO across three domains from robot locomotion to
multi-agent games. In all of our testing environments, SIPO consistently
produces strategically diverse and human-interpretable policies that cannot be
discovered by existing baselines.
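The abstract describes two ideas that can be sketched concretely: a state-distance-based intrinsic reward, and the iterative learning (ITR) framework that trains one policy at a time against an archive of previously discovered behaviors. The sketch below is a minimal, hypothetical reading of those ideas (the abstract does not give SIPO's exact reward form); `train_policy` and the nearest-neighbor distance are illustrative assumptions.

```python
import math

def state_distance_intrinsic_reward(state, archive_states, scale=1.0):
    """Hypothetical intrinsic bonus: reward states that are far, in
    state-space distance, from states visited by earlier policies.
    The exact form used by SIPO is not specified in the abstract."""
    if not archive_states:
        return 0.0
    # Distance to the nearest state produced by any earlier policy.
    return scale * min(math.dist(state, s) for s in archive_states)

def iterative_training(train_policy, num_policies, scale=1.0):
    """ITR sketch: train policies one at a time; each converged policy's
    visited states are added to the archive that shapes the next bonus."""
    archive, policies = [], []
    for _ in range(num_policies):
        bonus = lambda s: state_distance_intrinsic_reward(s, archive, scale)
        policy, visited = train_policy(bonus)  # user-supplied RL training step
        policies.append(policy)
        archive.extend(visited)
    return policies
```

Compared with population-based training, which optimizes all policies jointly, this loop only ever optimizes a single policy against a fixed archive, which is where the abstract's claimed computational savings come from.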
Related papers
- Diversity Through Exclusion (DTE): Niche Identification for Reinforcement Learning through Value-Decomposition [63.67574523750839]
We propose a generic reinforcement learning (RL) algorithm that performs better than baseline deep Q-learning algorithms in environments with multiple variably-valued niches.
We show that agents trained this way can escape poor-but-attractive local optima to instead converge to harder-to-discover higher value strategies.
arXiv Detail & Related papers (2023-02-02T16:00:19Z)
- A Unified Algorithm Framework for Unsupervised Discovery of Skills based on Determinantal Point Process [53.86223883060367]
We show that diversity and coverage in unsupervised option discovery can indeed be unified under the same mathematical framework.
Our proposed algorithm, ODPP, has undergone extensive evaluation on challenging tasks created with Mujoco and Atari.
arXiv Detail & Related papers (2022-12-01T01:40:03Z)
- Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL [107.58821842920393]
We quantify the agents' behavior differences and relate them to policy performance via Role Diversity.
We find that the error bound in MARL can be decomposed into three parts that have a strong relation to the role diversity.
The decomposed factors can significantly impact policy optimization along three popular directions.
arXiv Detail & Related papers (2022-06-01T04:58:52Z)
- Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality [26.69352834457256]
We formalize the problem as a Constrained Markov Decision Process.
The objective is to find diverse policies, measured by the distance between the state occupancies of the policies in the set.
We demonstrate that the method can discover diverse and meaningful behaviors in various domains.
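The DOMiNO summary above measures diversity as a distance between the state occupancies of the policies in a set. A minimal sketch of that idea, assuming discrete (or discretized) states and an L2 distance between empirical occupancy distributions (the paper's exact distance may differ):

```python
from collections import Counter
import math

def occupancy(states, num_bins):
    """Empirical state-occupancy distribution from a policy's sampled states."""
    counts = Counter(states)
    total = sum(counts.values())
    return [counts.get(b, 0) / total for b in range(num_bins)]

def occupancy_distance(states_a, states_b, num_bins):
    """L2 distance between two policies' empirical state occupancies:
    identical visitation patterns give 0; disjoint ones give large values."""
    pa, pb = occupancy(states_a, num_bins), occupancy(states_b, num_bins)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))
```

Two policies that visit the same states with the same frequencies score zero diversity under this measure, even if their action sequences differ, which is the behavioral (rather than parametric) notion of diversity the summary refers to.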
arXiv Detail & Related papers (2022-05-26T17:40:52Z)
- Robust Policy Learning over Multiple Uncertainty Sets [91.67120465453179]
Reinforcement learning (RL) agents need to be robust to variations in safety-critical environments.
We develop an algorithm that enjoys the benefits of both system identification and robust RL.
arXiv Detail & Related papers (2022-02-14T20:06:28Z)
- Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games [44.30509625560908]
In open-ended learning algorithms, there are no widely accepted definitions for diversity, making it hard to construct and evaluate the diverse policies.
We propose a unified measure of diversity in multi-agent open-ended learning based on both Behavioral Diversity (BD) and Response Diversity (RD).
We show that many current diversity measures fall in one of the categories of BD or RD but not both.
With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity when seeking the best responses in open-ended learning.
arXiv Detail & Related papers (2021-06-09T10:11:06Z)
- Discovering Diverse Nearly Optimal Policies with Successor Features [30.144946007098852]
In reinforcement learning, a set of diverse policies can be useful for exploration, transfer, hierarchy, and robustness.
We propose Diverse Successive Policies, a method for discovering policies that are diverse in the space of Successor Features.
arXiv Detail & Related papers (2021-06-01T17:56:13Z)
- Probabilistic Mixture-of-Experts for Efficient Deep Reinforcement Learning [7.020079427649125]
We show that learning distinguishable skills in tasks with non-unique optima can be essential for further improving an agent's learning efficiency and performance.
We propose a probabilistic mixture-of-experts (PMOE) for multimodal policies, together with a novel gradient estimator for the non-differentiability problem.
arXiv Detail & Related papers (2021-04-19T08:21:56Z)
- Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
- SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
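The SUNRISE summary names its two ingredients explicitly, and both are simple enough to sketch. The code below is a schematic rendering only: the confidence weight uses a generic sigmoid-of-uncertainty form, and the SUNRISE paper's exact weighting function and hyperparameters differ in detail.

```python
import math
import statistics

def weighted_bellman_target(q_targets, reward, gamma, temperature=10.0):
    """Ingredient (a), sketched: down-weight a Bellman target when the
    Q-ensemble disagrees (high std across members = high uncertainty)."""
    mean_q = statistics.mean(q_targets)
    std_q = statistics.pstdev(q_targets)
    # Confidence weight that shrinks as ensemble disagreement grows
    # (schematic form; the paper's exact weighting differs in detail).
    weight = 1.0 / (1.0 + math.exp(temperature * std_q))
    return weight * (reward + gamma * mean_q)

def ucb_action(q_values_per_member, lam=1.0):
    """Ingredient (b): choose the action maximizing mean + lam * std of the
    ensemble's Q-estimates, i.e., an upper-confidence bound for exploration."""
    num_actions = len(q_values_per_member[0])
    scores = []
    for a in range(num_actions):
        qs = [member[a] for member in q_values_per_member]
        scores.append(statistics.mean(qs) + lam * statistics.pstdev(qs))
    return max(range(num_actions), key=lambda a: scores[a])
```

The two pieces pull in opposite directions deliberately: uncertainty suppresses learning targets (to avoid error propagation) while boosting action scores (to direct exploration toward uncertain regions).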
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.