Phasic Diversity Optimization for Population-Based Reinforcement Learning
- URL: http://arxiv.org/abs/2403.11114v1
- Date: Sun, 17 Mar 2024 06:41:09 GMT
- Title: Phasic Diversity Optimization for Population-Based Reinforcement Learning
- Authors: Jingcheng Jiang, Haiyin Piao, Yu Fu, Yihang Hao, Chuanlu Jiang, Ziqi Wei, Xin Yang,
- Abstract summary: Phasic Diversity Optimization (PDO) algorithm separates reward and diversity training into distinct phases.
In the auxiliary phase, agents with poor performance diversified via determinants will not replace the better agents in the archive.
We introduce two implementations of PDO archive and conduct tests in the newly proposed adversarial dogfight and MuJoCo simulations.
- Score: 10.15130620537703
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reviewing the previous work of diversity Rein-forcement Learning,diversity is often obtained via an augmented loss function,which requires a balance between reward and diversity.Generally,diversity optimization algorithms use Multi-armed Bandits algorithms to select the coefficient in the pre-defined space. However, the dynamic distribution of reward signals for MABs or the conflict between quality and diversity limits the performance of these methods. We introduce the Phasic Diversity Optimization (PDO) algorithm, a Population-Based Training framework that separates reward and diversity training into distinct phases instead of optimizing a multi-objective function. In the auxiliary phase, agents with poor performance diversified via determinants will not replace the better agents in the archive. The decoupling of reward and diversity allows us to use an aggressive diversity optimization in the auxiliary phase without performance degradation. Furthermore, we construct a dogfight scenario for aerial agents to demonstrate the practicality of the PDO algorithm. We introduce two implementations of PDO archive and conduct tests in the newly proposed adversarial dogfight and MuJoCo simulations. The results show that our proposed algorithm achieves better performance than baselines.
Related papers
- Comparative Analysis of Indicators for Multiobjective Diversity Optimization [0.2144088660722956]
We discuss different diversity indicators from the perspective of indicator-based evolutionary algorithms (IBEA) with multiple objectives.
We examine theoretical, computational, and practical properties of these indicators, such as monotonicity in species.
We present new theorems -- including a proof of the NP-hardness of the Riesz s-Energy Subset Selection Problem.
arXiv Detail & Related papers (2024-10-24T16:40:36Z) - UAV-enabled Collaborative Beamforming via Multi-Agent Deep Reinforcement Learning [79.16150966434299]
We formulate a UAV-enabled collaborative beamforming multi-objective optimization problem (UCBMOP) to maximize the transmission rate of the UVAA and minimize the energy consumption of all UAVs.
We use the heterogeneous-agent trust region policy optimization (HATRPO) as the basic framework, and then propose an improved HATRPO algorithm, namely HATRPO-UCB.
arXiv Detail & Related papers (2024-04-11T03:19:22Z) - Knowledge Transfer for Dynamic Multi-objective Optimization with a
Changing Number of Objectives [4.490459770205671]
We show that the state-of-the-art transfer algorithm for DMOPs with a changing number of objectives lacks sufficient diversity.
We propose a knowledge transfer dynamic multi-objective evolutionary algorithm (KTDMOEA) to enhance population diversity after changes.
arXiv Detail & Related papers (2023-06-19T01:54:44Z) - Local Optimization Achieves Global Optimality in Multi-Agent
Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that with standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z) - Source-free Domain Adaptation Requires Penalized Diversity [60.04618512479438]
Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data.
In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor.
We propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors.
arXiv Detail & Related papers (2023-04-06T00:20:19Z) - A Unified Algorithm Framework for Unsupervised Discovery of Skills based
on Determinantal Point Process [53.86223883060367]
We show that diversity and coverage in unsupervised option discovery can indeed be unified under the same mathematical framework.
Our proposed algorithm, ODPP, has undergone extensive evaluation on challenging tasks created with Mujoco and Atari.
arXiv Detail & Related papers (2022-12-01T01:40:03Z) - DGPO: Discovering Multiple Strategies with Diversity-Guided Policy
Optimization [34.40615558867965]
We propose an on-policy algorithm that discovers multiple strategies for solving a given task.
Unlike prior work, it achieves this with a shared policy network trained over a single run.
Experimental results show that our method efficiently discovers diverse strategies in a wide variety of reinforcement learning tasks.
arXiv Detail & Related papers (2022-07-12T15:57:55Z) - Multi-Objective Quality Diversity Optimization [2.4608515808275455]
We propose an extension of the MAP-Elites algorithm in the multi-objective setting: Multi-Objective MAP-Elites (MOME)
Namely, it combines the diversity inherited from the MAP-Elites grid algorithm with the strength of multi-objective optimizations.
We evaluate our method on several tasks, from standard optimization problems to robotics simulations.
arXiv Detail & Related papers (2022-02-07T10:48:28Z) - A novel multiobjective evolutionary algorithm based on decomposition and
multi-reference points strategy [14.102326122777475]
Multiobjective evolutionary algorithm based on decomposition (MOEA/D) has been regarded as a significantly promising approach for solving multiobjective optimization problems (MOPs)
We propose an improved MOEA/D algorithm by virtue of the well-known Pascoletti-Serafini scalarization method and a new strategy of multi-reference points.
arXiv Detail & Related papers (2021-10-27T02:07:08Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Permutation Invariant Policy Optimization for Mean-Field Multi-Agent
Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.