Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts
- URL: http://arxiv.org/abs/2105.03363v1
- Date: Fri, 7 May 2021 16:20:22 GMT
- Title: Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts
- Authors: Weinan Zhang, Xihuai Wang, Jian Shen, Ming Zhou
- Abstract summary: This paper investigates model-based methods in multi-agent reinforcement learning (MARL).
We propose a novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO).
- Score: 52.844741540236285
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper investigates model-based methods in multi-agent reinforcement learning (MARL). We specify the dynamics sample complexity and the opponent sample complexity in MARL, and conduct a theoretical analysis of the upper bound on the return discrepancy. To reduce this upper bound, and thereby keep the sample complexity low throughout learning, we propose a novel decentralized model-based MARL method, named Adaptive Opponent-wise Rollout Policy Optimization (AORPO). In AORPO, each agent builds its own multi-agent environment model, consisting of a dynamics model and multiple opponent models, and trains its policy with adaptive opponent-wise rollouts. We further prove the theoretical convergence of AORPO under reasonable assumptions. Experiments on competitive and cooperative tasks demonstrate that AORPO achieves improved sample efficiency with asymptotic performance comparable to that of the compared MARL methods.
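As a rough illustration of the adaptive opponent-wise rollout idea in the abstract, the sketch below simulates model-based rollouts in which each opponent's learned model is trusted only up to a horizon derived from its estimated modeling error. All names (adaptive_opponent_wise_rollout, opponent_errors, the fallback action, etc.) are hypothetical; this is not the authors' implementation.

```python
# Hypothetical sketch of an adaptive opponent-wise rollout (illustrative only).
import random

def adaptive_opponent_wise_rollout(dynamics_step, own_policy, opponent_models,
                                   opponent_errors, start_state, max_horizon=10):
    """Simulate one model-based rollout for a single learning agent.

    dynamics_step(state, joint_action) -> (next_state, reward)  # learned dynamics model
    own_policy(state) -> action                                 # the agent's own policy
    opponent_models[j](state) -> action                         # learned model of opponent j
    opponent_errors[j] in [0, 1]                                # estimated error of model j
    """
    # Adaptive part: more accurate opponent models get longer simulated horizons.
    lengths = {j: max(1, int(max_horizon * (1.0 - err)))
               for j, err in opponent_errors.items()}
    state, trajectory = start_state, []
    for t in range(max_horizon):
        joint_action = {"self": own_policy(state)}
        for j, model in opponent_models.items():
            # Within its trusted horizon, sample opponent j from its learned model;
            # beyond it, this toy simply falls back to a fixed action (the paper's
            # adaptive scheme handles this case more carefully).
            joint_action[j] = model(state) if t < lengths[j] else 0
        state, reward = dynamics_step(state, joint_action)
        trajectory.append((joint_action, reward, state))
    return trajectory

# Toy usage with stand-in models.
traj = adaptive_opponent_wise_rollout(
    dynamics_step=lambda s, a: (s + sum(a.values()), -abs(s)),
    own_policy=lambda s: random.choice([-1, 1]),
    opponent_models={"opp1": lambda s: 1, "opp2": lambda s: -1},
    opponent_errors={"opp1": 0.1, "opp2": 0.6},
    start_state=0,
)
```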
Related papers
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that uses these inferences to plan the agent's response.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z) - Expensive Multi-Objective Bayesian Optimization Based on Diffusion Models [17.19004913553654]
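For intuition about the opponent modeling module mentioned above, here is a hypothetical sketch of goal-conditioned opponent handling: maintain a belief over candidate opponent goals, update it from observed actions, and act with the policy conditioned on the most likely goal. The class, method names, and update rule are illustrative, not taken from the HOP paper.

```python
# Hypothetical sketch of goal-conditioned opponent modeling (illustrative only).
class GoalConditionedOpponentModel:
    def __init__(self, goals, goal_conditioned_policies):
        self.goals = goals                                  # candidate opponent goals
        self.policies = goal_conditioned_policies           # response policy per goal
        self.belief = {g: 1.0 / len(goals) for g in goals}  # uniform prior over goals

    def update_belief(self, observed_action, likelihood):
        # Bayesian update: likelihood(action, goal) models how a goal-directed
        # opponent would act; it could be learned or hand-specified.
        unnorm = {g: self.belief[g] * likelihood(observed_action, g) for g in self.goals}
        total = sum(unnorm.values()) or 1.0
        self.belief = {g: p / total for g, p in unnorm.items()}

    def act(self, state):
        # Respond with the policy conditioned on the most likely opponent goal.
        likely_goal = max(self.belief, key=self.belief.get)
        return self.policies[likely_goal](state)

# Toy usage.
model = GoalConditionedOpponentModel(
    goals=["cooperate", "defect"],
    goal_conditioned_policies={"cooperate": lambda s: "share", "defect": lambda s: "guard"},
)
model.update_belief("take", likelihood=lambda a, g: 0.8 if g == "defect" else 0.2)
print(model.act(state=None))  # -> "guard"
```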
- Expensive Multi-Objective Bayesian Optimization Based on Diffusion Models [17.19004913553654]
Multi-objective Bayesian optimization (MOBO) has shown promising performance on various expensive multi-objective optimization problems (EMOPs).
We propose a novel Composite Diffusion Model based Pareto Set Learning algorithm, namely CDM-PSL, for expensive MOBO.
Our proposed algorithm attains superior performance compared with various state-of-the-art MOBO algorithms.
arXiv Detail & Related papers (2024-05-14T14:55:57Z) - Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL [57.745700271150454]
We study the sample complexity of reinforcement learning in Mean-Field Games (MFGs) with model-based function approximation.
We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity.
arXiv Detail & Related papers (2024-02-08T14:54:47Z) - Sample-Efficient Multi-Agent RL: An Optimization Perspective [103.35353196535544]
We study multi-agent reinforcement learning (MARL) for the general-sum Markov Games (MGs) under the general function approximation.
We introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs.
We show that our algorithm provides comparable sublinear regret to the existing works.
arXiv Detail & Related papers (2023-10-10T01:39:04Z) - Learning Multiple Coordinated Agents under Directed Acyclic Graph
Constraints [20.45657219304883]
This paper proposes a novel multi-agent reinforcement learning (MARL) method to learn multiple coordinated agents under directed acyclic graph (DAG) constraints.
Unlike existing MARL approaches, our method explicitly exploits the DAG structure between agents to achieve more effective learning performance.
arXiv Detail & Related papers (2023-07-13T13:41:24Z) - Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
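One simple way to picture exploiting a DAG structure between agents, sketched below under assumed semantics (not the paper's algorithm), is to execute policies in topological order so that each agent can condition on its predecessors' actions.

```python
# Illustrative sketch: acting in topological order of an agent DAG (hypothetical API).
from graphlib import TopologicalSorter

def act_in_dag_order(policies, dag, observations):
    """policies[i](obs, upstream_actions) -> action; dag maps agent -> set of predecessors."""
    actions = {}
    for agent in TopologicalSorter(dag).static_order():
        # Predecessors have already acted, so their actions are available here.
        upstream = {p: actions[p] for p in dag.get(agent, ())}
        actions[agent] = policies[agent](observations[agent], upstream)
    return actions

# Toy usage: agent "b" depends on "a", and "c" depends on both.
dag = {"a": set(), "b": {"a"}, "c": {"a", "b"}}
policies = {name: (lambda obs, up: obs + sum(up.values())) for name in dag}
print(act_in_dag_order(policies, dag, {"a": 1, "b": 2, "c": 3}))
```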
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z) - Permutation Invariant Policy Optimization for Mean-Field Multi-Agent
Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
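A minimal sketch of the permutation-invariance idea behind the entry above, assuming a simple mean-pooling critic rather than the actual MF-PPO architecture: pooling opponent embeddings makes the value estimate independent of the order in which the other agents are listed. All weights and names are hypothetical.

```python
# Minimal permutation-invariant critic sketch (illustrative only, not MF-PPO itself).
import numpy as np

rng = np.random.default_rng(0)
W_embed = rng.normal(size=(4, 8))    # per-agent embedding weights (feature dim 4 -> 8)
W_value = rng.normal(size=(8 + 4,))  # value head over [pooled opponents, own features]

def permutation_invariant_value(own_feats, opponent_feats):
    # Embed each opponent independently, then mean-pool: any permutation of the
    # rows of `opponent_feats` yields the same pooled vector, hence the same value.
    pooled = np.tanh(opponent_feats @ W_embed).mean(axis=0)
    return float(np.concatenate([pooled, own_feats]) @ W_value)

own = rng.normal(size=4)
others = rng.normal(size=(5, 4))  # five opponents, feature dim 4
assert np.isclose(permutation_invariant_value(own, others),
                  permutation_invariant_value(own, others[::-1]))
```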
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.