Multi-agent Policy Reciprocity with Theoretical Guarantee
- URL: http://arxiv.org/abs/2304.05632v1
- Date: Wed, 12 Apr 2023 06:27:10 GMT
- Title: Multi-agent Policy Reciprocity with Theoretical Guarantee
- Authors: Haozhi Wang, Yinchuan Li, Qing Wang, Yunfeng Shao, Jianye Hao
- Abstract summary: We propose a novel multi-agent policy reciprocity (PR) framework, where each agent can fully exploit cross-agent policies even in mismatched states.
Experimental results on discrete and continuous environments demonstrate that PR outperforms various existing RL and transfer RL methods.
- Score: 24.65151626601257
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern multi-agent reinforcement learning (RL) algorithms hold great
potential for solving a variety of real-world problems. However, they do not
fully exploit cross-agent knowledge to reduce sample complexity and improve
performance. Although transfer RL supports knowledge sharing, it is
hyperparameter sensitive and complex. To solve this problem, we propose a novel
multi-agent policy reciprocity (PR) framework, where each agent can fully
exploit cross-agent policies even in mismatched states. We then define an
adjacency space for mismatched states and design a plug-and-play module for
value iteration, which enables agents to infer more precise returns. To improve
the scalability of PR, deep PR is proposed for continuous control tasks.
Moreover, theoretical analysis shows that agents can asymptotically reach
consensus through individual perceived rewards and converge to an optimal value
function, which implies the stability and effectiveness of PR, respectively.
Experimental results on discrete and continuous environments demonstrate that
PR outperforms various existing RL and transfer RL methods.
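The abstract only names the plug-and-play module for value iteration and does not specify it. As a rough illustration of the general idea, and not the authors' algorithm, the sketch below runs tabular value iteration and optionally blends each Bellman backup with value estimates shared by other agents. The function name `value_iteration`, the mixing weight `beta`, and the `adjacency` matrix that maps peer states onto the agent's own states are all assumptions made for this example.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=500,
                    peer_values=None, adjacency=None, beta=0.0):
    """Tabular value iteration with an optional peer-value mixing step.

    P: (A, S, S) transition probabilities, P[a, s, t] = Pr(t | s, a).
    R: (S, A) immediate rewards.
    peer_values: list of (S,) arrays shared by other agents (hypothetical).
    adjacency: (S, S) row-stochastic weights relating peer states to this
        agent's states (hypothetical); identity if omitted.
    beta: weight on the peer hint; 0 gives plain value iteration.
    """
    n_states = R.shape[0]
    V = np.zeros(n_states)
    hint = None
    if peer_values:
        peer_avg = np.mean(peer_values, axis=0)
        W = np.eye(n_states) if adjacency is None else adjacency
        hint = W @ peer_avg
    for _ in range(iters):
        # Standard Bellman optimality backup.
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        backup = Q.max(axis=1)
        # Assumed mixing rule: convex combination of backup and peer hint.
        V = backup if hint is None else (1.0 - beta) * backup + beta * hint
    return V

# Toy two-state MDP: action 0 moves to state 0, action 1 moves to state 1.
P = np.array([[[1.0, 0.0], [1.0, 0.0]],
              [[0.0, 1.0], [0.0, 1.0]]])
R = np.array([[0.0, 1.0],
              [0.0, 2.0]])

v_plain = value_iteration(P, R)
v_mixed = value_iteration(P, R, peer_values=[v_plain], beta=0.5)
```

With `beta=0` the routine reduces to ordinary value iteration. With a fixed peer hint and `beta>0` the update is still a contraction, but its fixed point matches the optimal value function only when the hint itself is accurate, which is why the paper's convergence analysis matters for any real scheme of this kind.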
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- REBEL: Reinforcement Learning via Regressing Relative Rewards [59.68420022466047]
We propose REBEL, a minimalist RL algorithm for the era of generative models.
In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL.
We find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO.
arXiv Detail & Related papers (2024-04-25T17:20:45Z)
- Finite-Time Analysis of On-Policy Heterogeneous Federated Reinforcement Learning [8.632943870358627]
Federated reinforcement learning (FRL) has emerged as a promising paradigm for reducing the sample complexity of reinforcement learning tasks.
We introduce FedSARSA, a novel on-policy reinforcement learning scheme equipped with linear function approximation.
We show that FedSARSA converges to a policy that is near-optimal for all agents, with the extent of near-optimality proportional to the level of heterogeneity.
arXiv Detail & Related papers (2024-01-27T02:43:45Z)
- Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing [11.639503711252663]
We tackle the multi-agent active hypothesis testing (AHT) problem by introducing a novel algorithm rooted in the framework of deep multi-agent reinforcement learning.
We present a comprehensive set of experimental results that effectively showcase the agents' ability to learn collaborative strategies and enhance performance.
arXiv Detail & Related papers (2023-09-14T01:18:04Z)
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
- Learning Cooperative Multi-Agent Policies with Partial Reward Decoupling [13.915157044948364]
One of the preeminent obstacles to scaling multi-agent reinforcement learning is assigning credit to individual agents' actions.
In this paper, we address this credit assignment problem with an approach that we call partial reward decoupling (PRD).
PRD decomposes large cooperative multi-agent RL problems into decoupled subproblems involving subsets of agents, thereby simplifying credit assignment.
arXiv Detail & Related papers (2021-12-23T17:48:04Z)
- Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification [74.10976684469435]
Offline reinforcement learning (RL) algorithms can be transferred to multi-agent settings directly.
We propose a simple yet effective method, Offline Multi-Agent RL with Actor Rectification (OMAR), to tackle this critical challenge.
OMAR significantly outperforms strong baselines with state-of-the-art performance in multi-agent continuous control benchmarks.
arXiv Detail & Related papers (2021-11-22T13:27:42Z)
- Trust Region Policy Optimisation in Multi-Agent Reinforcement Learning [25.027143431992755]
Trust region methods rigorously enabled reinforcement learning (RL) agents to learn monotonically improving policies, leading to superior performance on a variety of tasks.
Unfortunately, when it comes to multi-agent reinforcement learning (MARL), the property of monotonic improvement may not simply apply.
In this paper, we extend the theory of trust region learning to MARL. Central to our findings are the multi-agent advantage decomposition lemma and the sequential policy update scheme.
Based on these, we develop Heterogeneous-Agent Trust Region Policy Optimisation (HATRPO) and Heterogeneous-Agent Proximal Policy Optimisation (
arXiv Detail & Related papers (2021-09-23T09:44:35Z)
- Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning [56.17667147101263]
In real-world tasks, reinforcement learning agents encounter situations that are not present during training time.
To ensure reliable performance, the RL agents need to exhibit robustness against worst-case situations.
We propose the Robust Hallucinated Upper-Confidence RL (RH-UCRL) algorithm to provably solve this problem.
arXiv Detail & Related papers (2021-03-18T16:50:17Z)
- Multi-Agent Trust Region Policy Optimization [34.91180300856614]
We show that the policy update of TRPO can be transformed into a distributed consensus optimization problem for multi-agent cases.
We propose a decentralized MARL algorithm, which we call multi-agent TRPO (MATRPO).
arXiv Detail & Related papers (2020-10-15T17:49:47Z)
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.