Best Possible Q-Learning
- URL: http://arxiv.org/abs/2302.01188v1
- Date: Thu, 2 Feb 2023 16:14:19 GMT
- Title: Best Possible Q-Learning
- Authors: Jiechuan Jiang and Zongqing Lu
- Abstract summary: Decentralized learning is a challenge in cooperative multi-agent reinforcement learning.
The convergence and optimality of most decentralized algorithms are not theoretically guaranteed.
We show that best possible Q-learning achieves remarkable improvement over baselines in a variety of cooperative multi-agent tasks.
- Score: 33.4713690991284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fully decentralized learning, where the global information, i.e., the actions
of other agents, is inaccessible, is a fundamental challenge in cooperative
multi-agent reinforcement learning. However, the convergence and optimality of
most decentralized algorithms are not theoretically guaranteed, since the
transition probabilities are non-stationary as all agents are updating policies
simultaneously. To tackle this challenge, we propose the best possible operator, a
novel decentralized operator, and prove that the policies of agents will
converge to the optimal joint policy if each agent independently updates its
individual state-action value by the operator. Further, to make the update more
efficient and practical, we simplify the operator and prove that the
convergence and optimality still hold with the simplified one. By instantiating
the simplified operator, the derived fully decentralized algorithm, best
possible Q-learning (BQL), does not suffer from non-stationarity. Empirically,
we show that BQL achieves remarkable improvement over baselines in a variety of
cooperative multi-agent tasks.
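The paper's operator is defined formally over expected values and comes with convergence and optimality proofs; as a rough illustration of the flavour of update the simplified operator suggests, here is a minimal tabular sketch. Each agent keeps an individual Q-table over only its own actions and moves its estimate toward a TD target only when that target improves on the current value, so exploratory noise from other agents does not drag learned values back down. The class name, hyperparameters, and the exact update rule below are illustrative assumptions, not the paper's algorithm.

```python
import random
from collections import defaultdict

class IndependentBQLAgent:
    """Illustrative sketch of a 'best possible'-style optimistic update.

    Not the paper's exact simplified operator: BQL's operator is defined
    over expected values and carries convergence guarantees that this
    toy rule does not.
    """

    def __init__(self, n_actions, gamma=0.99, lr=0.1, eps=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)  # Q_i(s, a_i)
        self.n_actions = n_actions
        self.gamma = gamma
        self.lr = lr
        self.eps = eps

    def act(self, state):
        # Epsilon-greedy over the agent's own actions only: other agents'
        # actions are never observed, keeping learning fully decentralized.
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[state][a])

    def update(self, state, action, reward, next_state, done):
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        # Optimistic step: move only toward targets that improve on the
        # current estimate, treating the best transition seen so far as the
        # one the optimal joint policy would induce.
        if target > self.q[state][action]:
            self.q[state][action] += self.lr * (target - self.q[state][action])
```

Each agent runs this update on its own trajectory stream; the intuition the abstract points to is that anchoring on the best observed target turns teammates' policy changes into extra variance rather than a moving fixed point.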
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning [57.652899266553035]
Decentralized and lifelong-adaptive multi-agent collaborative learning aims to enhance collaboration among multiple agents without a central server.
We propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.
arXiv Detail & Related papers (2024-03-11T09:21:11Z)
- Fully Decentralized Cooperative Multi-Agent Reinforcement Learning: A Survey [48.77342627610471]
Cooperative multi-agent reinforcement learning is a powerful tool to solve many real-world cooperative tasks.
It is challenging to derive algorithms that can converge to the optimal joint policy in a fully decentralized setting.
arXiv Detail & Related papers (2024-01-10T05:07:42Z)
- Graph Exploration for Effective Multi-agent Q-Learning [46.723361065955544]
This paper proposes an exploration technique for multi-agent reinforcement learning (MARL) with graph-based communication among agents.
We assume the individual rewards received by the agents are independent of the actions by the other agents, while their policies are coupled.
In the proposed framework, neighbouring agents collaborate to estimate the uncertainty about the state-action space in order to execute more efficient explorative behaviour.
arXiv Detail & Related papers (2023-04-19T10:28:28Z)
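As a concrete toy for the entry above, the sketch below uses shared visit counts as a stand-in for the neighbour-estimated uncertainty; the paper's actual estimator and communication protocol are not reproduced here, so the class and its methods are hypothetical.

```python
from collections import defaultdict

class NeighbourSharedExplorer:
    """Hypothetical count-based proxy for a neighbour-shared uncertainty map."""

    def __init__(self, bonus_scale=1.0):
        self.counts = defaultdict(int)  # visits to each (state, action) pair
        self.bonus_scale = bonus_scale

    def visit(self, state, action):
        self.counts[(state, action)] += 1

    def merge_from(self, neighbour):
        # Graph-based communication step: pool visit counts with a
        # neighbouring agent so both share one picture of what is unexplored.
        for key, n in neighbour.counts.items():
            self.counts[key] = max(self.counts[key], n)

    def bonus(self, state, action):
        # Rarely tried pairs receive a larger exploration bonus.
        return self.bonus_scale / (1 + self.counts[(state, action)]) ** 0.5
```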
- Multi-agent Policy Reciprocity with Theoretical Guarantee [24.65151626601257]
We propose a novel multi-agent policy reciprocity (PR) framework, where each agent can fully exploit cross-agent policies even in mismatched states.
Experimental results on discrete and continuous environments demonstrate that PR outperforms various existing RL and transfer RL methods.
arXiv Detail & Related papers (2023-04-12T06:27:10Z)
- Iterated Reasoning with Mutual Information in Cooperative and Byzantine Decentralized Teaming [0.0]
We show that reformulating an agent's policy to be conditional on the policies of its teammates inherently maximizes a Mutual Information (MI) lower bound when optimizing under Policy Gradient (PG).
Our approach, InfoPG, outperforms baselines in learning emergent collaborative behaviors and sets the state-of-the-art in decentralized cooperative MARL tasks.
arXiv Detail & Related papers (2022-01-20T22:54:32Z)
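The MI lower bound referred to in the InfoPG entry above is presumably of the standard variational (Barber-Agakov) form, sketched here with illustrative notation: a_i is one agent's action, a_j a teammate's, and q any variational approximation to the true conditional.

```latex
% Variational lower bound on mutual information (Barber-Agakov form);
% the notation is illustrative, not taken from the InfoPG paper.
I(a_i; a_j) \;\ge\; H(a_i) + \mathbb{E}_{p(a_i, a_j)}\!\left[ \log q(a_i \mid a_j) \right]
```

Conditioning each policy on teammates' policies makes the conditional term directly optimizable, which is one way a policy-gradient objective can tighten such a bound.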
- Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z)
- Offline Decentralized Multi-Agent Reinforcement Learning [33.4713690991284]
We propose a framework for offline decentralized multi-agent reinforcement learning.
We exploit value deviation and transition normalization to modify the transition probabilities.
We show that the framework can be easily built on many existing offline reinforcement learning algorithms.
arXiv Detail & Related papers (2021-08-04T03:53:33Z)
- Dif-MAML: Decentralized Multi-Agent Meta-Learning [54.39661018886268]
We propose a cooperative multi-agent meta-learning algorithm, referred to as Diffusion Multi-Agent MAML, or Dif-MAML.
We show that the proposed strategy allows a collection of agents to attain agreement at a linear rate and to converge to a stationary point of the aggregate MAML objective.
Simulation results illustrate the theoretical findings and the superior performance relative to the traditional non-cooperative setting.
arXiv Detail & Related papers (2020-10-06T16:51:09Z)
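For context on the Dif-MAML entry above: decentralized diffusion strategies typically alternate a local gradient step with neighbourhood averaging. A generic adapt-then-combine form is sketched below; the step size \mu, weights a_{\ell k}, and objectives J_k are standard diffusion-learning notation, not symbols taken from the paper.

```latex
% Generic adapt-then-combine diffusion update for agent k at step t.
% \mathcal{N}_k: neighbours of k; weights a_{\ell k} \ge 0, \sum_{\ell} a_{\ell k} = 1.
\psi_{k,t} = w_{k,t-1} - \mu\, \widehat{\nabla J_k}\,(w_{k,t-1})
  \quad \text{(local adapt)} \\
w_{k,t} = \sum_{\ell \in \mathcal{N}_k} a_{\ell k}\, \psi_{\ell,t}
  \quad \text{(combine over neighbours)}
```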
- F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes impractical in complicated applications.
We propose a flexible fully decentralized actor-critic MARL framework that can handle large-scale general cooperative multi-agent settings.
Our framework achieves scalability and stability in large-scale environments while reducing information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.