Logical Team Q-learning: An approach towards factored policies in
cooperative MARL
- URL: http://arxiv.org/abs/2006.03553v2
- Date: Sun, 28 Mar 2021 19:05:57 GMT
- Title: Logical Team Q-learning: An approach towards factored policies in
cooperative MARL
- Authors: Lucas Cassano and Ali H. Sayed
- Abstract summary: We address the challenge of learning factored policies in cooperative MARL scenarios.
The goal is to obtain factored policies that determine the individual behavior of each agent so that the resulting joint policy is optimal.
The main contribution is the introduction of Logical Team Q-learning (LTQL).
- Score: 49.08389593076099
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the challenge of learning factored policies in cooperative MARL
scenarios. In particular, we consider the situation in which a team of agents
collaborates to optimize a common cost. The goal is to obtain factored policies
that determine the individual behavior of each agent so that the resulting
joint policy is optimal. The main contribution of this work is the introduction
of Logical Team Q-learning (LTQL). LTQL does not rely on assumptions about the
environment and hence is generally applicable to any collaborative MARL
scenario. We derive LTQL as a stochastic approximation to a dynamic programming
method we introduce in this work. We conclude the paper by providing
experiments (both in the tabular and deep settings) that illustrate the claims.
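To make the notion of a factored policy concrete, here is a minimal sketch on a hypothetical two-agent team game. It is not the LTQL update from the paper: the reward table R, the function names, and the choice of per-agent value (the best team reward reachable with a given own action) are all assumptions made for illustration. The sketch only shows what it means for individually greedy agents to recover the optimal joint action, and how a naive per-agent value can fail to do so.

```python
import numpy as np

# Hypothetical shared reward table (assumption, not from the paper):
# R[a1, a2] is the team reward when agent 1 plays a1 and agent 2 plays a2.
R = np.array([
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 6.0],
])

def joint_greedy(R):
    """Centralized choice: search over all joint actions."""
    a1, a2 = np.unravel_index(np.argmax(R), R.shape)
    return int(a1), int(a2)

def optimistic_factored_greedy(R):
    """Each agent scores its own action by the best reward the team
    could reach with it, then acts greedily on its own table."""
    q1 = R.max(axis=1)  # agent 1: best case over agent 2's actions
    q2 = R.max(axis=0)  # agent 2: best case over agent 1's actions
    return int(np.argmax(q1)), int(np.argmax(q2))

def averaged_factored_greedy(R):
    """Naive alternative: each agent averages over the teammate's actions."""
    q1 = R.mean(axis=1)
    q2 = R.mean(axis=0)
    return int(np.argmax(q1)), int(np.argmax(q2))

if __name__ == "__main__":
    for name, policy in [
        ("joint", joint_greedy),
        ("optimistic factored", optimistic_factored_greedy),
        ("averaged factored", averaged_factored_greedy),
    ]:
        action = policy(R)
        print(name, action, R[action])
```

In this table the optimistic per-agent values select the same joint action as the centralized search, while the averaged values settle on a lower-reward joint action. The optimistic rule relies on the optimal joint action being unique; with ties, independent greedy choices can miscoordinate.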
Related papers
- CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing [70.25689961697523]
We propose a generalizable algorithm that enhances sequential reasoning by cross-task experience sharing and selection.
Our work bridges the gap between existing sequential reasoning paradigms and validates the effectiveness of leveraging cross-task experiences.
(2024-10-22T03:59:53Z)
- Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs).
MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task.
We evaluate the effectiveness of MRP through comprehensive benchmarks.
arXiv Detail & Related papers (2024-06-17T16:14:11Z) - QFree: A Universal Value Function Factorization for Multi-Agent
Reinforcement Learning [2.287186762346021]
We propose QFree, a universal value function factorization method for multi-agent reinforcement learning.
We show that QFree achieves the state-of-the-art performance in a general-purpose complex MARL benchmark environment.
arXiv Detail & Related papers (2023-11-01T08:07:16Z) - Context-Aware Bayesian Network Actor-Critic Methods for Cooperative
Multi-Agent Reinforcement Learning [7.784991832712813]
We introduce a Bayesian network to inaugurate correlations between agents' action selections in their joint policy.
We develop practical algorithms to learn the context-aware Bayesian network policies.
Empirical results on a range of MARL benchmarks show the benefits of our approach.
arXiv Detail & Related papers (2023-06-02T21:22:27Z) - Multi-Task Off-Policy Learning from Bandit Feedback [54.96011624223482]
We propose a hierarchical off-policy optimization algorithm (HierOPO), which estimates the parameters of the hierarchical model and then acts pessimistically with respect to them.
We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model.
Our theoretical and empirical results show a clear advantage of using the hierarchy over solving each task independently.
arXiv Detail & Related papers (2022-12-09T08:26:27Z) - Revisiting Some Common Practices in Cooperative Multi-Agent
Reinforcement Learning [11.91425153754564]
We show that in environments with a highly multi-modal reward landscape, value decomposition, and parameter sharing can be problematic and lead to undesired outcomes.
In contrast, policy gradient (PG) methods with individual policies provably converge to an optimal solution in these cases.
We present practical suggestions on implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors.
arXiv Detail & Related papers (2022-06-15T13:03:05Z) - Q-Mixing Network for Multi-Agent Pathfinding in Partially Observable
Grid Environments [62.997667081978825]
We consider the problem of multi-agent navigation in partially observable grid environments.
We suggest utilizing the reinforcement learning approach when the agents, first, learn the policies that map observations to actions and then follow these policies to reach their goals.
arXiv Detail & Related papers (2021-08-13T09:44:47Z) - Policy Information Capacity: Information-Theoretic Measure for Task
Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
(2021-03-23T17:49:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.