Related papers: Logical Team Q-learning: An approach towards factored policies in cooperative MARL

Logical Team Q-learning: An approach towards factored policies in cooperative MARL

URL: http://arxiv.org/abs/2006.03553v2
Date: Sun, 28 Mar 2021 19:05:57 GMT
Title: Logical Team Q-learning: An approach towards factored policies in cooperative MARL
Authors: Lucas Cassano and Ali H. Sayed
Abstract summary: We address the challenge of learning factored policies in cooperative MARL scenarios. The goal is to obtain factored policies that determine the individual behavior of each agent so that the resulting joint policy is optimal. The main contribution is the introduction of Logical Team Q-learning (LTQL)
Score: 49.08389593076099
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We address the challenge of learning factored policies in cooperative MARL scenarios. In particular, we consider the situation in which a team of agents collaborates to optimize a common cost. The goal is to obtain factored policies that determine the individual behavior of each agent so that the resulting joint policy is optimal. The main contribution of this work is the introduction of Logical Team Q-learning (LTQL). LTQL does not rely on assumptions about the environment and hence is generally applicable to any collaborative MARL scenario. We derive LTQL as a stochastic approximation to a dynamic programming method we introduce in this work. We conclude the paper by providing experiments (both in the tabular and deep settings) that illustrate the claims.

Related papers

Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language models. Controlled Decoding provides a mechanism for aligning a model at inference time without retraining. We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
ARIES: Autonomous Reasoning with LLMs on Interactive Thought Graph Environments [7.508204100423766]
We introduce ARIES, a multi-agent architecture for reasoning with LLMs. We observe that using off-the-shelf LLMs as policy agents with no supervised fine-tuning (SFT) can yield up to $29%$ higher accuracy on HumanEval. We also conduct a thorough analysis of observed failure modes, highlighting that limitations on LLM sizes and the depth of problem decomposition can be seen as challenges to scaling LLM-guided reasoning.
arXiv Detail & Related papers (2025-02-28T16:28:13Z)
CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing [70.25689961697523]
We propose a generalizable algorithm that enhances sequential reasoning by cross-task experience sharing and selection. Our work bridges the gap between existing sequential reasoning paradigms and validates the effectiveness of leveraging cross-task experiences.
arXiv Detail & Related papers (2024-10-22T03:59:53Z)
Meta Reasoning for Large Language Models [58.87183757029041]
We introduce Meta-Reasoning Prompting (MRP), a novel and efficient system prompting method for large language models (LLMs) MRP guides LLMs to dynamically select and apply different reasoning methods based on the specific requirements of each task. We evaluate the effectiveness of MRP through comprehensive benchmarks.
arXiv Detail & Related papers (2024-06-17T16:14:11Z)
QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning [2.287186762346021]
We propose QFree, a universal value function factorization method for multi-agent reinforcement learning. We show that QFree achieves the state-of-the-art performance in a general-purpose complex MARL benchmark environment.
arXiv Detail & Related papers (2023-11-01T08:07:16Z)
Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning [7.784991832712813]
We introduce a Bayesian network to inaugurate correlations between agents' action selections in their joint policy. We develop practical algorithms to learn the context-aware Bayesian network policies. Empirical results on a range of MARL benchmarks show the benefits of our approach.
arXiv Detail & Related papers (2023-06-02T21:22:27Z)
Multi-Task Off-Policy Learning from Bandit Feedback [54.96011624223482]
We propose a hierarchical off-policy optimization algorithm (HierOPO), which estimates the parameters of the hierarchical model and then acts pessimistically with respect to them. We prove per-task bounds on the suboptimality of the learned policies, which show a clear improvement over not using the hierarchical model. Our theoretical and empirical results show a clear advantage of using the hierarchy over solving each task independently.
arXiv Detail & Related papers (2022-12-09T08:26:27Z)
Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning [11.91425153754564]
We show that in environments with a highly multi-modal reward landscape, value decomposition, and parameter sharing can be problematic and lead to undesired outcomes. In contrast, policy gradient (PG) methods with individual policies provably converge to an optimal solution in these cases. We present practical suggestions on implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors.
arXiv Detail & Related papers (2022-06-15T13:03:05Z)
Q-Mixing Network for Multi-Agent Pathfinding in Partially Observable Grid Environments [62.997667081978825]
We consider the problem of multi-agent navigation in partially observable grid environments. We suggest utilizing the reinforcement learning approach when the agents, first, learn the policies that map observations to actions and then follow these policies to reach their goals.
arXiv Detail & Related papers (2021-08-13T09:44:47Z)
Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty. We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives. These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.