Multi-Agent Advisor Q-Learning
- URL: http://arxiv.org/abs/2111.00345v1
- Date: Tue, 26 Oct 2021 00:21:15 GMT
- Title: Multi-Agent Advisor Q-Learning
- Authors: Sriram Ganapathi Subramanian, Matthew E. Taylor, Kate Larson, Mark
Crowley
- Abstract summary: We provide a principled framework for incorporating action recommendations from online sub-optimal advisors in multi-agent settings.
We present two novel Q-learning based algorithms: ADMIRAL - Decision Making (ADMIRAL-DM) and ADMIRAL - Advisor Evaluation (ADMIRAL-AE).
We analyze the algorithms theoretically and provide fixed-point guarantees regarding their learning in general-sum games.
- Score: 18.8931184962221
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the last decade, there have been significant advances in multi-agent
reinforcement learning (MARL) but there are still numerous challenges, such as
high sample complexity and slow convergence to stable policies, that need to be
overcome before widespread deployment is possible. However, many real-world
environments already, in practice, deploy sub-optimal or heuristic approaches
for generating policies. An interesting question that arises is how best to
use such approaches as advisors to help improve reinforcement learning in
multi-agent domains. In this paper, we provide a principled framework for
incorporating action recommendations from online sub-optimal advisors in
multi-agent settings. We describe the problem of ADvising Multiple Intelligent
Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game
environments and present two novel Q-learning based algorithms: ADMIRAL -
Decision Making (ADMIRAL-DM) and ADMIRAL - Advisor Evaluation (ADMIRAL-AE),
which allow us to improve learning by appropriately incorporating advice from
an advisor (ADMIRAL-DM), and evaluate the effectiveness of an advisor
(ADMIRAL-AE). We analyze the algorithms theoretically and provide fixed-point
guarantees regarding their learning in general-sum stochastic games.
Furthermore, extensive experiments illustrate that these algorithms: can be
used in a variety of environments, perform favourably compared to
other related baselines, can scale to large state-action spaces, and are robust
to poor advice from advisors.
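As a rough illustration of the advising mechanism described above, the following is a minimal Python sketch of a tabular Q-learner that follows an online, possibly sub-optimal advisor's recommended action with a decaying probability and otherwise acts epsilon-greedily. The class name, the decay schedule, and the plain single-agent Q-learning target are illustrative assumptions; the actual ADMIRAL-DM update is defined over joint actions in general-sum stochastic games.

```python
import random
from collections import defaultdict

# Minimal illustrative sketch only: a tabular Q-learner that consults an
# online, possibly sub-optimal advisor during action selection. The decaying
# "listen" probability and the single-agent Q-learning target are assumptions
# for illustration; they are not the exact ADMIRAL-DM update, which operates
# on joint actions in general-sum stochastic games.

class AdvisedQLearner:
    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1,
                 listen_decay=0.999):
        self.actions = list(actions)           # available actions
        self.q = defaultdict(float)            # Q[(state, action)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.listen_prob = 1.0                 # probability of following advice
        self.listen_decay = listen_decay

    def act(self, state, advisor_action=None):
        # With decaying probability, follow the advisor's recommendation.
        if advisor_action is not None and random.random() < self.listen_prob:
            return advisor_action
        # Otherwise act epsilon-greedily on the learned Q-values.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done):
        # One-step Q-learning target (a simplification of the general-sum
        # update analysed in the paper).
        best_next = 0.0 if done else max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
        # Rely on the advisor less as learning progresses.
        self.listen_prob *= self.listen_decay
```

The decaying listen probability reflects the intuition that advice is most useful early in training, when the agent's own value estimates are still poor.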
Related papers
- Mitigating Relative Over-Generalization in Multi-Agent Reinforcement Learning [11.988291170853806]
We introduce MaxMax Q-Learning (MMQ), which employs an iterative process of sampling and evaluating potential next states.
This approach refines approximations of ideal state transitions, aligning more closely with the optimal joint policy of collaborating agents.
Our results demonstrate that MMQ frequently outperforms existing baselines, exhibiting enhanced convergence and sample efficiency.
arXiv Detail & Related papers (2024-11-17T15:00:39Z)
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation [76.67608003501479]
We introduce and specify an evaluation protocol defining a range of domain-related metrics computed on the basis of the primary evaluation indicators.
The results of such a comparison, which involves a variety of state-of-the-art MARL, search-based, and hybrid methods, are presented.
arXiv Detail & Related papers (2024-07-20T16:37:21Z)
- Design Optimization of NOMA Aided Multi-STAR-RIS for Indoor Environments: A Convex Approximation Imitated Reinforcement Learning Approach [51.63921041249406]
Non-orthogonal multiple access (NOMA) enables multiple users to share the same frequency band, and simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) can further assist such systems; however, deploying STAR-RISs indoors presents challenges in interference mitigation, power consumption, and real-time configuration.
A novel network architecture utilizing multiple access points (APs), STAR-RISs, and NOMA is proposed for indoor communication.
arXiv Detail & Related papers (2024-06-19T07:17:04Z)
- Large Multimodal Agents: A Survey [78.81459893884737]
Large language models (LLMs) have achieved superior performance in powering text-based AI agents.
There is an emerging research trend focused on extending these LLM-powered AI agents into the multimodal domain.
This review aims to provide valuable insights and guidelines for future research in this rapidly evolving field.
arXiv Detail & Related papers (2024-02-23T06:04:23Z)
- Learning from Multiple Independent Advisors in Multi-agent Reinforcement Learning [15.195932300563541]
This paper considers the problem of simultaneously learning from multiple independent advisors in multi-agent reinforcement learning.
We provide principled algorithms that incorporate a set of advisors by both evaluating the advisors at each state and subsequently using the advisors to guide action selection (a hedged code sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-01-26T15:00:23Z)
- Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning [11.91425153754564]
We show that in environments with a highly multi-modal reward landscape, value decomposition and parameter sharing can be problematic and lead to undesired outcomes.
In contrast, policy gradient (PG) methods with individual policies provably converge to an optimal solution in these cases.
We present practical suggestions on implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors.
arXiv Detail & Related papers (2022-06-15T13:03:05Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- Transferring Domain Knowledge with an Adviser in Continuous Tasks [0.0]
Standard reinforcement learning techniques do not explicitly incorporate domain-specific knowledge into the learning process.
We adapt the Deep Deterministic Policy Gradient (DDPG) algorithm to incorporate an adviser.
Our experiments on OpenAI Gym benchmark tasks show that integrating domain knowledge through advisers expedites learning and improves the policy towards better optima.
arXiv Detail & Related papers (2021-02-16T09:03:33Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
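Relating to the "Learning from Multiple Independent Advisors in Multi-agent Reinforcement Learning" entry above, here is a minimal, hedged Python sketch of the general idea: keep a per-advisor value estimate for each state (advisor evaluation) and let the currently best-rated advisor guide exploration. The class name, the evaluation rule, and the fixed decay factor are illustrative assumptions, not the paper's exact algorithms.

```python
import random
from collections import defaultdict

# Hedged sketch for the multiple-independent-advisors idea referenced above:
# evaluate each advisor per state and let the best-rated one guide action
# selection. Names and update rules are illustrative assumptions.

class MultiAdvisorQLearner:
    def __init__(self, actions, n_advisors, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = list(actions)
        self.q = defaultdict(float)        # agent's Q[(state, action)]
        self.adv_q = defaultdict(float)    # advisor evaluation Q[(state, advisor)]
        self.n_advisors = n_advisors
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.listen_prob = 1.0             # probability of following an advisor

    def act(self, state, advisor_actions):
        # advisor_actions: one recommended action per advisor.
        if random.random() < self.listen_prob:
            # Follow the advisor currently rated best in this state.
            best = max(range(self.n_advisors), key=lambda k: self.adv_q[(state, k)])
            return advisor_actions[best], best
        if random.random() < self.epsilon:
            return random.choice(self.actions), None
        return max(self.actions, key=lambda a: self.q[(state, a)]), None

    def update(self, state, action, advisor_id, reward, next_state, done):
        best_next = 0.0 if done else max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
        # Credit (or discredit) the advisor whose advice was followed.
        if advisor_id is not None:
            self.adv_q[(state, advisor_id)] += self.alpha * (target - self.adv_q[(state, advisor_id)])
        self.listen_prob *= 0.999          # assumed decay of reliance on advice
```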
This list is automatically generated from the titles and abstracts of the papers on this site.