Learning from Multiple Independent Advisors in Multi-agent Reinforcement Learning
- URL: http://arxiv.org/abs/2301.11153v1
- Date: Thu, 26 Jan 2023 15:00:23 GMT
- Title: Learning from Multiple Independent Advisors in Multi-agent Reinforcement Learning
- Authors: Sriram Ganapathi Subramanian, Matthew E. Taylor, Kate Larson and Mark Crowley
- Abstract summary: This paper considers the problem of simultaneously learning from multiple independent advisors in multi-agent reinforcement learning.
We provide principled algorithms that incorporate a set of advisors by both evaluating the advisors at each state and subsequently using the advisors to guide action selection.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-agent reinforcement learning typically suffers from the problem of
sample inefficiency, where learning suitable policies involves the use of many
data samples. Learning from external demonstrators is a possible solution that
mitigates this problem. However, most prior approaches in this area assume the
presence of a single demonstrator. Leveraging multiple knowledge sources (i.e.,
advisors) with expertise in distinct aspects of the environment could
substantially speed up learning in complex environments. This paper considers
the problem of simultaneously learning from multiple independent advisors in
multi-agent reinforcement learning. The approach leverages a two-level
Q-learning architecture, and extends this framework from single-agent to
multi-agent settings. We provide principled algorithms that incorporate a set
of advisors by both evaluating the advisors at each state and subsequently
using the advisors to guide action selection. We also provide theoretical
convergence and sample complexity guarantees. Experimentally, we validate our
approach in three different test-beds and show that our algorithms achieve better
performance than baselines, can effectively integrate the combined expertise
of different advisors, and learn to ignore bad advice.
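For intuition, the following is a minimal tabular sketch in Python of the two-level idea described above: one level estimates how useful each advisor is in a given state, the other level is an ordinary action-value function, and advice is followed only while it looks at least as good as acting greedily. This is an illustrative assumption-laden sketch, not the paper's actual algorithm or code; the class name TwoLevelQLearner, its methods, and the specific arbitration rule between advice and the agent's own greedy action are hypothetical.

```python
import random
from collections import defaultdict


class TwoLevelQLearner:
    """Minimal tabular sketch of two-level Q-learning with multiple advisors.

    Level 1 ("evaluation") learns a value per (state, advisor) estimating how
    useful that advisor's advice is in that state.  Level 2 ("decision") learns
    an ordinary action-value function.  Illustrative only; the paper's formal
    updates and multi-agent treatment differ.
    """

    def __init__(self, actions, advisors, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = actions            # list of available actions
        self.advisors = advisors          # list of callables: advisor(state) -> action
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q_action = defaultdict(float)   # (state, action) -> value
        self.q_advisor = defaultdict(float)  # (state, advisor_idx) -> value

    def select_action(self, state):
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        # Highest-rated advisor's suggestion vs. the agent's own greedy action.
        best_adv = max(range(len(self.advisors)),
                       key=lambda i: self.q_advisor[(state, i)])
        advised = self.advisors[best_adv](state)
        greedy = max(self.actions, key=lambda a: self.q_action[(state, a)])
        # Follow advice only while it looks at least as good as acting greedily;
        # this is what lets the learner ignore consistently bad advisors.
        if self.q_advisor[(state, best_adv)] >= self.q_action[(state, greedy)]:
            return advised
        return greedy

    def update(self, state, action, reward, next_state):
        target = reward + self.gamma * max(
            self.q_action[(next_state, a)] for a in self.actions)
        # Level 2: standard Q-learning update for the executed action.
        self.q_action[(state, action)] += self.alpha * (
            target - self.q_action[(state, action)])
        # Level 1: credit every advisor that would have recommended this action.
        for i, advisor in enumerate(self.advisors):
            if advisor(state) == action:
                self.q_advisor[(state, i)] += self.alpha * (
                    target - self.q_advisor[(state, i)])
```

In this sketch the advisor-level values let the learner rank advisors per state, which is the mechanism that allows it to exploit advisors with expertise in different parts of the environment while discounting those whose advice is consistently poor.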
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Two-stage Learning-to-Defer for Multi-Task Learning
We introduce a Learning-to-Defer approach for multi-task learning that encompasses both classification and regression tasks.
Our two-stage approach utilizes a rejector that defers decisions to the most accurate agent among a pre-trained joint-regressor model and one or more external experts.
arXiv Detail & Related papers (2024-10-21T07:44:57Z)
- MADDM: Multi-Advisor Dynamic Binary Decision-Making by Maximizing the Utility
We propose a novel strategy for optimally selecting a set of advisors in a sequential binary decision-making setting.
We assume no access to ground truth and no prior knowledge about the reliability of advisors.
arXiv Detail & Related papers (2023-05-15T14:13:47Z)
- On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity of multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z)
- Variational Distillation for Multi-View Learning
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under a rigorous theoretical guarantee, our approach enables IB to grasp the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z)
- Investigation of Independent Reinforcement Learning Algorithms in Multi-Agent Environments
We show that independent algorithms can perform on par with multi-agent algorithms in cooperative and competitive settings.
We also show that adding recurrence improves the learning of independent algorithms in cooperative partially observable environments.
arXiv Detail & Related papers (2021-11-01T17:14:38Z)
- Multi-Agent Advisor Q-Learning
We provide a principled framework for incorporating action recommendations from online sub-optimal advisors in multi-agent settings.
We present two novel Q-learning based algorithms: ADMIRAL - Decision Making (ADMIRAL-DM) and ADMIRAL - Advisor Evaluation (ADMIRAL-AE).
We analyze the algorithms theoretically and provide fixed-point guarantees regarding their learning in general-sum games.
arXiv Detail & Related papers (2021-10-26T00:21:15Z)
- Q-Mixing Network for Multi-Agent Pathfinding in Partially Observable Grid Environments
We consider the problem of multi-agent navigation in partially observable grid environments.
We suggest utilizing the reinforcement learning approach, where the agents first learn policies that map observations to actions and then follow these policies to reach their goals.
arXiv Detail & Related papers (2021-08-13T09:44:47Z)
- Learning with Instance Bundles for Reading Comprehension
We introduce new supervision techniques that compare question-answer scores across multiple related instances.
Specifically, we normalize these scores across various neighborhoods of closely contrasting questions and/or answers.
We empirically demonstrate the effectiveness of training with instance bundles on two datasets.
arXiv Detail & Related papers (2021-04-18T06:17:54Z)
- Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework.
We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms two baselines and two state-of-the-art algorithms.
arXiv Detail & Related papers (2020-06-12T13:24:50Z)
- Provable Representation Learning for Imitation Learning via Bi-level Optimization
A common strategy in modern learning systems is to learn a representation that is useful for many tasks.
We study this strategy in the imitation learning setting for Markov decision processes (MDPs) where multiple experts' trajectories are available.
We instantiate this framework for the imitation learning settings of behavior cloning and observation-alone.
arXiv Detail & Related papers (2020-02-24T21:03:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.