Benchmarking Multi-Agent Preference-based Reinforcement Learning for
Human-AI Teaming
- URL: http://arxiv.org/abs/2312.14292v1
- Date: Thu, 21 Dec 2023 20:48:15 GMT
- Title: Benchmarking Multi-Agent Preference-based Reinforcement Learning for
Human-AI Teaming
- Authors: Siddhant Bhambri, Mudit Verma, Anil Murthy, Subbarao Kambhampati
- Abstract summary: Preference-based Reinforcement Learning (PbRL) is an active area of research and has made significant strides in single-agent actor and observer human-in-the-loop scenarios.
We consider a two-agent (Human-AI) cooperative setup where both agents are rewarded according to the human's reward function for the team.
The agent, however, does not have access to this function and instead uses preference-based queries to elicit its objectives and the human's preferences for the robot in the human-robot team.
- Score: 16.701242561345786
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Preference-based Reinforcement Learning (PbRL) is an active area of research and has made significant strides in single-agent actor and observer human-in-the-loop scenarios. However, its application within cooperative multi-agent RL frameworks, where humans actively participate and express preferences for agent behavior, remains largely uncharted. We consider a two-agent (Human-AI) cooperative setup where both agents are rewarded according to the human's reward function for the team. The agent, however, does not have access to this function and instead uses preference-based queries to elicit its objectives and the human's preferences for the robot in the human-robot team. We introduce the notion of Human-Flexibility, i.e., whether the human partner is amenable to multiple team strategies, with a special case being Specified Orchestration, where the human has a single team policy in mind (the most constrained case). We propose a suite of domains for studying PbRL in the Human-AI cooperative setup, all of which explicitly require forced cooperation. Adapting state-of-the-art single-agent PbRL algorithms to our two-agent setting, we conduct a comprehensive benchmarking study across our domain suite. Our findings highlight the challenges associated with a high degree of Human-Flexibility and with limited access to the human's envisioned policy in PbRL for Human-AI cooperation. Notably, we observe that PbRL algorithms perform effectively only in the case of Specified Orchestration, which can be seen as an upper bound on PbRL performance for future research.
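The listing contains no code, but the query mechanism the abstract describes builds on standard single-agent PbRL machinery: the agent shows the human pairs of trajectory segments and fits a reward model to the resulting binary preferences, typically with a Bradley-Terry likelihood. Below is a minimal sketch of that reward-learning step; the linear reward model, feature size, and synthetic preference oracle are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM = 8  # hypothetical feature size, not from the paper

def segment_return(w, segment):
    """Predicted return of a trajectory segment under a linear reward model w . phi."""
    return sum(feat @ w for feat in segment)

def preference_step(w, seg_a, seg_b, pref, lr=0.01):
    """One gradient step on the Bradley-Terry preference loss.

    pref = 1 means the human preferred seg_a over seg_b;
    P(seg_a > seg_b) = sigmoid(R_a - R_b), with R the predicted segment return.
    """
    p_a = 1.0 / (1.0 + np.exp(segment_return(w, seg_b) - segment_return(w, seg_a)))
    # Gradient of the cross-entropy loss w.r.t. w.
    grad = (p_a - pref) * (sum(seg_a) - sum(seg_b))
    return w - lr * grad

# Toy query loop: random 5-step segments of joint (human, robot) features,
# labeled by a synthetic "human" who prefers segments with a larger first feature.
def random_segment():
    return [rng.normal(size=FEAT_DIM) for _ in range(5)]

w = np.zeros(FEAT_DIM)
for _ in range(2000):
    seg_a, seg_b = random_segment(), random_segment()
    pref = 1.0 if sum(f[0] for f in seg_a) > sum(f[0] for f in seg_b) else 0.0
    w = preference_step(w, seg_a, seg_b, pref)

print(w.round(2))  # the weight on feature 0 should dominate
```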
Related papers
- Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration [51.452664740963066]
Collaborative Gym is a framework enabling asynchronous, tripartite interaction among agents, humans, and task environments.
We instantiate Co-Gym with three representative tasks in both simulated and real-world conditions.
Our findings reveal that collaborative agents consistently outperform their fully autonomous counterparts in task performance.
arXiv Detail & Related papers (2024-12-20T09:21:15Z)
- Large Language Model-based Human-Agent Collaboration for Complex Task Solving [94.3914058341565]
We introduce the problem of Large Language Model (LLM)-based human-agent collaboration for complex task solving.
We propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC.
This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process.
arXiv Detail & Related papers (2024-02-20T11:03:36Z)
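Per the summary above, ReHAC's policy model decides when human intervention is most opportune. As a toy illustration of that decision (not ReHAC's actual learned policy, which is trained with RL), here is a one-step myopic gate that weighs the estimated value of human help against an intervention cost; every name and number is invented:

```python
def should_intervene(value_ai, value_human, human_cost):
    """Toy intervention gate: ask the human only when the estimated gain over
    letting the AI act alone exceeds the cost of interrupting them.
    In ReHAC this decision is itself a learned RL policy; this myopic rule
    is purely for illustration."""
    return value_human - human_cost > value_ai

# Hypothetical (AI value, human value) estimates at three task stages.
stages = [(0.4, 0.9), (0.7, 0.75), (0.2, 0.95)]
for i, (v_ai, v_h) in enumerate(stages):
    print(f"stage {i}: intervene = {should_intervene(v_ai, v_h, human_cost=0.2)}")
```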
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easy to integrate into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- A Hierarchical Approach to Population Training for Human-AI Collaboration [20.860808795671343]
We introduce a Hierarchical Reinforcement Learning (HRL)-based method for Human-AI Collaboration.
We demonstrate that our method is able to dynamically adapt to novel partners of different play styles and skill levels in the 2-player collaborative Overcooked game environment.
arXiv Detail & Related papers (2023-05-26T07:53:12Z)
- PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination [52.991211077362586]
We propose a policy ensemble method to increase the diversity of partners in the population.
We then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives.
In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners.
arXiv Detail & Related papers (2023-01-16T12:14:58Z)
- Human-AI Coordination via Human-Regularized Search and Learning [33.95649252941375]
We develop a three-step algorithm that achieves strong performance when coordinating with real humans on the Hanabi benchmark.
We first use a regularized search algorithm and behavioral cloning to produce a better human model that captures diverse skill levels.
We show that our method beats a vanilla best-response-to-behavioral-cloning baseline by having experts play repeatedly with the two agents.
arXiv Detail & Related papers (2022-10-11T03:46:12Z)
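The "regularized search" in the entry above keeps the search policy close to a behavioral-cloning human model. One common way to write such a policy (used by piKL-style methods; the paper's exact formulation may differ) multiplies the BC prior by exponentiated search values. A minimal sketch with invented numbers:

```python
import numpy as np

def regularized_policy(q_values, bc_prior, lam):
    """Search policy regularized toward a behavioral-cloning human model.

    pi(a) proportional to bc_prior(a) * exp(q_values[a] / lam):
    small lam trusts the search values, large lam stays near the human-like prior.
    """
    logits = np.log(bc_prior + 1e-12) + q_values / lam
    logits -= logits.max()  # for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy choice over 4 moves: search mildly prefers move 2,
# while the behavioral-cloning prior strongly prefers move 0.
q = np.array([0.1, 0.0, 0.4, 0.2])
prior = np.array([0.7, 0.1, 0.1, 0.1])
print(regularized_policy(q, prior, lam=0.1))  # trusts search: move 2 wins
print(regularized_policy(q, prior, lam=2.0))  # stays near the prior: move 0 wins
```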
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, suggesting resilience to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination [21.800115245671737]
We consider the problem of training a Reinforcement Learning (RL) agent to collaborate with humans without using any human data.
We derive a centralized population entropy objective to facilitate learning of a diverse population of agents, which is later used to train a robust agent to collaborate with unseen partners.
arXiv Detail & Related papers (2021-12-22T07:19:36Z)
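The "centralized population entropy objective" in the entry above rewards a population whose average behavior is diverse. Here is a minimal sketch of one plausible reading of that quantity, the entropy of the population's mean action distribution; the abstract does not spell out the exact estimator, so treat this as an assumption:

```python
import numpy as np

def population_entropy(policies):
    """Entropy of the mean action distribution across a population.

    policies: array of shape (n_agents, n_actions), each row a distribution.
    Maximizing this pushes the population's average behavior to be diverse.
    """
    mean_policy = policies.mean(axis=0)
    return -(mean_policy * np.log(mean_policy + 1e-12)).sum()

# Two populations over 3 actions: one homogeneous, one diverse.
homogeneous = np.array([[0.9, 0.05, 0.05]] * 3)
diverse = np.array([[0.9, 0.05, 0.05],
                    [0.05, 0.9, 0.05],
                    [0.05, 0.05, 0.9]])
print(population_entropy(homogeneous))  # low
print(population_entropy(diverse))      # close to log(3)
```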
- Collaborating with Humans without Human Data [6.158826414652401]
We study the problem of how to train agents that collaborate well with human partners without using human data.
We train our agent partner as the best response to a population of self-play agents and their past checkpoints.
We find that Fictitious Co-Play (FCP) agents score significantly higher than SP, PP, and BCP when paired with novel agent and human partners.
arXiv Detail & Related papers (2021-10-15T16:03:57Z)
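Fictitious Co-Play, per the entry above, trains the partner agent as a best response to a pool of self-play agents and their past checkpoints. The sketch below shrinks that idea to a 2-action matching game with hand-picked "checkpoint" policies and a bandit-style ego learner; everything here is a toy stand-in for the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an FCP partner pool: each "checkpoint" is a fixed policy
# over 2 actions in a matching game (payoff 1 if both players pick the same action).
# In the real method these would be self-play agents saved at several skill levels.
partner_pool = [np.array([p, 1 - p]) for p in (0.9, 0.7, 0.55)]  # hypothetical checkpoints

q = np.zeros(2)   # ego agent's action values (bandit-style best response)
alpha = 0.05      # learning rate

for episode in range(5000):
    partner = partner_pool[rng.integers(len(partner_pool))]  # sample a partner per episode
    partner_action = rng.choice(2, p=partner)
    ego_action = rng.integers(2) if rng.random() < 0.1 else int(q.argmax())  # eps-greedy
    reward = 1.0 if ego_action == partner_action else 0.0
    q[ego_action] += alpha * (reward - q[ego_action])

print(q)  # action 0 should dominate: the pool leans toward action 0
```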
- Adaptive Agent Architecture for Real-time Human-Agent Teaming [3.284216428330814]
It is critical that agents infer human intent and adapt their policies for smooth coordination.
Most literature in human-agent teaming builds agents that reference a learned human model.
We propose a novel adaptive agent architecture in a human-model-free setting for a two-player cooperative game.
arXiv Detail & Related papers (2021-03-07T20:08:09Z)
- Distributed Reinforcement Learning for Cooperative Multi-Robot Object Manipulation [53.262360083572005]
We consider solving a cooperative multi-robot object manipulation task using reinforcement learning (RL).
We propose two distributed multi-agent RL approaches: distributed approximate RL (DA-RL) and game-theoretic RL (GT-RL).
Although we focus on a small system of two agents in this paper, both DA-RL and GT-RL apply to general multi-agent systems, and are expected to scale well to large systems.
arXiv Detail & Related papers (2020-03-21T00:43:54Z)
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.