Benchmarking Multi-Agent Preference-based Reinforcement Learning for
Human-AI Teaming
- URL: http://arxiv.org/abs/2312.14292v1
- Date: Thu, 21 Dec 2023 20:48:15 GMT
- Title: Benchmarking Multi-Agent Preference-based Reinforcement Learning for
Human-AI Teaming
- Authors: Siddhant Bhambri, Mudit Verma, Anil Murthy, Subbarao Kambhampati
- Abstract summary: Preference-based Reinforcement Learning (PbRL) is an active area of research that has made significant strides in single-agent actor and observer human-in-the-loop scenarios.
We consider a two-agent (Human-AI) cooperative setup where both agents are rewarded according to the human's reward function for the team.
However, the agent does not have access to this function; instead, it uses preference-based queries to elicit its objectives and the human's preferences for the robot in the human-robot team.
- Score: 16.701242561345786
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Preference-based Reinforcement Learning (PbRL) is an active area of research
that has made significant strides in single-agent actor and in observer
human-in-the-loop scenarios. However, its application within cooperative
multi-agent RL frameworks, where humans actively participate and express
preferences for agent behavior, remains largely uncharted. We consider a
two-agent (Human-AI) cooperative setup where both agents are rewarded
according to the human's reward function for the team. However, the agent does
not have access to this function; instead, it uses preference-based queries to
elicit its objectives and the human's preferences for the robot in the
human-robot team. We introduce the notion of Human-Flexibility, i.e., whether
the human partner is amenable to multiple team strategies, with a special case
being Specified Orchestration, where the human has a single team policy in mind
(the most constrained case). We propose a suite of domains for studying PbRL in
the Human-AI cooperative setup, all of which explicitly require forced
cooperation. Adapting state-of-the-art single-agent PbRL algorithms to our
two-agent setting, we conduct a comprehensive benchmarking study across our
domain suite. Our findings highlight the challenges associated with a high
degree of Human-Flexibility and with limited access to the human's envisioned
policy in PbRL for Human-AI cooperation. Notably, we observe that PbRL
algorithms perform effectively only under Specified Orchestration, which can be
seen as an upper bound on PbRL performance for future research.
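The preference-query mechanism described in the abstract can be illustrated with a minimal Bradley-Terry reward-learning loop, the standard machinery behind most PbRL algorithms: the agent shows the human pairs of trajectory segments, receives a preference label, and fits a reward model whose segment returns explain those labels. This is only a hedged sketch, not the paper's implementation; the linear reward features, segment length, learning rate, and simulated human labeler are all illustrative assumptions.

```python
import math
import random

def segment_return(weights, segment):
    # Predicted return of a trajectory segment: sum of linear rewards
    # w . phi(s) over the segment's per-step feature vectors.
    return sum(sum(w * f for w, f in zip(weights, step)) for step in segment)

def bt_update(weights, seg_a, seg_b, pref_a, lr=0.05):
    # One gradient step on the Bradley-Terry log-likelihood of the human's
    # preference label (pref_a = 1.0 if segment A was preferred, else 0.0).
    r_a = segment_return(weights, seg_a)
    r_b = segment_return(weights, seg_b)
    p_a = 1.0 / (1.0 + math.exp(r_b - r_a))   # P(A preferred | current model)
    scale = pref_a - p_a                       # d log-likelihood / d(r_a - r_b)
    for i in range(len(weights)):
        phi_a = sum(step[i] for step in seg_a)
        phi_b = sum(step[i] for step in seg_b)
        weights[i] += lr * scale * (phi_a - phi_b)

random.seed(0)
true_w = [1.0, -1.0]       # hidden reward the (simulated) human judges with
learned_w = [0.0, 0.0]
for _ in range(2000):
    seg_a = [[random.random(), random.random()] for _ in range(5)]
    seg_b = [[random.random(), random.random()] for _ in range(5)]
    pref_a = 1.0 if segment_return(true_w, seg_a) > segment_return(true_w, seg_b) else 0.0
    bt_update(learned_w, seg_a, seg_b, pref_a)

# The learned model should now rank segments the way the hidden reward does.
good = [[1.0, 0.0]] * 5
bad = [[0.0, 1.0]] * 5
print(learned_w[0] > learned_w[1])  # learned weights recover the hidden ordering
```

In the two-agent setting studied above, the key difference is that the queried preferences concern the robot's behavior within a joint team policy, so the reward model must be learned under whatever flexibility the human partner allows.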
Related papers
- Mixed-Initiative Human-Robot Teaming under Suboptimality with Online Bayesian Adaptation [0.6591036379613505]
We develop computational modeling and optimization techniques for enhancing the performance of suboptimal human-agent teams.
We adopt an online Bayesian approach that enables a robot to infer people's willingness to comply with its assistance in a sequential decision-making game.
Our user studies show that user preferences and team performance indeed vary with robot intervention styles.
arXiv Detail & Related papers (2024-03-24T14:38:18Z)
- Large Language Model-based Human-Agent Collaboration for Complex Task Solving [94.3914058341565]
We introduce the problem of Large Language Model (LLM)-based human-agent collaboration for complex task-solving.
We propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC.
This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process.
arXiv Detail & Related papers (2024-02-20T11:03:36Z)
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- Secrets of RLHF in Large Language Models Part I: PPO [81.01936993929127]
Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence.
Reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit.
In this report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training.
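As background for the PPO analysis this report performs, the clipped surrogate objective at the core of PPO is standardly written as:

```latex
L^{\mathrm{CLIP}}(\theta)
  = \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\;
      \mathrm{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

Here $\hat{A}_t$ is the advantage estimate and $\epsilon$ the clipping range; the clip term is the part whose inner workings such reports typically re-evaluate.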
arXiv Detail & Related papers (2023-07-11T01:55:24Z)
- A Hierarchical Approach to Population Training for Human-AI Collaboration [20.860808795671343]
We introduce a Hierarchical Reinforcement Learning (HRL) based method for Human-AI Collaboration.
We demonstrate that our method is able to dynamically adapt to novel partners of different play styles and skill levels in the 2-player collaborative Overcooked game environment.
arXiv Detail & Related papers (2023-05-26T07:53:12Z)
- PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination [52.991211077362586]
We propose a policy ensemble method to increase the diversity of partners in the population.
We then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives.
In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners.
arXiv Detail & Related papers (2023-01-16T12:14:58Z)
- Multi-agent Deep Covering Skill Discovery [50.812414209206054]
We propose Multi-agent Deep Covering Option Discovery, which constructs the multi-agent options through minimizing the expected cover time of the multiple agents' joint state space.
Also, we propose a novel framework to adopt the multi-agent options in the MARL process.
We show that the proposed algorithm can effectively capture agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperform prior works using single-agent options or no options.
arXiv Detail & Related papers (2022-10-07T00:40:59Z)
- Warmth and competence in human-agent cooperation [0.7237068561453082]
Recent studies demonstrate that AI agents trained with deep reinforcement learning are capable of collaborating with humans.
We train deep reinforcement learning agents in Coins, a two-player social dilemma.
Participants' perceptions of warmth and competence predict their stated preferences for different agents.
arXiv Detail & Related papers (2022-01-31T18:57:08Z)
- Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams [14.215359943041369]
We propose and analyze a decentralized Multi-Armed Bandit (MAB) problem with coupled rewards as an abstraction of more general multi-agent collaboration.
We propose a Partner-Aware strategy for joint sequential decision-making that extends the well-known single-agent Upper Confidence Bound algorithm.
Our results show that the proposed partner-aware strategy outperforms other known methods, and our human subject studies suggest humans prefer to collaborate with AI agents implementing our partner-aware strategy.
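For context, the single-agent Upper Confidence Bound algorithm that this partner-aware strategy extends can be sketched in a few lines. This is plain UCB1, not the paper's coupled-reward variant; the Bernoulli arm means, horizon, and seed are illustrative assumptions.

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    # Plain single-agent UCB1: try each arm once, then always pull the arm
    # maximizing empirical mean + sqrt(2 ln t / n_i).
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: pull each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = pull(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts

random.seed(1)
means = [0.2, 0.5, 0.8]  # hypothetical Bernoulli arm means
counts = ucb1(lambda a: 1.0 if random.random() < means[a] else 0.0,
              n_arms=3, horizon=2000)
# After 2000 pulls, the highest-mean arm should dominate the pull counts.
```

A partner-aware extension would additionally condition the arm choice on an estimate of the partner's strategy, since the coupled reward depends on both agents' selections.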
arXiv Detail & Related papers (2021-10-02T08:17:30Z)
- Adaptive Agent Architecture for Real-time Human-Agent Teaming [3.284216428330814]
It is critical that agents infer human intent and adapt their policies for smooth coordination.
Most literature in human-agent teaming builds agents that reference a learned human model.
We propose a novel adaptive agent architecture in a human-model-free setting on a two-player cooperative game.
arXiv Detail & Related papers (2021-03-07T20:08:09Z)
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.