Human-AI Coordination via Human-Regularized Search and Learning
- URL: http://arxiv.org/abs/2210.05125v1
- Date: Tue, 11 Oct 2022 03:46:12 GMT
- Title: Human-AI Coordination via Human-Regularized Search and Learning
- Authors: Hengyuan Hu, David J Wu, Adam Lerer, Jakob Foerster, Noam Brown
- Abstract summary: We develop a three-step algorithm that achieves strong performance in coordinating with real humans in the Hanabi benchmark.
We first use a regularized search algorithm and behavioral cloning to produce a better human model that captures diverse skill levels.
We show that our method beats a vanilla best-response-to-behavioral-cloning baseline when experts play repeatedly with the two agents.
- Score: 33.95649252941375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of making AI agents that collaborate well with humans
in partially observable fully cooperative environments given datasets of human
behavior. Inspired by piKL, a human-data-regularized search method that
improves upon a behavioral cloning policy without diverging far away from it,
we develop a three-step algorithm that achieves strong performance in
coordinating with real humans in the Hanabi benchmark. We first use a
regularized search algorithm and behavioral cloning to produce a better human
model that captures diverse skill levels. Then, we integrate the policy
regularization idea into reinforcement learning to train a human-like best
response to the human model. Finally, we apply regularized search on top of the
best response policy at test time to handle out-of-distribution challenges when
playing with humans. We evaluate our method in two large-scale experiments with
humans. First, we show that our method outperforms experts when playing with a
group of diverse human players in ad-hoc teams. Second, we show that our method
beats a vanilla best-response-to-behavioral-cloning baseline by having experts
play repeatedly with the two agents.
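The common thread across all three steps is a piKL-style KL regularization toward a behavioral cloning policy. The following is a minimal, illustrative sketch (not the authors' implementation) of the closed-form KL-regularized action distribution such methods build on; the function name `pikl_policy`, the toy Q-values, and the behavioral-cloning probabilities are assumptions for illustration only.

```python
import numpy as np

def pikl_policy(q_values, bc_probs, lam):
    """KL-regularized action distribution (piKL-style sketch).

    Solves   max_pi  E_{a~pi}[Q(a)] - lam * KL(pi || pi_BC),
    whose closed form is   pi(a) proportional to pi_BC(a) * exp(Q(a) / lam).
    A larger lam keeps the policy closer to the human (behavioral-cloning)
    prior; a smaller lam lets the value estimates dominate.
    """
    log_prior = np.log(np.clip(bc_probs, 1e-12, None))
    logits = log_prior + q_values / lam
    logits -= logits.max()            # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy usage with three legal actions (numbers are made up):
q = np.array([1.0, 0.2, -0.5])        # value estimates from search or RL
bc = np.array([0.2, 0.7, 0.1])        # human behavioral-cloning prior
print(pikl_policy(q, bc, lam=0.5))    # leans toward the high-value action
print(pikl_policy(q, bc, lam=10.0))   # stays close to the human prior
```

In the paper's pipeline this kind of regularization appears three times: when building the human model with search, when training the RL best response, and again at test time in the search on top of the best-response policy.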
Related papers
- Learning to Cooperate with Humans using Generative Agents [40.605931138995714]
Training agents that can coordinate zero-shot with humans is a key mission in multi-agent reinforcement learning (MARL).
We show that learning a generative model of human partners can effectively address this issue.
By sampling from the latent space, we can use the generative model to produce different partners to train Cooperator agents.
arXiv Detail & Related papers (2024-11-21T08:36:17Z)
- Incorporating Human Flexibility through Reward Preferences in Human-AI Teaming [14.250120245287109]
We develop a Human-AI PbRL Cooperation Game, where the RL agent queries the human-in-the-loop to elicit the task objective and the human's preferences over the joint team behavior.
Under this game formulation, we first introduce the notion of Human Flexibility to evaluate team performance based on whether humans prefer to follow a fixed policy or adapt to the RL agent on the fly.
We highlight a special case along these two dimensions, which we call Specified Orchestration, where the human is least flexible and the agent has complete access to the human policy.
arXiv Detail & Related papers (2023-12-21T20:48:15Z)
- Language Instructed Reinforcement Learning for Human-AI Coordination [23.694362407434753]
We propose a novel framework, instructRL, that enables humans to specify what kind of strategies they expect from their AI partners through natural language instructions.
We show that instructRL converges to human-like policies that satisfy the given instructions in a proof-of-concept environment and the challenging Hanabi benchmark.
arXiv Detail & Related papers (2023-04-13T04:47:31Z)
- BO-Muse: A human expert and AI teaming framework for accelerated experimental design [58.61002520273518]
Our algorithm lets the human expert take the lead in the experimental process.
We show that our algorithm converges sub-linearly, at a rate faster than either the AI or the human alone.
arXiv Detail & Related papers (2023-03-03T02:56:05Z)
- PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination [52.991211077362586]
We propose a policy ensemble method to increase the diversity of partners in the population.
We then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives.
In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners.
arXiv Detail & Related papers (2023-01-16T12:14:58Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination [21.800115245671737]
We consider the problem of training a Reinforcement Learning (RL) agent without using any human data to make it capable of collaborating with humans.
We derive a centralized population entropy objective to facilitate learning of a diverse population of agents, which is later used to train a robust agent to collaborate with unseen partners (a minimal sketch of such a population-entropy bonus appears after this list).
arXiv Detail & Related papers (2021-12-22T07:19:36Z)
- Modeling Strong and Human-Like Gameplay with KL-Regularized Search [64.24339197581769]
We consider the task of building strong but human-like policies in multi-agent decision-making problems.
Imitation learning is effective at predicting human actions but may not match the strength of expert humans.
We show in chess and Go that regularizing Monte Carlo tree search with the KL divergence from an imitation-learned policy produces policies that predict human moves more accurately and are stronger than the imitation policy.
arXiv Detail & Related papers (2021-12-14T16:52:49Z)
- Collaborating with Humans without Human Data [6.158826414652401]
We study the problem of how to train agents that collaborate well with human partners without using human data.
We train our agent partner as the best response to a population of self-play agents and their past checkpoints.
We find that Fictitious Co-Play (FCP) agents score significantly higher than self-play (SP), population-play (PP), and behavioral-cloning-play (BCP) baselines when paired with novel agents and human partners.
arXiv Detail & Related papers (2021-10-15T16:03:57Z)
- Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration [51.268988527778276]
We present a method for learning a human-robot collaboration policy from human-human collaboration demonstrations.
Our method co-optimizes a human policy and a robot policy in an interactive learning process.
arXiv Detail & Related papers (2021-08-13T03:14:43Z)
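For the Maximum Entropy Population Based Training entry above, here is a minimal, hypothetical sketch of what a centralized population-entropy bonus can look like. The function name, the choice of using the entropy of the population's mean policy as the diversity measure, and the toy numbers are assumptions for illustration, not the authors' exact objective.

```python
import numpy as np

def population_entropy_bonus(policy_probs):
    """Entropy of the mean policy across a population of agents.

    policy_probs: array of shape (num_agents, num_actions); each row is one
    agent's action distribution for the current state.
    Returns a scalar bonus that is large when the population is diverse and
    small when all agents act alike.
    """
    mean_policy = policy_probs.mean(axis=0)
    return -np.sum(mean_policy * np.log(mean_policy + 1e-12))

# Toy usage: a diverse population earns a larger bonus than a collapsed one.
diverse = np.array([[0.90, 0.05, 0.05],
                    [0.05, 0.90, 0.05],
                    [0.05, 0.05, 0.90]])
collapsed = np.tile([0.90, 0.05, 0.05], (3, 1))
print(population_entropy_bonus(diverse))    # about 1.10 nats (ln 3)
print(population_entropy_bonus(collapsed))  # about 0.39 nats
```

Such a bonus can be added to the reward during population training to encourage distinct partner behaviors before training a robust best response against the resulting pool.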
This list is automatically generated from the titles and abstracts of the papers on this site.