Human-AI Coordination via Human-Regularized Search and Learning
- URL: http://arxiv.org/abs/2210.05125v1
- Date: Tue, 11 Oct 2022 03:46:12 GMT
- Title: Human-AI Coordination via Human-Regularized Search and Learning
- Authors: Hengyuan Hu, David J Wu, Adam Lerer, Jakob Foerster, Noam Brown
- Abstract summary: We develop a three-step algorithm that achieves strong performance in coordinating with real humans in the Hanabi benchmark.
We first use a regularized search algorithm and behavioral cloning to produce a better human model that captures diverse skill levels.
We show that our method beats a vanilla best-response-to-behavioral-cloning baseline when experts play repeatedly with the two agents.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the problem of making AI agents that collaborate well with humans
in partially observable fully cooperative environments given datasets of human
behavior. Inspired by piKL, a human-data-regularized search method that
improves upon a behavioral cloning policy without diverging far away from it,
we develop a three-step algorithm that achieves strong performance in
coordinating with real humans in the Hanabi benchmark. We first use a
regularized search algorithm and behavioral cloning to produce a better human
model that captures diverse skill levels. Then, we integrate the policy
regularization idea into reinforcement learning to train a human-like best
response to the human model. Finally, we apply regularized search on top of the
best response policy at test time to handle out-of-distribution challenges when
playing with humans. We evaluate our method in two large scale experiments with
humans. First, we show that our method outperforms experts when playing with a
group of diverse human players in ad-hoc teams. Second, we show that our method
beats a vanilla best-response-to-behavioral-cloning baseline when experts
play repeatedly with the two agents.
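The regularization idea at the core of the abstract can be illustrated with a minimal sketch. Maximizing expected reward minus a KL penalty toward a behavioral-cloning policy tau has a closed-form solution: pi(a) proportional to tau(a) * exp(Q(a) / lambda). The function and values below are illustrative, not the paper's implementation:

```python
import numpy as np

def pikl_policy(q_values, bc_policy, lam):
    """Closed-form solution of max_pi E_pi[Q] - lam * KL(pi || bc_policy):
    pi(a) is proportional to bc_policy(a) * exp(Q(a) / lam)."""
    logits = np.log(np.asarray(bc_policy)) + np.asarray(q_values) / lam
    logits -= logits.max()  # subtract max for numerical stability
    pi = np.exp(logits)
    return pi / pi.sum()

# Toy example with two actions: a large lam keeps the policy close to
# the human model, a small lam moves it toward the reward-maximizing action.
q = [1.0, 0.0]          # action 0 has higher estimated value
tau = [0.2, 0.8]        # human model prefers action 1
p_human_like = pikl_policy(q, tau, lam=10.0)  # stays close to tau
p_greedy = pikl_policy(q, tau, lam=0.1)       # most mass on argmax Q
```

Large lambda trades strength for predictability to human partners; small lambda does the opposite, which matches the abstract's use of regularization both in search and in RL training.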
Related papers
- Language Instructed Reinforcement Learning for Human-AI Coordination [23.694362407434753]
We propose a novel framework, instructRL, that enables humans to specify what kind of strategies they expect from their AI partners through natural language instructions.
We show that instructRL converges to human-like policies that satisfy the given instructions in a proof-of-concept environment and the challenging Hanabi benchmark.
arXiv Detail & Related papers (2023-04-13T04:47:31Z)
- BO-Muse: A human expert and AI teaming framework for accelerated experimental design [58.61002520273518]
Our algorithm lets the human expert take the lead in the experimental process.
We show that our algorithm converges sub-linearly, at a rate faster than the AI or human alone.
arXiv Detail & Related papers (2023-03-03T02:56:05Z)
- PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination [52.991211077362586]
We propose a policy ensemble method to increase the diversity of partners in the population.
We then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives.
In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners.
arXiv Detail & Related papers (2023-01-16T12:14:58Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination [21.800115245671737]
We consider the problem of training a Reinforcement Learning (RL) agent without using any human data to make it capable of collaborating with humans.
We derive a centralized population entropy objective to facilitate learning of a diverse population of agents, which is later used to train a robust agent to collaborate with unseen partners.
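One plausible reading of a centralized population entropy objective is the entropy of the mean action distribution across the population: it is high only when agents behave differently from one another. The sketch below is an illustration of that reading, not the paper's exact objective:

```python
import numpy as np

def population_entropy(policies):
    """Entropy of the population's mean action distribution.
    `policies` is a (n_agents, n_actions) array of action probabilities;
    the value grows when agents' policies diverge from one another."""
    mean_policy = np.mean(policies, axis=0)
    return -np.sum(mean_policy * np.log(mean_policy + 1e-12))

# Three agents, two actions: identical agents vs. a diverse population.
identical = np.array([[0.9, 0.1]] * 3)
diverse = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
```

Maximizing such a term during population-based training pushes the population toward diverse conventions, which is what makes the later best-response agent robust to unseen partners.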
arXiv Detail & Related papers (2021-12-22T07:19:36Z)
- Modeling Strong and Human-Like Gameplay with KL-Regularized Search [64.24339197581769]
We consider the task of building strong but human-like policies in multi-agent decision-making problems.
Imitation learning is effective at predicting human actions but may not match the strength of expert humans.
We show in chess and Go that Monte Carlo tree search regularized by the KL divergence from an imitation-learned policy produces policies that have higher human prediction accuracy and are stronger than the imitation policy.
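Using an imitation-learned policy to bias tree search can be sketched with a PUCT-style selection rule, where the imitation policy plays the role of the prior. This is a generic sketch in the spirit of prior-regularized MCTS; the paper's exact formulation may differ:

```python
import math

def puct_select(q, n_visits, prior, c_puct=1.5):
    """One PUCT selection step at a search node: `prior` (here, an
    imitation-learned policy) biases exploration toward human-like
    moves, while the value estimates `q` steer toward strength."""
    total = sum(n_visits)
    scores = [
        q[a] + c_puct * prior[a] * math.sqrt(total + 1) / (1 + n_visits[a])
        for a in range(len(q))
    ]
    return max(range(len(q)), key=scores.__getitem__)
```

At an unvisited node the prior dominates (the search starts human-like); as visit counts grow, the value term takes over, so strength improves without straying far from human play early on.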
arXiv Detail & Related papers (2021-12-14T16:52:49Z)
- Collaborating with Humans without Human Data [6.158826414652401]
We study the problem of how to train agents that collaborate well with human partners without using human data.
We train our agent partner as the best response to a population of self-play agents and their past checkpoints.
We find that Fictitious Co-Play (FCP) agents score significantly higher than SP, PP, and BCP when paired with novel agent and human partners.
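The partner-population idea behind Fictitious Co-Play can be sketched as sampling training partners from self-play runs and their saved checkpoints, so the ego agent sees a range of skill levels and conventions. The population structure and names below are hypothetical illustrations:

```python
import random

def sample_fcp_partner(population):
    """FCP-style partner sampling. `population` maps each independent
    self-play run to its list of checkpoints (early, mid, final);
    sampling across runs and checkpoints varies both convention and skill."""
    run_checkpoints = random.choice(list(population.values()))
    return random.choice(run_checkpoints)

# Hypothetical population: three self-play runs, three checkpoints each.
population = {
    f"run{i}": [f"run{i}_ckpt{j}" for j in range(3)] for i in range(3)
}
partner = sample_fcp_partner(population)
```

Training the ego agent as a best response to partners drawn this way is what the summary credits for outperforming SP, PP, and BCP with novel agent and human partners.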
arXiv Detail & Related papers (2021-10-15T16:03:57Z)
- Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration [51.268988527778276]
We present a method for learning a human-robot collaboration policy from human-human collaboration demonstrations.
Our method co-optimizes a human policy and a robot policy in an interactive learning process.
arXiv Detail & Related papers (2021-08-13T03:14:43Z)
- Adaptive Agent Architecture for Real-time Human-Agent Teaming [3.284216428330814]
It is critical that agents infer human intent and adapt their policies for smooth coordination.
Most literature in human-agent teaming builds agents referencing a learned human model.
We propose a novel adaptive agent architecture in a human-model-free setting on a two-player cooperative game.
arXiv Detail & Related papers (2021-03-07T20:08:09Z)
- Learning Human Rewards by Inferring Their Latent Intelligence Levels in Multi-Agent Games: A Theory-of-Mind Approach with Application to Driving Data [18.750834997334664]
We argue that humans are bounded rational and have different intelligence levels when reasoning about others' decision-making process.
We propose a new multi-agent Inverse Reinforcement Learning framework that reasons about humans' latent intelligence levels during learning.
arXiv Detail & Related papers (2021-03-07T07:48:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.