Collaborating with Humans without Human Data
- URL: http://arxiv.org/abs/2110.08176v1
- Date: Fri, 15 Oct 2021 16:03:57 GMT
- Title: Collaborating with Humans without Human Data
- Authors: DJ Strouse, Kevin R. McKee, Matt Botvinick, Edward Hughes, Richard
Everett
- Abstract summary: We study the problem of how to train agents that collaborate well with human partners without using human data.
We train our agent partner as the best response to a population of self-play agents and their past checkpoints.
We find that Fictitious Co-Play (FCP) agents score significantly higher than SP, PP, and BCP when paired with novel agent and human partners.
- Score: 6.158826414652401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collaborating with humans requires rapidly adapting to their individual
strengths, weaknesses, and preferences. Unfortunately, most standard
multi-agent reinforcement learning techniques, such as self-play (SP) or
population play (PP), produce agents that overfit to their training partners
and do not generalize well to humans. Alternatively, researchers can collect
human data, train a human model using behavioral cloning, and then use that
model to train "human-aware" agents ("behavioral cloning play", or BCP). While
such an approach can improve the generalization of agents to new human
co-players, it involves the onerous and expensive step of collecting large
amounts of human data first. Here, we study the problem of how to train agents
that collaborate well with human partners without using human data. We argue
that the crux of the problem is to produce a diverse set of training partners.
Drawing inspiration from successful multi-agent approaches in competitive
domains, we find that a surprisingly simple approach is highly effective. We
train our agent partner as the best response to a population of self-play
agents and their past checkpoints taken throughout training, a method we call
Fictitious Co-Play (FCP). Our experiments focus on a two-player collaborative
cooking simulator that has recently been proposed as a challenge problem for
coordination with humans. We find that FCP agents score significantly higher
than SP, PP, and BCP when paired with novel agent and human partners.
Furthermore, humans also report a strong subjective preference for partnering
with FCP agents over all baselines.
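
To make the two-stage recipe concrete, here is a minimal, self-contained sketch in Python. It is not the paper's implementation: the toy environment, the toy learner, and all hyperparameters (population size, checkpoint fractions, episode counts) are stand-ins chosen for illustration, whereas the paper trains deep RL agents in the Overcooked cooking simulator. Only the structure is the point: train a population of self-play agents, freeze checkpoints from throughout their training, then train a single best-response agent against the frozen pool.

```python
"""Minimal sketch of Fictitious Co-Play (FCP); all components are toy stand-ins."""

import copy
import random


class ToyCoopEnv:
    """Stand-in two-player environment: reward is earned when both players
    pick the same action, so partners can settle on different conventions."""

    N_ACTIONS = 4

    def reset(self):
        self.t = 0
        return (0, 0)  # trivial observations for the two players

    def step(self, actions):
        self.t += 1
        reward = 1.0 if actions[0] == actions[1] else 0.0
        done = self.t >= 10
        return (0, 0), reward, done


class ToyAgent:
    """Stand-in learner: keeps per-action preferences updated from reward."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.prefs = [self.rng.random() for _ in range(ToyCoopEnv.N_ACTIONS)]

    def act(self, obs):
        if self.rng.random() < 0.1:  # small amount of exploration
            return self.rng.randrange(ToyCoopEnv.N_ACTIONS)
        return max(range(ToyCoopEnv.N_ACTIONS), key=lambda a: self.prefs[a])

    def update(self, action, reward):
        self.prefs[action] += 0.1 * (reward - self.prefs[action])


def run_episode(env, agent_a, agent_b, learners):
    """Play one episode; only agents in `learners` receive updates."""
    obs, done = env.reset(), False
    while not done:
        acts = (agent_a.act(obs[0]), agent_b.act(obs[1]))
        obs, reward, done = env.step(acts)
        for agent, act in zip((agent_a, agent_b), acts):
            if agent in learners:
                agent.update(act, reward)


def train_fcp(n_partners=8, n_sp_episodes=400, n_br_episodes=2000):
    env = ToyCoopEnv()

    # Stage 1: a population of independent self-play agents, with frozen
    # checkpoints kept at 25%, 50%, and 100% of training so the pool spans
    # both skill levels and conventions.
    pool = []
    for seed in range(n_partners):
        agent, trained = ToyAgent(seed), 0
        for frac in (0.25, 0.5, 1.0):
            target = int(n_sp_episodes * frac)
            for _ in range(target - trained):
                run_episode(env, agent, agent, learners={agent})
            trained = target
            pool.append(copy.deepcopy(agent))  # frozen checkpoint

    # Stage 2: train a best response to the frozen pool; partners never update.
    fcp_agent = ToyAgent(seed=10_000)
    for _ in range(n_br_episodes):
        partner = random.choice(pool)
        run_episode(env, fcp_agent, partner, learners={fcp_agent})
    return fcp_agent, pool


if __name__ == "__main__":
    agent, pool = train_fcp()
    print(f"Trained an FCP agent against {len(pool)} frozen partner checkpoints.")
```

Sampling a fresh frozen partner every episode is what forces the best-response agent to cope with both strong and weak play styles rather than overfitting to a single convention.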
Related papers
- Learning to Cooperate with Humans using Generative Agents [40.605931138995714]
Training agents that can coordinate zero-shot with humans is a key mission in multi-agent reinforcement learning (MARL).
We show that learning a generative model of human partners can effectively address this issue.
By sampling from the latent space, we can use the generative model to produce different partners to train Cooperator agents.
arXiv Detail & Related papers (2024-11-21T08:36:17Z)
- Incorporating Human Flexibility through Reward Preferences in Human-AI Teaming [14.250120245287109]
We develop a Human-AI PbRL Cooperation Game, where the RL agent queries the human-in-the-loop to elicit the task objective and the human's preferences over the joint team behavior.
Under this game formulation, we first introduce the notion of Human Flexibility to evaluate team performance based on whether humans prefer to follow a fixed policy or adapt to the RL agent on the fly.
We highlight a special case along these two dimensions, which we call Specified Orchestration, where the human is least flexible and the agent has complete access to the human's policy.
arXiv Detail & Related papers (2023-12-21T20:48:15Z)
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- Learning to Influence Human Behavior with Offline Reinforcement Learning [70.7884839812069]
We focus on influence in settings where there is a need to capture human suboptimality.
Experimenting online with humans is potentially unsafe, and creating a high-fidelity simulator of the environment is often impractical.
We show that offline reinforcement learning can learn to effectively influence suboptimal humans by extending and combining elements of observed human-human behavior.
arXiv Detail & Related papers (2023-03-03T23:41:55Z)
- PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination [52.991211077362586]
We propose a policy ensemble method to increase the diversity of partners in the population.
We then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives.
In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners.
arXiv Detail & Related papers (2023-01-16T12:14:58Z)
- Human-AI Coordination via Human-Regularized Search and Learning [33.95649252941375]
We develop a three-step algorithm that achieves strong performance in coordinating with real humans in the Hanabi benchmark.
We first use a regularized search algorithm and behavioral cloning to produce a better human model that captures diverse skill levels.
We show that our method beats a vanilla best-response-to-behavioral-cloning baseline in evaluations where experts play repeatedly with the two agents.
arXiv Detail & Related papers (2022-10-11T03:46:12Z)
- Maximum Entropy Population Based Training for Zero-Shot Human-AI Coordination [21.800115245671737]
We consider the problem of training a Reinforcement Learning (RL) agent without using any human data to make it capable of collaborating with humans.
We derive a centralized population entropy objective to facilitate learning of a diverse population of agents, which is later used to train a robust agent to collaborate with unseen partners (a brief sketch of one such population-entropy bonus appears after this list).
arXiv Detail & Related papers (2021-12-22T07:19:36Z)
- Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback [82.96694147237113]
We present Skill Preferences, an algorithm that learns a model over human preferences and uses it to extract human-aligned skills from offline data.
We show that SkiP enables a simulated kitchen robot to solve complex multi-step manipulation tasks.
arXiv Detail & Related papers (2021-08-11T18:04:08Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Adaptive Agent Architecture for Real-time Human-Agent Teaming [3.284216428330814]
It is critical that agents infer human intent and adapt their policies for smooth coordination.
Most literature in human-agent teaming builds agents that reference a learned human model.
We propose a novel adaptive agent architecture in a human-model-free setting on a two-player cooperative game.
arXiv Detail & Related papers (2021-03-07T20:08:09Z)
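
The Maximum Entropy Population Based Training entry above rests on a "centralized population entropy" objective. The sketch below illustrates one plausible reading of that idea as the entropy of the population's mean action distribution at a state, used as a diversity bonus; it is an assumption-laden illustration, not that paper's exact formulation, weighting, or training loop.

```python
"""Hedged sketch of a population-entropy diversity bonus (illustrative only)."""

import math


def population_entropy_bonus(action_dists):
    """Entropy of the mean action distribution over all population members.

    action_dists: one action distribution per population member, all for the
    same state; each is a list of probabilities summing to 1. The bonus is
    low when all members act alike and high when they spread over actions.
    """
    n_agents = len(action_dists)
    n_actions = len(action_dists[0])
    mean_dist = [
        sum(dist[a] for dist in action_dists) / n_agents for a in range(n_actions)
    ]
    return -sum(p * math.log(p) for p in mean_dist if p > 0.0)


# Three members that always pick action 0 give no bonus; members spread over
# three different actions give the maximum bonus, ln(3) ~= 1.10.
same = [[1.0, 0.0, 0.0]] * 3
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(population_entropy_bonus(same))    # 0.0
print(population_entropy_bonus(spread))  # ~1.0986
```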
This list is automatically generated from the titles and abstracts of the papers in this site.