Maximum Entropy Population Based Training for Zero-Shot Human-AI
Coordination
- URL: http://arxiv.org/abs/2112.11701v1
- Date: Wed, 22 Dec 2021 07:19:36 GMT
- Title: Maximum Entropy Population Based Training for Zero-Shot Human-AI
Coordination
- Authors: Rui Zhao, Jinming Song, Hu Haifeng, Yang Gao, Yi Wu, Zhongqian Sun,
Yang Wei
- Abstract summary: We consider the problem of training a Reinforcement Learning (RL) agent without using any human data to make it capable of collaborating with humans.
We derive a centralized population entropy objective to facilitate learning of a diverse population of agents, which is later used to train a robust agent to collaborate with unseen partners.
- Score: 21.800115245671737
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An AI agent should be able to coordinate with humans to solve tasks. We
consider the problem of training a Reinforcement Learning (RL) agent without
using any human data, i.e., in a zero-shot setting, to make it capable of
collaborating with humans. Standard RL agents learn through self-play.
Unfortunately, these agents only know how to collaborate with themselves and
normally do not perform well with unseen partners, such as humans. How to
train a robust agent in a zero-shot fashion remains an open research question.
Motivated by maximum entropy RL, we derive a
centralized population entropy objective to facilitate learning of a diverse
population of agents, which is later used to train a robust agent to
collaborate with unseen partners. The proposed method shows its effectiveness
compared to baseline methods, including self-play PPO, the standard
Population-Based Training (PBT), and trajectory diversity-based PBT, in the
popular Overcooked game environment. We also conduct online experiments with
real humans and further demonstrate the efficacy of the method in the real
world. A supplementary video showing experimental results is available at
https://youtu.be/Xh-FKD0AAKE.
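As a rough illustration, the centralized population entropy objective can be read as adding the entropy of the population's mean policy to the task reward. Below is a minimal sketch assuming a discrete action space; the coefficient `alpha` and the exact reward shaping are assumptions, not the paper's tuned formulation.

```python
import numpy as np

def population_entropy_bonus(action_probs, alpha=0.01):
    """Entropy bonus from a population's mean policy at one state.

    action_probs: array of shape (n_agents, n_actions), where row i is
    pi_i(. | s). The bonus alpha * H(pi_bar), with
    pi_bar = (1/n) * sum_i pi_i, is added to the task reward to push the
    population toward diverse behaviors. `alpha` is an assumed value.
    """
    mean_policy = action_probs.mean(axis=0)          # pi_bar(. | s)
    entropy = -np.sum(mean_policy * np.log(mean_policy + 1e-8))
    return alpha * entropy
```

A robust ego agent can then be trained against partners sampled from the diversified population.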
Related papers
- Human-compatible driving partners through data-regularized self-play reinforcement learning [3.9682126792844583]
Human-Regularized PPO (HR-PPO) is a multi-agent algorithm where agents are trained through self-play with a small penalty for deviating from a human reference policy.
Results show our HR-PPO agents are highly effective in achieving goals, with a success rate of 93%, an off-road rate of 3.5%, and a collision rate of 3%.
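A minimal sketch of the regularization idea, not HR-PPO's exact loss: add a small KL penalty between the learned policy and a frozen human reference policy to the usual PPO objective. The coefficient `lam` and the KL direction are assumptions.

```python
import torch.nn.functional as F

def human_deviation_penalty(logits, human_logits, lam=0.1):
    """KL(pi || pi_human) penalty added to the PPO loss.

    logits: current policy logits, shape (batch, n_actions).
    human_logits: frozen human reference policy logits, same shape.
    `lam` and the KL direction are illustrative assumptions.
    """
    log_p = F.log_softmax(logits, dim=-1)          # current policy
    log_q = F.log_softmax(human_logits, dim=-1)    # human reference
    kl = (log_p.exp() * (log_p - log_q)).sum(dim=-1)
    return lam * kl.mean()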
arXiv Detail & Related papers (2024-03-28T17:56:56Z)
- Large Language Model-based Human-Agent Collaboration for Complex Task Solving [94.3914058341565]
We introduce the problem of Large Language Model (LLM)-based human-agent collaboration for complex task solving.
We propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC.
This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process.
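Schematically, this reduces to a learned gate over who acts at each stage. A toy sketch with an assumed `policy_model` interface, not ReHAC's actual API:

```python
def choose_actor(policy_model, state, threshold=0.5):
    """Decide whether the human or the LLM agent handles this stage.

    `policy_model(state)` returning the probability that human
    intervention helps here is an assumed interface.
    """
    p_human_helps = policy_model(state)
    return "human" if p_human_helps > threshold else "llm_agent"
```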
arXiv Detail & Related papers (2024-02-20T11:03:36Z)
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easy to integrate into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination [52.991211077362586]
We propose a policy ensemble method to increase the diversity of partners in the population.
We then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives.
In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners.
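A loose sketch of the two ingredients, with invented interfaces: partners are drawn from an ensemble per episode, and the ego agent conditions on a context built from the partner's recent behavior.

```python
import random

def sample_partner(policy_ensemble):
    # Draw a partner policy from the ensemble for this episode,
    # increasing the diversity of training partners.
    return random.choice(policy_ensemble)

def build_context(partner_history, window=10):
    # The ego agent summarizes the partner's recent actions into a
    # context used to identify its policy primitives. The fixed window
    # and raw-history encoding are assumptions.
    return partner_history[-window:]
```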
arXiv Detail & Related papers (2023-01-16T12:14:58Z)
- Improving Multimodal Interactive Agents with Reinforcement Learning from Human Feedback [16.268581985382433]
An important goal in artificial intelligence is to create agents that can both interact naturally with humans and learn from their feedback.
Here we demonstrate how to use reinforcement learning from human feedback to improve upon simulated, embodied agents.
arXiv Detail & Related papers (2022-11-21T16:00:31Z)
- Human-AI Coordination via Human-Regularized Search and Learning [33.95649252941375]
We develop a three-step algorithm that achieves strong performance in coordinating with real humans in the Hanabi benchmark.
We first use a regularized search algorithm and behavioral cloning to produce a better human model that captures diverse skill levels.
We show that our method beats a vanilla best-response-to-behavioral-cloning baseline in evaluations where experts play repeatedly with the two agents.
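One common way to regularize search toward human-like play (e.g., piKL-style) is to act proportionally to the behavioral-cloning prior times an exponentiated value term; whether the paper uses exactly this form is an assumption here.

```python
import numpy as np

def regularized_search_policy(q_values, bc_probs, lam=1.0):
    """pi(a) proportional to bc_probs[a] * exp(q_values[a] / lam).

    Large lam stays close to the human-like BC policy; small lam
    approaches a pure best response. lam=1.0 is an assumed default.
    """
    logits = q_values / lam + np.log(bc_probs + 1e-8)
    logits -= logits.max()            # numerical stability
    p = np.exp(logits)
    return p / p.sum()
```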
arXiv Detail & Related papers (2022-10-11T03:46:12Z)
- Human-to-Robot Imitation in the Wild [50.49660984318492]
We propose an efficient one-shot robot learning algorithm, centered around learning from a third-person perspective.
We show one-shot generalization and success in real-world settings, including 20 different manipulation tasks in the wild.
arXiv Detail & Related papers (2022-07-19T17:59:59Z)
- Collaborating with Humans without Human Data [6.158826414652401]
We study the problem of how to train agents that collaborate well with human partners without using human data.
We train our agent partner as the best response to a population of self-play agents and their past checkpoints.
We find that Fictitious Co-Play (FCP) agents score significantly higher than self-play (SP), population-play (PP), and behavioral-cloning-play (BCP) baselines when paired with novel agent and human partners.
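A minimal sketch of the FCP partner pool, assuming `selfplay_runs` is a list of checkpoint lists (early, mid, final) per independent self-play run; the real training setup is more involved.

```python
import random

def build_fcp_pool(selfplay_runs):
    """Pool of partners for Fictitious Co-Play: each self-play run
    contributes its final policy and earlier checkpoints, so the pool
    covers both play styles and skill levels."""
    pool = []
    for checkpoints in selfplay_runs:
        pool.extend(checkpoints)
    return pool

def sample_training_partner(pool):
    # The ego agent is trained as a best response to partners
    # sampled uniformly from this pool.
    return random.choice(pool)
```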
arXiv Detail & Related papers (2021-10-15T16:03:57Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
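The relabeling step can be sketched as follows, with assumed `buffer` and `reward_model` interfaces: whenever new human feedback updates the learned reward, stored transitions are re-scored so off-policy learning stays consistent.

```python
def relabel_buffer(buffer, reward_model):
    """Recompute rewards for stored transitions with the latest
    learned reward model. `buffer` as a list of dicts with "obs",
    "act", and "reward" keys is an assumed interface.
    """
    for transition in buffer:
        transition["reward"] = reward_model(transition["obs"],
                                            transition["act"])
    return buffer
```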
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Adaptive Agent Architecture for Real-time Human-Agent Teaming [3.284216428330814]
It is critical that agents infer human intent and adapt their policies for smooth coordination.
Most of the literature on human-agent teaming builds agents that reference a learned human model.
We propose a novel adaptive agent architecture in a human-model-free setting for a two-player cooperative game.
arXiv Detail & Related papers (2021-03-07T20:08:09Z)
- On the interaction between supervision and self-play in emergent communication [82.290338507106]
We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency.
We find that first training agents via supervised learning on human data followed by self-play outperforms the converse.
arXiv Detail & Related papers (2020-02-04T02:35:19Z)