A Hierarchical Approach to Population Training for Human-AI
Collaboration
- URL: http://arxiv.org/abs/2305.16708v1
- Date: Fri, 26 May 2023 07:53:12 GMT
- Title: A Hierarchical Approach to Population Training for Human-AI
Collaboration
- Authors: Yi Loo, Chen Gong and Malika Meghjani
- Abstract summary: We introduce a Hierarchical Reinforcement Learning (HRL) based method for Human-AI Collaboration.
We demonstrate that our method is able to dynamically adapt to novel partners of different play styles and skill levels in the 2-player collaborative Overcooked game environment.
- Score: 20.860808795671343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A major challenge for deep reinforcement learning (DRL) agents is to
collaborate with novel partners that were not encountered during the training
phase. This challenge is aggravated when DRL agents collaborate with human
partners, whose inconsistent behavior increases the variance of action
responses. Recent work has shown that training a single agent as the best
response to a diverse population of training partners significantly increases
its robustness to novel partners. We further enhance this population-based
training approach by introducing a Hierarchical Reinforcement Learning (HRL)
based method for Human-AI Collaboration. Our agent learns multiple
best-response policies as its low-level policies while simultaneously learning
a high-level policy that acts as a manager, allowing the agent to dynamically
switch between the low-level best-response policies based on its current
partner. We demonstrate that our method dynamically adapts to novel partners of
different play styles and skill levels in the 2-player collaborative Overcooked
game environment. We also conducted a human study in the same environment to
test the effectiveness of our method when partnering with real human subjects.
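The manager/worker decomposition described in the abstract can be made concrete with a small sketch. Below is a minimal, illustrative Python version in which the learned high-level manager is stood in for by a likelihood score over assumed partner types; the class names, the scoring rule, and the switching interval are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class LowLevelPolicy:
    """Stand-in for a best-response policy trained against one partner type."""
    def __init__(self, partner_action_probs, n_actions=6):
        self.partner_action_probs = np.asarray(partner_action_probs, dtype=float)
        self.n_actions = n_actions

    def partner_likelihood(self, partner_action):
        # How likely this policy's assumed partner type is to take that action.
        return self.partner_action_probs[partner_action]

    def act(self, obs):
        return int(rng.integers(self.n_actions))  # placeholder action selection

class HierarchicalAgent:
    """Illustrative sketch only: the paper learns the manager with RL; here a
    running log-likelihood over partner types plays the manager's role."""
    def __init__(self, policies, switch_interval=10):
        self.policies = policies                # one best response per partner type
        self.switch_interval = switch_interval  # manager acts on a slower timescale
        self.active = 0                         # index of the deployed low-level policy
        self.log_scores = np.zeros(len(policies))

    def observe_partner(self, partner_action, step):
        # Score each low-level policy by how well its assumed partner type
        # explains the observed partner action; periodically redeploy the best.
        for i, pi in enumerate(self.policies):
            self.log_scores[i] += np.log(pi.partner_likelihood(partner_action) + 1e-8)
        if step % self.switch_interval == 0:
            self.active = int(np.argmax(self.log_scores))

    def act(self, obs):
        return self.policies[self.active].act(obs)

# Toy run: the true partner behaves like type 1, so the manager should switch.
types = [np.array([.7, .1, .05, .05, .05, .05]), np.array([.05, .05, .7, .1, .05, .05])]
agent = HierarchicalAgent([LowLevelPolicy(t) for t in types])
for step in range(1, 51):
    partner_action = rng.choice(6, p=types[1])
    agent.observe_partner(partner_action, step)
print("manager selected policy:", agent.active)  # expected: 1
```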
Related papers
- Multi-agent cooperation through learning-aware policy gradients [53.63948041506278]
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.
We present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning.
We derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
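As a toy illustration of learning awareness (not the paper's unbiased, higher-derivative-free algorithm), the sketch below uses a prisoner's dilemma with mixed strategies: a naive partner ascends its own payoff, while a learning-aware agent evaluates its payoff after the partner's update, so its own strategy shapes how the partner learns. Payoff values and learning rates are assumptions.

```python
import numpy as np

# Payoffs for the row player: R mutual cooperation, T temptation,
# S sucker's payoff, P mutual punishment.
R, S, T, P = 3.0, 0.0, 5.0, 1.0

def v(p, q):
    """Row player's expected payoff when it cooperates w.p. p, partner w.p. q."""
    return p * q * R + p * (1 - q) * S + (1 - p) * q * T + (1 - p) * (1 - q) * P

def naive_partner_update(p, q, lr=0.2, eps=1e-5):
    """The partner ascends its own payoff v(q, p), ignoring that we learn too."""
    grad_q = (v(q + eps, p) - v(q - eps, p)) / (2 * eps)
    return np.clip(q + lr * grad_q, 0.0, 1.0)

def learning_aware_value(p, q):
    """A learning-aware agent scores p by its payoff *after* the partner's
    learning step, so its gradient flows through the partner's update."""
    return v(p, naive_partner_update(p, q))

# The learning-aware gradient differs from the naive one at the same point.
p, q, eps = 0.5, 0.5, 1e-5
grad_p = (learning_aware_value(p + eps, q) - learning_aware_value(p - eps, q)) / (2 * eps)
print(grad_p)  # includes the extra term from the partner's update
```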
arXiv Detail & Related papers (2024-10-24T10:48:42Z)
- Large Language Model-based Human-Agent Collaboration for Complex Task Solving [94.3914058341565]
We introduce the problem of Large Language Model (LLM)-based human-agent collaboration for complex task-solving.
We propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC.
This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process.
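A minimal sketch of this intervention idea, assuming a linear-logistic score and a fixed effort cost; ReHAC's policy model is learned with RL, so the form, features, and numbers here are hypothetical.

```python
import numpy as np

def should_ask_human(state_feats, w, human_cost=0.3):
    """Decide whether human intervention is worthwhile at this stage:
    trigger it only when the estimated gain outweighs the human's effort.
    (Illustrative stand-in for ReHAC's learned policy model.)"""
    est_gain = 1.0 / (1.0 + np.exp(-state_feats @ w))  # predicted benefit of help
    return est_gain > human_cost

# Toy usage with hypothetical features (task difficulty, agent uncertainty).
w = np.array([0.8, 1.2])
print(should_ask_human(np.array([0.9, 0.7]), w))    # hard, uncertain -> True
print(should_ask_human(np.array([-0.9, -0.9]), w))  # easy, confident -> False
```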
arXiv Detail & Related papers (2024-02-20T11:03:36Z)
- Incorporating Human Flexibility through Reward Preferences in Human-AI Teaming [14.250120245287109]
We develop a Human-AI PbRL Cooperation Game, where the RL agent queries the human-in-the-loop to elicit the task objective and the human's preferences on the joint team behavior.
Under this game formulation, we first introduce the notion of Human Flexibility to evaluate team performance based on whether the human prefers to follow a fixed policy or to adapt to the RL agent on the fly.
We highlight a special case along these two dimensions, which we call Specified Orchestration, where the human is least flexible and the agent has complete access to the human's policy.
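The preference-elicitation step can be illustrated with a generic Bradley-Terry update on a linear reward model; this is a standard PbRL ingredient assumed here for illustration and may differ from the paper's exact formulation.

```python
import numpy as np

def preference_update(theta, feats_a, feats_b, human_prefers_a, lr=0.1):
    """One Bradley-Terry gradient step on a linear reward model from a single
    human answer to a pairwise trajectory query."""
    p_a = 1.0 / (1.0 + np.exp((theta @ feats_b) - (theta @ feats_a)))
    grad = (float(human_prefers_a) - p_a) * (feats_a - feats_b)  # d log-lik / d theta
    return theta + lr * grad

# Toy usage: after one "a is better" answer, the model rates a higher.
theta = np.zeros(3)
a, b = np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, 0.5])
theta = preference_update(theta, a, b, human_prefers_a=True)
print(theta @ a > theta @ b)  # True
```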
arXiv Detail & Related papers (2023-12-21T20:48:15Z)
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easy to integrate into various coordination scenarios.
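A minimal sketch of the inference loop, with a caller-supplied `llm` completion function and hypothetical prompt wording; ProAgent's actual prompts and modules are more elaborate.

```python
def infer_teammate_intent(observation_log, llm):
    """Summarize recent observations in natural language and ask an LLM for
    the teammate's likely intention (illustrative sketch; `llm` is any
    caller-supplied text-completion function)."""
    prompt = (
        "You observe a teammate in the Overcooked kitchen.\n"
        f"Recent teammate observations: {observation_log}\n"
        "State the teammate's most likely current intention in one sentence."
    )
    return llm(prompt)

# Toy usage with a stub in place of a real model.
stub = lambda prompt: "The teammate is fetching an onion for the pot."
print(infer_teammate_intent(["picked up onion", "moved toward pot"], stub))
```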
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination [52.991211077362586]
We propose a policy ensemble method to increase the diversity of partners in the population.
We then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives.
In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners.
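A minimal sketch of the context-aware step, assuming a histogram context encoder and nearest-prototype matching in place of PECAN's learned components.

```python
import numpy as np

def context_vector(partner_actions, n_actions=6):
    """Summarize the partner's recent actions as an empirical histogram;
    PECAN's real context encoder is learned, so this is only a stand-in."""
    counts = np.bincount(partner_actions, minlength=n_actions).astype(float)
    return counts / max(len(partner_actions), 1)

def identify_primitive(ctx, prototypes):
    """Match the inferred context to the closest known policy primitive."""
    return int(np.argmin([np.linalg.norm(ctx - p) for p in prototypes]))

# Toy usage: two primitives (onion-focused vs dish-focused partner).
prototypes = [np.array([.6, .2, .1, .1, 0, 0]), np.array([0, .1, .1, .2, .3, .3])]
ctx = context_vector([0, 0, 1, 0, 3, 0], n_actions=6)
print(identify_primitive(ctx, prototypes))  # 0: looks like the onion-focused type
```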
arXiv Detail & Related papers (2023-01-16T12:14:58Z)
- Hierarchical Reinforcement Learning with Opponent Modeling for Distributed Multi-agent Cooperation [13.670618752160594]
Deep reinforcement learning (DRL) provides a promising approach to multi-agent cooperation through the interaction of agents with their environments.
Traditional DRL solutions suffer from the high dimensionality of multi-agent settings with continuous action spaces during policy search.
We propose a hierarchical reinforcement learning approach with high-level decision-making and low-level individual control for efficient policy search.
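A toy sketch of the two-level decomposition, assuming a geometric task-assignment rule for the high level and a proportional controller for the low level; the paper's policies are learned, so everything here is illustrative.

```python
import numpy as np

def high_level(agent_pos, tasks, predicted_teammate_pos):
    """High-level decision: pick a task the modeled teammate is not already
    covering (stand-in for the learned high-level policy with opponent modeling)."""
    free = [t for t in tasks if np.linalg.norm(t - predicted_teammate_pos) > 1.0]
    options = free if free else tasks
    return min(options, key=lambda t: np.linalg.norm(t - agent_pos))

def low_level(agent_pos, target, gain=0.5):
    """Low-level individual control: a proportional step in continuous space."""
    return gain * (target - agent_pos)

# Toy usage in a 2-D continuous workspace.
tasks = [np.array([0.0, 5.0]), np.array([5.0, 0.0])]
goal = high_level(np.zeros(2), tasks, predicted_teammate_pos=np.array([0.2, 4.8]))
print(goal, low_level(np.zeros(2), goal))  # heads for the uncovered task
```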
arXiv Detail & Related papers (2022-06-25T19:09:29Z)
- Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time.
We propose a novel approach to address the difficulties of scalability and data scarcity.
Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace.
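The subspace idea can be sketched with plain SVD over toy strategy parameters; the paper learns its subspace jointly with imitation, so this captures only the geometry.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in data: flattened policy parameters of K known partner strategies.
K, D, r = 8, 50, 3
partner_params = rng.normal(size=(K, D))

# Fit a rank-r subspace over partner strategies with SVD.
mean = partner_params.mean(axis=0)
_, _, Vt = np.linalg.svd(partner_params - mean, full_matrices=False)
basis = Vt[:r]  # (r, D) orthonormal basis of the strategy subspace

def embed(params):
    """Project a new partner's estimated parameters into the subspace."""
    return (params - mean) @ basis.T

def adapt(coords):
    """Interpolate in the low-dimensional subspace to get an adapted strategy."""
    return mean + coords @ basis

new_partner = rng.normal(size=D)
print(adapt(embed(new_partner)).shape)  # (50,): best rank-r approximation
```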
arXiv Detail & Related papers (2022-01-05T04:40:13Z)
- Behaviour-conditioned policies for cooperative reinforcement learning tasks [41.74498230885008]
In various real-world tasks, an agent needs to cooperate with unknown partner agent types.
Deep reinforcement learning models can be trained to deliver the required functionality but are known to suffer from sample inefficiency and slow learning.
We suggest a method in which we synthetically produce populations of agents with different behavioural patterns, together with ground-truth data of their behaviour.
We additionally suggest an agent architecture that can efficiently use the generated data and gain meta-learning capability.
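A minimal sketch of the synthetic-population idea, assuming each partner is a fixed action distribution that doubles as its ground-truth behaviour label; the paper's generator and architecture are more involved.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_synthetic_population(n_agents, n_actions=4):
    """Synthesize partners with known behavioural patterns: each partner is a
    fixed action distribution, which also serves as its ground-truth label."""
    return [rng.dirichlet(np.ones(n_actions)) for _ in range(n_agents)]

def behaviour_estimate(observed_actions, n_actions=4, prior=1.0):
    """Online behaviour descriptor from the partner's action history; the
    behaviour-conditioned policy would take this vector as an extra input."""
    counts = np.bincount(observed_actions, minlength=n_actions) + prior
    return counts / counts.sum()

# Toy usage: the estimate approaches the partner's true behaviour pattern.
population = make_synthetic_population(5)
partner = population[2]
history = list(rng.choice(4, size=30, p=partner))
print(np.round(behaviour_estimate(history), 2), np.round(partner, 2))
```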
arXiv Detail & Related papers (2021-10-04T09:16:41Z)
- On the Critical Role of Conventions in Adaptive Human-AI Collaboration [73.21967490610142]
We propose a learning framework that teases apart rule-dependent representation from convention-dependent representation.
We experimentally validate our approach on three collaborative tasks varying in complexity.
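A minimal sketch of the rule/convention separation, assuming a linear policy whose parameters split into a frozen rule part and a per-partner convention part; the paper's representations are learned networks, so this only illustrates the idea.

```python
import numpy as np

class ModularPolicy:
    """`rule_w` holds task-rule knowledge shared across partners; `conv_w`
    holds partner-specific conventions and is the only part updated when
    adapting to a new partner (illustrative linear form)."""
    def __init__(self, obs_dim, n_actions, rng):
        self.rule_w = rng.normal(scale=0.1, size=(obs_dim, n_actions))  # frozen
        self.conv_w = np.zeros((obs_dim, n_actions))                    # adapted

    def probs(self, obs):
        z = obs @ (self.rule_w + self.conv_w)
        e = np.exp(z - z.max())
        return e / e.sum()

    def adapt_to_partner(self, obs, partner_expected_action, lr=0.1):
        # Cross-entropy descent step on the convention parameters only.
        p = self.probs(obs)
        grad = np.outer(obs, p)
        grad[:, partner_expected_action] -= obs
        self.conv_w -= lr * grad

# Toy usage: the convention is learned while the rule weights stay fixed.
rng = np.random.default_rng(0)
pi = ModularPolicy(obs_dim=3, n_actions=4, rng=rng)
obs = np.array([1.0, 0.5, -0.2])
for _ in range(50):
    pi.adapt_to_partner(obs, partner_expected_action=2)
print(np.argmax(pi.probs(obs)))  # 2
```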
arXiv Detail & Related papers (2021-04-07T02:46:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.