Behaviour-conditioned policies for cooperative reinforcement learning tasks
- URL: http://arxiv.org/abs/2110.01266v1
- Date: Mon, 4 Oct 2021 09:16:41 GMT
- Title: Behaviour-conditioned policies for cooperative reinforcement learning tasks
- Authors: Antti Keurulainen (1 and 3), Isak Westerlund (3), Ariel Kwiatkowski
(3), Samuel Kaski (1 and 2) and Alexander Ilin (1) ((1) Helsinki Institute
for Information Technology HIIT, Department of Computer Science, Aalto
University, (2) Department of Computer Science, University of Manchester, (3)
Bitville Oy, Espoo, Finland)
- Abstract summary: In various real-world tasks, an agent needs to cooperate with unknown partner agent types.
Deep reinforcement learning models can be trained to deliver the required functionality but are known to suffer from sample inefficiency and slow learning.
We suggest a method in which we synthetically produce populations of agents with different behavioural patterns, together with ground truth data of their behaviour.
We additionally suggest an agent architecture that can efficiently use the generated data and gain the meta-learning capability.
- Score: 41.74498230885008
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The cooperation among AI systems, and between AI systems and humans, is
becoming increasingly important. In various real-world tasks, an agent needs to
cooperate with unknown partner agent types. This requires the agent to assess
the behaviour of the partner agent during a cooperative task and to adjust its
own policy to support the cooperation. Deep reinforcement learning models can
be trained to deliver the required functionality but are known to suffer from
sample inefficiency and slow learning. However, adapting to a partner agent's
behaviour during an ongoing task requires the ability to assess the partner
agent's type quickly. We suggest a method in which we synthetically produce
populations of agents with different behavioural patterns, together with ground
truth data of their behaviour, and use this data to train a meta-learner. We
additionally suggest an agent architecture that can efficiently use the
generated data and gain the meta-learning capability. When an agent is equipped
with such a meta-learner, it is capable of quickly adapting to cooperation with
unknown partner agent types in new situations. This method can be used to
automatically form a task distribution for meta-training from emerging
behaviours that arise, for example, through self-play.
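The abstract outlines an architecture in which an assessment of the partner's behaviour conditions the ego agent's policy. Below is a minimal sketch of what such a behaviour-conditioned policy could look like in PyTorch; the GRU encoder, the module and dimension names, the four synthetic partner types, and the auxiliary supervised head are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (assumptions, not the paper's code): a recurrent encoder
# summarises the partner's observed history into a behaviour embedding, and
# the ego policy is conditioned on that embedding.
import torch
import torch.nn as nn

class BehaviourEncoder(nn.Module):
    """Encodes a partner's observation-action history into a behaviour embedding."""
    def __init__(self, obs_act_dim: int, embed_dim: int = 32, n_partner_types: int = 4):
        super().__init__()
        self.gru = nn.GRU(obs_act_dim, embed_dim, batch_first=True)
        # Auxiliary head predicting the ground-truth behaviour label of the
        # synthetically generated partner (assumed training signal).
        self.behaviour_head = nn.Linear(embed_dim, n_partner_types)

    def forward(self, partner_history):               # (batch, time, obs_act_dim)
        _, h = self.gru(partner_history)               # h: (1, batch, embed_dim)
        z = h.squeeze(0)
        return z, self.behaviour_head(z)

class ConditionedPolicy(nn.Module):
    """Ego policy that takes its own observation plus the partner embedding."""
    def __init__(self, obs_dim: int, embed_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + embed_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))   # action logits

# Usage: infer the partner type online from its recent history, then act.
encoder = BehaviourEncoder(obs_act_dim=10)
policy = ConditionedPolicy(obs_dim=8, embed_dim=32, n_actions=5)
partner_history = torch.randn(1, 20, 10)               # 20 observed partner steps
obs = torch.randn(1, 8)
z, type_logits = encoder(partner_history)
action_logits = policy(obs, z)
```

During meta-training, the auxiliary head could be supervised with the ground-truth behaviour labels of the synthetic partner population while the policy is trained with standard reinforcement learning; at test time only the embedding is needed, so the agent can adapt to an unknown partner within an episode.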
Related papers
- Multi-agent cooperation through learning-aware policy gradients [53.63948041506278]
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.
We present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning.
We derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
arXiv Detail & Related papers (2024-10-24T10:48:42Z)
- Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning [57.652899266553035]
Decentralized and lifelong-adaptive multi-agent collaborative learning aims to enhance collaboration among multiple agents without a central server.
We propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.
arXiv Detail & Related papers (2024-03-11T09:21:11Z)
- Contrastive learning-based agent modeling for deep reinforcement learning [31.293496061727932]
Agent modeling is essential when designing adaptive policies for intelligent machine agents in multiagent systems.
We devised a Contrastive Learning-based Agent Modeling (CLAM) method that relies only on the local observations from the ego agent during training and execution.
CLAM is capable of generating consistent high-quality policy representations in real-time right from the beginning of each episode.
arXiv Detail & Related papers (2023-12-30T03:44:12Z)
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state, and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time.
We propose a novel approach to address the difficulties of scalability and data scarcity.
Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace; a toy sketch of this idea appears after the related-papers list below.
arXiv Detail & Related papers (2022-01-05T04:40:13Z)
- Learning to Cooperate with Unseen Agent via Meta-Reinforcement Learning [4.060731229044571]
The ad hoc teamwork problem describes situations where an agent has to cooperate with previously unseen agents to achieve a common goal.
One could implement cooperative skills into an agent by using domain knowledge to design the agent's behavior.
We apply a meta-reinforcement learning (meta-RL) formulation in the context of the ad hoc teamwork problem.
arXiv Detail & Related papers (2021-11-05T12:01:28Z)
- Targeted Data Acquisition for Evolving Negotiation Agents [6.953246373478702]
Successful negotiators must learn how to balance optimizing for self-interest and cooperation.
Current artificial negotiation agents often heavily depend on the quality of the static datasets they were trained on.
We introduce a targeted data acquisition framework where we guide the exploration of a reinforcement learning agent.
arXiv Detail & Related papers (2021-06-14T19:45:59Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
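As a toy companion to the Conditional Imitation Learning entry above, the following numpy sketch illustrates the general flavour of adapting to a new partner by interpolating in a low-rank strategy subspace; the SVD factorisation, dimensions, and least-squares inference step are assumptions for illustration, not that paper's actual estimator.

```python
# Toy illustration (assumed setup): factor known partner strategies into a
# low-rank basis, then express a newly observed partner inside that subspace.
import numpy as np

rng = np.random.default_rng(0)
strategy_dim, rank, n_train_partners = 50, 3, 20

# Training: stack known partner-strategy vectors and extract a low-rank basis.
train_strategies = rng.normal(size=(n_train_partners, strategy_dim))
_, _, vt = np.linalg.svd(train_strategies, full_matrices=False)
basis = vt[:rank]                                    # (rank, strategy_dim)

# Test time: observe a noisy estimate of a new partner's strategy and infer
# its subspace coordinates by least squares, i.e. interpolate in the subspace.
new_partner = rng.normal(size=strategy_dim)
coeffs, *_ = np.linalg.lstsq(basis.T, new_partner, rcond=None)
adapted_estimate = coeffs @ basis                    # strategy projected into subspace
```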