Behaviour-conditioned policies for cooperative reinforcement learning tasks
- URL: http://arxiv.org/abs/2110.01266v1
- Date: Mon, 4 Oct 2021 09:16:41 GMT
- Title: Behaviour-conditioned policies for cooperative reinforcement learning tasks
- Authors: Antti Keurulainen (1 and 3), Isak Westerlund (3), Ariel Kwiatkowski
(3), Samuel Kaski (1 and 2) and Alexander Ilin (1) ((1) Helsinki Institute
for Information Technology HIIT, Department of Computer Science, Aalto
University, (2) Department of Computer Science, University of Manchester, (3)
Bitville Oy, Espoo, Finland)
- Abstract summary: In various real-world tasks, an agent needs to cooperate with unknown partner agent types.
Deep reinforcement learning models can be trained to deliver the required functionality but are known to suffer from sample inefficiency and slow learning.
We suggest a method in which we synthetically produce populations of agents with different behavioural patterns, together with ground truth data of their behaviour.
We additionally suggest an agent architecture that can efficiently use the generated data and gain the meta-learning capability.
- Score: 41.74498230885008
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The cooperation among AI systems, and between AI systems and humans, is
becoming increasingly important. In various real-world tasks, an agent needs to
cooperate with unknown partner agent types. This requires the agent to assess
the behaviour of the partner agent during a cooperative task and to adjust its
own policy to support the cooperation. Deep reinforcement learning models can
be trained to deliver the required functionality but are known to suffer from
sample inefficiency and slow learning. However, adapting to a partner agent's
behaviour during an ongoing task requires the ability to assess the partner
agent's type quickly. We suggest a method in which we synthetically produce
populations of agents with different behavioural patterns, together with ground
truth data of their behaviour, and use this data to train a meta-learner. We
additionally suggest an agent architecture that can efficiently use the
generated data and gain the meta-learning capability. When an agent is equipped
with such a meta-learner, it is capable of quickly adapting to cooperation with
unknown partner agent types in new situations. This method can be used to
automatically form a task distribution for meta-training from emerging
behaviours that arise, for example, through self-play.
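The abstract outlines an architecture in which an assessment of the partner's behaviour conditions the ego agent's policy. Below is a minimal sketch of what such a behaviour-conditioned policy could look like in PyTorch; the GRU encoder, the module and dimension names, the four synthetic partner types, and the auxiliary supervised head are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (assumptions, not the paper's code): a recurrent encoder
# summarises the partner's observed history into a behaviour embedding, and
# the ego policy is conditioned on that embedding.
import torch
import torch.nn as nn

class BehaviourEncoder(nn.Module):
    """Encodes a partner's observation-action history into a behaviour embedding."""
    def __init__(self, obs_act_dim: int, embed_dim: int = 32, n_partner_types: int = 4):
        super().__init__()
        self.gru = nn.GRU(obs_act_dim, embed_dim, batch_first=True)
        # Auxiliary head predicting the ground-truth behaviour label of the
        # synthetically generated partner (assumed training signal).
        self.behaviour_head = nn.Linear(embed_dim, n_partner_types)

    def forward(self, partner_history):               # (batch, time, obs_act_dim)
        _, h = self.gru(partner_history)               # h: (1, batch, embed_dim)
        z = h.squeeze(0)
        return z, self.behaviour_head(z)

class ConditionedPolicy(nn.Module):
    """Ego policy that takes its own observation plus the partner embedding."""
    def __init__(self, obs_dim: int, embed_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + embed_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs, z):
        return self.net(torch.cat([obs, z], dim=-1))   # action logits

# Usage: infer the partner type online from its recent history, then act.
encoder = BehaviourEncoder(obs_act_dim=10)
policy = ConditionedPolicy(obs_dim=8, embed_dim=32, n_actions=5)
partner_history = torch.randn(1, 20, 10)               # 20 observed partner steps
obs = torch.randn(1, 8)
z, type_logits = encoder(partner_history)
action_logits = policy(obs, z)
```

During meta-training, the auxiliary head could be supervised with the ground-truth behaviour labels of the synthetic partner population while the policy is trained with standard reinforcement learning; at test time only the embedding is needed, so the agent can adapt to an unknown partner within an episode.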
Related papers
- Multi-agent cooperation through learning-aware policy gradients [53.63948041506278]
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.
We present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning.
We derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
arXiv Detail & Related papers (2024-10-24T10:48:42Z)
- Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning [57.652899266553035]
Decentralized and lifelong-adaptive multi-agent collaborative learning aims to enhance collaboration among multiple agents without a central server.
We propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.
arXiv Detail & Related papers (2024-03-11T09:21:11Z)
- Contrastive learning-based agent modeling for deep reinforcement learning [31.293496061727932]
Agent modeling is essential when designing adaptive policies for intelligent machine agents in multiagent systems.
We devised a Contrastive Learning-based Agent Modeling (CLAM) method that relies only on the local observations from the ego agent during training and execution.
CLAM is capable of generating consistent high-quality policy representations in real-time right from the beginning of each episode.
arXiv Detail & Related papers (2023-12-30T03:44:12Z)
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state, and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time.
We propose a novel approach to address the difficulties of scalability and data scarcity.
Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace; a toy sketch of this idea appears after the related-papers list below.
arXiv Detail & Related papers (2022-01-05T04:40:13Z)
- Learning to Cooperate with Unseen Agent via Meta-Reinforcement Learning [4.060731229044571]
The ad hoc teamwork problem describes situations where an agent has to cooperate with previously unseen agents to achieve a common goal.
One could implement cooperative skills into an agent by using domain knowledge to design the agent's behavior.
We apply a meta-reinforcement learning (meta-RL) formulation in the context of the ad hoc teamwork problem.
arXiv Detail & Related papers (2021-11-05T12:01:28Z)
- Targeted Data Acquisition for Evolving Negotiation Agents [6.953246373478702]
Successful negotiators must learn how to balance optimizing for self-interest and cooperation.
Current artificial negotiation agents often heavily depend on the quality of the static datasets they were trained on.
We introduce a targeted data acquisition framework where we guide the exploration of a reinforcement learning agent.
arXiv Detail & Related papers (2021-06-14T19:45:59Z)
- Meta-Reinforcement Learning Robust to Distributional Shift via Model Identification and Experience Relabeling [126.69933134648541]
We present a meta-reinforcement learning algorithm that is both efficient and extrapolates well when faced with out-of-distribution tasks at test time.
Our method is based on a simple insight: we recognize that dynamics models can be adapted efficiently and consistently with off-policy data.
arXiv Detail & Related papers (2020-06-12T13:34:46Z)
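As a toy companion to the Conditional Imitation Learning entry above, the following numpy sketch illustrates the general flavour of adapting to a new partner by interpolating in a low-rank strategy subspace; the SVD factorisation, dimensions, and least-squares inference step are assumptions for illustration, not that paper's actual estimator.

```python
# Toy illustration (assumed setup): factor known partner strategies into a
# low-rank basis, then express a newly observed partner inside that subspace.
import numpy as np

rng = np.random.default_rng(0)
strategy_dim, rank, n_train_partners = 50, 3, 20

# Training: stack known partner-strategy vectors and extract a low-rank basis.
train_strategies = rng.normal(size=(n_train_partners, strategy_dim))
_, _, vt = np.linalg.svd(train_strategies, full_matrices=False)
basis = vt[:rank]                                    # (rank, strategy_dim)

# Test time: observe a noisy estimate of a new partner's strategy and infer
# its subspace coordinates by least squares, i.e. interpolate in the subspace.
new_partner = rng.normal(size=strategy_dim)
coeffs, *_ = np.linalg.lstsq(basis.T, new_partner, rcond=None)
adapted_estimate = coeffs @ basis                    # strategy projected into subspace
```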