Multi-agent cooperation through in-context co-player inference
- URL: http://arxiv.org/abs/2602.16301v1
- Date: Wed, 18 Feb 2026 09:31:43 GMT
- Title: Multi-agent cooperation through in-context co-player inference
- Authors: Marissa A. Weis, Maciej Wołczyk, Rajai Nasser, Rif A. Saurous, Blaise Agüera y Arcas, João Sacramento, Alexander Meulemans,
- Abstract summary: We show that the in-context learning capabilities of sequence models allow for co-player learning awareness without requiring hardcoded assumptions or explicit timescale separation.<n>We find that the cooperative mechanism identified in prior work-where vulnerability to extortion drives mutual shaping-emerges naturally in this setting.<n>Our results suggest that standard decentralized reinforcement learning on sequence models combined with co-player diversity provides a scalable path to learning cooperative behaviors.
- Score: 44.621248321369514
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Achieving cooperation among self-interested agents remains a fundamental challenge in multi-agent reinforcement learning. Recent work showed that mutual cooperation can be induced between "learning-aware" agents that account for and shape the learning dynamics of their co-players. However, existing approaches typically rely on hardcoded, often inconsistent, assumptions about co-player learning rules or enforce a strict separation between "naive learners" updating on fast timescales and "meta-learners" observing these updates. Here, we demonstrate that the in-context learning capabilities of sequence models allow for co-player learning awareness without requiring hardcoded assumptions or explicit timescale separation. We show that training sequence model agents against a diverse distribution of co-players naturally induces in-context best-response strategies, effectively functioning as learning algorithms on the fast intra-episode timescale. We find that the cooperative mechanism identified in prior work-where vulnerability to extortion drives mutual shaping-emerges naturally in this setting: in-context adaptation renders agents vulnerable to extortion, and the resulting mutual pressure to shape the opponent's in-context learning dynamics resolves into the learning of cooperative behavior. Our results suggest that standard decentralized reinforcement learning on sequence models combined with co-player diversity provides a scalable path to learning cooperative behaviors.
Related papers
- Structured Imitation Learning of Interactive Policies through Inverse Games [0.0]
We introduce a structured imitation learning framework for interactive policies by combining generative single-agent policy learning with a flexible yet expressive game-theoretic structure.<n>Preliminary results in a synthetic 5-agent social navigation task show that our method significantly improves non-interactive policies and performs comparably to the ground truth interactive policy using only 50 demonstrations.
arXiv Detail & Related papers (2025-11-17T00:42:45Z) - Multi-agent cooperation through learning-aware policy gradients [53.63948041506278]
Self-interested individuals often fail to cooperate, posing a fundamental challenge for multi-agent learning.<n>We present the first unbiased, higher-derivative-free policy gradient algorithm for learning-aware reinforcement learning.<n>We derive from the iterated prisoner's dilemma a novel explanation for how and when cooperation arises among self-interested learning-aware agents.
arXiv Detail & Related papers (2024-10-24T10:48:42Z) - Decentralized and Lifelong-Adaptive Multi-Agent Collaborative Learning [57.652899266553035]
Decentralized and lifelong-adaptive multi-agent collaborative learning aims to enhance collaboration among multiple agents without a central server.
We propose DeLAMA, a decentralized multi-agent lifelong collaborative learning algorithm with dynamic collaboration graphs.
arXiv Detail & Related papers (2024-03-11T09:21:11Z) - Intrinsic fluctuations of reinforcement learning promote cooperation [0.0]
Cooperating in social dilemma situations is vital for animals, humans, and machines.
We demonstrate which and how individual elements of the multi-agent learning setting lead to cooperation.
arXiv Detail & Related papers (2022-09-01T09:14:47Z) - Behaviour-conditioned policies for cooperative reinforcement learning
tasks [41.74498230885008]
In various real-world tasks, an agent needs to cooperate with unknown partner agent types.
Deep reinforcement learning models can be trained to deliver the required functionality but are known to suffer from sample inefficiency and slow learning.
We suggest a method, where we synthetically produce populations of agents with different behavioural patterns together with ground truth data of their behaviour.
We additionally suggest an agent architecture, which can efficiently use the generated data and gain the meta-learning capability.
arXiv Detail & Related papers (2021-10-04T09:16:41Z) - Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
arXiv Detail & Related papers (2021-07-10T03:49:41Z) - Bridging the Imitation Gap by Adaptive Insubordination [88.35564081175642]
We show that when the teaching agent makes decisions with access to privileged information, this information is marginalized during imitation learning.
We propose 'Adaptive Insubordination' (ADVISOR) to address this gap.
ADVISOR dynamically weights imitation and reward-based reinforcement learning losses during training, enabling on-the-fly switching between imitation and exploration.
arXiv Detail & Related papers (2020-07-23T17:59:57Z) - Multi-Agent Interactions Modeling with Correlated Policies [53.38338964628494]
In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework.
We develop a Decentralized Adrial Imitation Learning algorithm with Correlated policies (CoDAIL)
Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators.
arXiv Detail & Related papers (2020-01-04T17:31:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.