Related papers: Learning Robust Social Strategies with Large Language Models

Learning Robust Social Strategies with Large Language Models

URL: http://arxiv.org/abs/2511.19405v2
Date: Mon, 01 Dec 2025 16:27:49 GMT
Title: Learning Robust Social Strategies with Large Language Models
Authors: Dereck Piche, Mohammed Muqeeth, Milad Aghajohari, Juan Duque, Michael Noukhovitch, Aaron Courville,
Abstract summary: Reinforcement learning is effective for aligning large language models (LLMs) in the single-agent regime.<n>We show that standard RL in multi-agent settings often converges to defecting, self-interested policies.<n>To address this tendency of RL to converge to poor equilibria, we adapt a recent opponent-learning awareness algorithm, Advantage Alignment.
Score: 7.697496386429445
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As agentic AI becomes more widespread, agents with distinct and possibly conflicting goals will interact in complex ways. These multi-agent interactions pose a fundamental challenge, particularly in social dilemmas, where agents' individual incentives can undermine collective welfare. While reinforcement learning (RL) has been effective for aligning large language models (LLMs) in the single-agent regime, prior small-network results suggest that standard RL in multi-agent settings often converges to defecting, self-interested policies. We show the same effect in LLMs: despite cooperative priors, RL-trained LLM agents develop opportunistic behavior that can exploit even advanced closed-source models. To address this tendency of RL to converge to poor equilibria, we adapt a recent opponent-learning awareness algorithm, Advantage Alignment, to fine-tune LLMs toward multi-agent cooperation and non-exploitability. We then introduce a group-relative baseline that simplifies advantage computation in iterated games, enabling multi-agent training at LLM scale. We also contribute a novel social dilemma environment, Trust-and-Split, which requires natural language communication to achieve high collective welfare. Across a wide range of social dilemmas, policies learned with Advantage Alignment achieve higher collective payoffs while remaining robust against exploitation by greedy agents. We release all of our code to support future work on multi-agent RL training for LLMs.

Related papers

Robust Agents in Open-Ended Worlds [4.199586801784625]
In this thesis, we harness methodologies from open-endedness and multi-agent learning to train and evaluate robust AI agents.<n>We begin by introducing MiniHack, a sandbox framework for creating diverse environments through procedural content generation.<n>We then present Maestro, a novel approach for generating adversarial curricula that progressively enhance the robustness and generality of RL agents in two-player zero-sum games.
arXiv Detail & Related papers (2025-12-09T00:30:33Z)
CoMAS: Co-Evolving Multi-Agent Systems via Interaction Rewards [80.78748457530718]
Self-evolution is a central research topic in enabling large language model (LLM)-based agents to continually improve their capabilities after pretraining.<n>We introduce Co-Evolving Multi-Agent Systems (CoMAS), a novel framework that enables agents to improve autonomously by learning from inter-agent interactions.
arXiv Detail & Related papers (2025-10-09T17:50:26Z)
Opponent Shaping in LLM Agents [9.180524457769751]
We present the first investigation of opponent shaping (OS) with Large Language Models (LLMs)<n>Using ShapeLLM, we examine whether LLM agents can influence co-players' learning dynamics across diverse game-theoretic environments.<n>Our findings show that LLM agents can both shape and be shaped through interaction, establishing opponent shaping as a key dimension of multi-agent LLM research.
arXiv Detail & Related papers (2025-10-09T14:13:24Z)
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning [129.44038804430542]
We introduce AgentGym-RL, a new framework to train LLM agents for multi-turn interactive decision-making through RL.<n>We propose ScalingInter-RL, a training approach designed for exploration-exploitation balance and stable RL optimization.<n>Our agents match or surpass commercial models on 27 tasks across diverse environments.
arXiv Detail & Related papers (2025-09-10T16:46:11Z)
Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games [87.5673042805229]
How large language models balance self-interest and collective well-being is a critical challenge for ensuring alignment, robustness, and safe deployment.<n>We adapt a public goods game with institutional choice from behavioral economics, allowing us to observe how different LLMs navigate social dilemmas.<n>Surprisingly, we find that reasoning LLMs, such as the o1 series, struggle significantly with cooperation.
arXiv Detail & Related papers (2025-06-29T15:02:47Z)
Cultural Evolution of Cooperation among LLM Agents [1.4261864659766343]
We study whether a "society" of AI agents can learn mutually beneficial social norms in the face of incentives to defect.<n>We find that the evolution of cooperation differs markedly across base models.
arXiv Detail & Related papers (2024-12-13T16:45:49Z)
MALT: Improving Reasoning with Multi-Agent LLM Training [67.76186488361685]
MALT (Multi-Agent LLM Training) is a novel post-training strategy that divides the reasoning process into generation, verification, and refinement steps.<n>On MATH, GSM8K, and CSQA, MALT surpasses the same baseline LLM with a relative improvement of 15.66%, 7.42%, and 9.40% respectively.
arXiv Detail & Related papers (2024-12-02T19:30:36Z)
Large Language Model-based Human-Agent Collaboration for Complex Task Solving [94.3914058341565]
We introduce the problem of Large Language Models (LLMs)-based human-agent collaboration for complex task-solving. We propose a Reinforcement Learning-based Human-Agent Collaboration method, ReHAC. This approach includes a policy model designed to determine the most opportune stages for human intervention within the task-solving process.
arXiv Detail & Related papers (2024-02-20T11:03:36Z)
Group-Agent Reinforcement Learning [12.915860504511523]
It can largely benefit the reinforcement learning process of each agent if multiple geographically distributed agents perform their separate RL tasks cooperatively. We propose a distributed RL framework called DDAL (Decentralised Distributed Asynchronous Learning) designed for group-agent reinforcement learning (GARL)
arXiv Detail & Related papers (2022-02-10T16:40:59Z)
Emergent Social Learning via Multi-agent Reinforcement Learning [91.57176641192771]
Social learning is a key component of human and animal intelligence. This paper investigates whether independent reinforcement learning agents can learn to use social learning to improve their performance.
arXiv Detail & Related papers (2020-10-01T17:54:14Z)
Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems. We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.