LLMs as Policy-Agnostic Teammates: A Case Study in Human Proxy Design for Heterogeneous Agent Teams
- URL: http://arxiv.org/abs/2510.06151v1
- Date: Tue, 07 Oct 2025 17:21:20 GMT
- Title: LLMs as Policy-Agnostic Teammates: A Case Study in Human Proxy Design for Heterogeneous Agent Teams
- Authors: Aju Ani Justus, Chris Baber
- Abstract summary: A critical challenge in modelling Heterogeneous-Agent Teams is training agents to collaborate with teammates whose policies are inaccessible or non-stationary, such as humans. Traditional approaches rely on expensive human-in-the-loop data, which limits scalability. We propose using Large Language Models (LLMs) as policy-agnostic human proxies to generate synthetic data that mimics human decision-making.
- Score: 0.7734726150561086
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A critical challenge in modelling Heterogeneous-Agent Teams is training agents to collaborate with teammates whose policies are inaccessible or non-stationary, such as humans. Traditional approaches rely on expensive human-in-the-loop data, which limits scalability. We propose using Large Language Models (LLMs) as policy-agnostic human proxies to generate synthetic data that mimics human decision-making. To evaluate this, we conduct three experiments in a grid-world capture game inspired by Stag Hunt, a game theory paradigm that balances risk and reward. In Experiment 1, we compare decisions from 30 human participants and 2 expert judges with outputs from LLaMA 3.1 and Mixtral 8x22B models. LLMs, prompted with game-state observations and reward structures, align more closely with experts than participants, demonstrating consistency in applying underlying decision criteria. Experiment 2 modifies prompts to induce risk-sensitive strategies (e.g. "be risk averse"). LLM outputs mirror human participants' variability, shifting between risk-averse and risk-seeking behaviours. Finally, Experiment 3 tests LLMs in a dynamic grid-world where the LLM agents generate movement actions. LLMs produce trajectories resembling human participants' paths. While LLMs cannot yet fully replicate human adaptability, their prompt-guided diversity offers a scalable foundation for simulating policy-agnostic teammates.
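As a rough illustration of the setup the abstract describes (not the authors' actual prompts or code), the sketch below shows an LLM proxy that receives a grid-state observation and a Stag Hunt-style reward structure (Experiment 1), optionally a risk-framing instruction (Experiment 2), and returns a movement action (Experiment 3). All function names, payoff values, and prompt wording are illustrative assumptions.

```python
import json

# Hypothetical Stag Hunt-style payoffs: the stag pays well but requires both
# players; a hare is a safe, low-reward solo capture. Values are illustrative.
REWARDS = {"stag_together": 10, "stag_alone": 0, "hare": 3}

ACTIONS = ("up", "down", "left", "right", "stay")

def build_prompt(state: dict, risk_instruction: str = "") -> str:
    """Encode the grid-world observation and reward structure as text."""
    lines = [
        "You are one player in a grid-world capture game.",
        f"Grid state (JSON): {json.dumps(state)}",
        (f"Rewards: stag captured together = {REWARDS['stag_together']}, "
         f"stag attempted alone = {REWARDS['stag_alone']}, "
         f"hare captured alone = {REWARDS['hare']}."),
    ]
    if risk_instruction:                      # e.g. "Be risk averse."
        lines.append(risk_instruction)
    lines.append(f"Reply with exactly one action from {list(ACTIONS)}.")
    return "\n".join(lines)

def proxy_action(llm_complete, state: dict, risk_profile: str | None = None) -> str:
    """Query an LLM proxy and parse its reply into a movement action.

    `llm_complete` is any callable str -> str wrapping a chat model such as
    LLaMA 3.1 or Mixtral 8x22B (the models used in the paper's experiments).
    """
    instruction = {"averse": "Be risk averse.",
                   "seeking": "Be risk seeking."}.get(risk_profile or "", "")
    reply = llm_complete(build_prompt(state, instruction)).strip().lower()
    return reply if reply in ACTIONS else "stay"   # fall back on a no-op
```

A proxy built this way can stand in for human-in-the-loop data in a training loop, with `risk_profile` varied across queries to generate the behavioural diversity Experiment 2 reports.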
Related papers
- Multi-Agent Evolve: LLM Self-Improve through Co-evolution [53.00458074754831]
Reinforcement Learning (RL) has demonstrated significant potential in enhancing the reasoning capabilities of large language models (LLMs). Recent Self-Play RL methods, inspired by the success of the paradigm in games such as Go, aim to enhance LLM reasoning capabilities without human-annotated data. We propose Multi-Agent Evolve (MAE), a framework that enables LLMs to self-evolve in solving diverse tasks, including mathematics, reasoning, and general knowledge Q&A.
arXiv Detail & Related papers (2025-10-27T17:58:02Z)
- Beyond Survival: Evaluating LLMs in Social Deduction Games with Human-Aligned Strategies [54.08697738311866]
Social deduction games like Werewolf combine language, reasoning, and strategy. We curate a high-quality, human-verified multimodal Werewolf dataset containing over 100 hours of video, 32.4M utterance tokens, and 15 rule variants. We propose a novel strategy-alignment evaluation that leverages the winning faction's strategies as ground truth in two stages.
arXiv Detail & Related papers (2025-10-13T13:33:30Z)
- Can LLMs effectively provide game-theoretic-based scenarios for cybersecurity? [51.96049148869987]
Large Language Models (LLMs) offer new tools and challenges for the security of computer systems. We investigate whether classical game-theoretic frameworks can effectively capture the behaviours of LLM-driven actors and bots.
arXiv Detail & Related papers (2025-08-04T08:57:14Z)
- Beyond Nash Equilibrium: Bounded Rationality of LLMs and humans in Strategic Decision-making [33.2843381902912]
Large language models are increasingly used in strategic decision-making settings. We compare LLMs and humans using experimental paradigms adapted from behavioral game-theory research.
arXiv Detail & Related papers (2025-06-11T04:43:54Z)
- Arbiters of Ambivalence: Challenges of Using LLMs in No-Consensus Tasks [52.098988739649705]
This study examines the biases and limitations of LLMs in three roles: answer generator, judge, and debater. We develop a "no-consensus" benchmark by curating examples that encompass a variety of a priori ambivalent scenarios. Our results show that while LLMs can provide nuanced assessments when generating open-ended answers, they tend to take a stance on no-consensus topics when employed as judges or debaters.
arXiv Detail & Related papers (2025-05-28T01:31:54Z)
- Humans expect rationality and cooperation from LLM opponents in strategic games [0.0]
We present the results of the first monetarily-incentivised laboratory experiment examining how human behaviour differs when playing against LLMs rather than other humans. We show that, in this environment, human subjects choose significantly lower numbers when playing against LLMs than against humans. This shift is mainly driven by subjects with high strategic reasoning ability.
arXiv Detail & Related papers (2025-05-16T09:01:09Z)
- Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Experiments [5.1382713576243955]
Large language models (LLMs) are increasingly used to simulate or automate human behavior in sequential decision-making settings. We focus on the exploration-exploitation (E&E) tradeoff, a fundamental aspect of dynamic decision-making under uncertainty. We find that enabling thinking in LLMs shifts their behavior toward more human-like behavior, characterized by a mix of random and directed exploration.
arXiv Detail & Related papers (2025-05-15T02:09:18Z)
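For readers unfamiliar with the distinction drawn in the entry above, here is an illustrative bandit policy (assumed, not the paper's protocol) mixing the two exploration modes it mentions: random exploration via epsilon-greedy and directed exploration via an uncertainty bonus on under-sampled arms.

```python
import math
import random

def choose_arm(counts: list[int], means: list[float],
               epsilon: float = 0.1, bonus: float = 1.0) -> int:
    """Pick an arm: with probability epsilon, explore uniformly at random;
    otherwise pick the arm maximizing mean estimate + uncertainty bonus."""
    t = sum(counts) + 1
    if random.random() < epsilon:            # random exploration
        return random.randrange(len(means))
    def score(i: int) -> float:              # directed (UCB-style) exploration
        return means[i] + bonus * math.sqrt(math.log(t) / (counts[i] + 1e-9))
    return max(range(len(means)), key=score)
```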
- Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma? [0.1474723404975345]
We study the cooperative behavior of Llama2 when playing the Iterated Prisoner's Dilemma against random adversaries displaying various levels of hostility.
We find that Llama2 tends not to initiate defection but adopts a cautious approach towards cooperation.
In comparison to prior research on human participants, Llama2 exhibits a greater inclination towards cooperative behavior.
arXiv Detail & Related papers (2024-06-19T14:51:14Z)
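A minimal harness for the experiment design in the entry above (my sketch, not the authors' code): the adversary defects at random with a fixed "hostility" probability, and `agent_move` is any callable implementing the player under test, e.g. a wrapper that prompts Llama2 with the history and parses "C" or "D".

```python
import random

# Standard PD payoff matrix: (agent payoff, adversary payoff).
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play_ipd(agent_move, hostility: float, rounds: int = 100, seed: int = 0) -> int:
    """Run an iterated PD; return the agent's total payoff."""
    rng = random.Random(seed)
    history, total = [], 0
    for _ in range(rounds):
        a = agent_move(history)                       # "C" or "D"
        b = "D" if rng.random() < hostility else "C"  # random adversary
        total += PAYOFFS[(a, b)][0]
        history.append((a, b))
    return total
```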
- Human vs. Machine: Behavioral Differences Between Expert Humans and Language Models in Wargame Simulations [1.6108153271585284]
We show that large language models (LLMs) behave differently compared to humans in high-stakes military decision-making scenarios.
Our results motivate policymakers to be cautious before granting autonomy or following AI-based strategy recommendations.
arXiv Detail & Related papers (2024-03-06T02:23:32Z)
- ALYMPICS: LLM Agents Meet Game Theory -- Exploring Strategic Decision-Making with AI Agents [77.34720446306419]
Alympics is a systematic simulation framework utilizing Large Language Model (LLM) agents for game theory research.
Alympics creates a versatile platform for studying complex game theory problems.
arXiv Detail & Related papers (2023-11-06T16:03:46Z)
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of "tit for tat" and a judge manages the debate process to obtain a final solution.
Our framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation.
arXiv Detail & Related papers (2023-05-30T15:25:45Z)
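To make the MAD entry above concrete, here is a minimal sketch of a debate loop in its spirit (the function names, prompt wording, and stopping convention are illustrative assumptions, not the paper's code): two debaters argue "tit for tat", each round rebutting the other's last argument, while a judge decides when the debate has converged and returns the final solution.

```python
def multi_agent_debate(question: str, debater_a, debater_b, judge,
                       max_rounds: int = 3) -> str:
    """debater_a, debater_b, and judge are callables str -> str wrapping LLMs."""
    transcript = f"Question: {question}\n"
    for rnd in range(max_rounds):
        arg_a = debater_a(transcript + "Affirmative, state or rebut:\n")
        transcript += f"[Round {rnd + 1}] A: {arg_a}\n"
        arg_b = debater_b(transcript + "Negative, rebut the last argument:\n")
        transcript += f"[Round {rnd + 1}] B: {arg_b}\n"
        # Hypothetical convergence convention: judge replies "DONE:<solution>"
        # to stop early, anything else to continue debating.
        verdict = judge(transcript + "Judge: reply DONE:<solution> if the "
                                     "debate has converged, else CONTINUE.\n")
        if verdict.startswith("DONE:"):
            return verdict[len("DONE:"):].strip()
    return judge(transcript + "Judge: give the final solution.\n")
```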