Scheming Ability in LLM-to-LLM Strategic Interactions
- URL: http://arxiv.org/abs/2510.12826v1
- Date: Sat, 11 Oct 2025 04:42:29 GMT
- Title: Scheming Ability in LLM-to-LLM Strategic Interactions
- Authors: Thao Pham
- Abstract summary: Large language model (LLM) agents are deployed autonomously in diverse contexts. We investigate the scheming ability and propensity of frontier LLM agents through two game-theoretic frameworks. We test four models (GPT-4o, Gemini-2.5-pro, Claude-3.7-Sonnet, and Llama-3.3-70b).
- Score: 4.873362301533824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language model (LLM) agents are deployed autonomously in diverse contexts, evaluating their capacity for strategic deception becomes crucial. While recent research has examined how AI systems scheme against human developers, LLM-to-LLM scheming remains underexplored. We investigate the scheming ability and propensity of frontier LLM agents through two game-theoretic frameworks: a Cheap Talk signaling game and a Peer Evaluation adversarial game. Testing four models (GPT-4o, Gemini-2.5-pro, Claude-3.7-Sonnet, and Llama-3.3-70b), we measure scheming performance with and without explicit prompting while analyzing scheming tactics through chain-of-thought reasoning. When prompted, most models, especially Gemini-2.5-pro and Claude-3.7-Sonnet, achieved near-perfect performance. Critically, models exhibited significant scheming propensity without prompting: all models chose deception over confession in Peer Evaluation (100% rate), while models choosing to scheme in Cheap Talk succeeded at 95-100% rates. These findings highlight the need for robust evaluations using high-stakes game-theoretic scenarios in multi-agent settings.
Related papers
- AgentCPM-Explore: Realizing Long-Horizon Deep Exploration for Edge-Scale Agents [75.67445299298949]
AgentCPM-Explore is a compact 4B agent model with high knowledge density and strong exploration capability.
We introduce a holistic training framework featuring parameter-space model fusion, reward signal denoising, and contextual information refinement.
AgentCPM-Explore achieves state-of-the-art (SOTA) performance among 4B-class models, matches or surpasses 8B-class SOTA models on four benchmarks, and even outperforms larger-scale models such as Claude-4.5-Sonnet or DeepSeek-v3.2 in five benchmarks.
arXiv Detail & Related papers (2026-02-06T08:24:59Z)
- Multimodal RewardBench 2: Evaluating Omni Reward Models for Interleaved Text and Image [58.14192385042352]
We introduce Multimodal RewardBench 2 (MMRB2), the first benchmark for reward models on multimodal understanding and (interleaved) generation.
MMRB2 spans four tasks: text-to-image, image editing, interleaved generation, and multimodal reasoning.
It provides 1,000 expert-annotated preference pairs per task from 23 models and agents across 21 source tasks.
arXiv Detail & Related papers (2025-12-18T18:56:04Z)
- VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents [130.70999337445468]
A key challenge in training Vision-Language Model (VLM) agents, compared to Language Model (LLM) agents, is the shift from textual states to complex visual observations.
We ask: Can VLM agents construct internal world models through explicit visual state reasoning?
We architecturally enforce and reward the agent's reasoning process via reinforcement learning (RL).
We find that decomposing the agent's reasoning into State Estimation and Transition Modeling is critical for success.
arXiv Detail & Related papers (2025-10-19T16:05:07Z)
- Evaluating Prompting Strategies and Large Language Models in Systematic Literature Review Screening: Relevance and Task-Stage Classification [1.2234742322758418]
This study quantifies how prompting strategies interact with large language models (LLMs) to automate the screening stage of systematic literature reviews.
We evaluate six LLMs (GPT-4o, GPT-4o-mini, DeepSeek-Chat-V3, Gemini-2.5-Flash, Claude-3.5-Haiku, Llama-4-Maverick) under five prompt types.
CoT-few-shot yields the most reliable precision-recall balance; zero-shot maximizes recall for high-sensitivity passes; and self-reflection underperforms due to over-inclusivity and instability across models.
arXiv Detail & Related papers (2025-10-17T16:53:09Z)
- Evaluating & Reducing Deceptive Dialogue From Language Models with Multi-turn RL [64.3268313484078]
Large Language Models (LLMs) interact with millions of people worldwide in applications such as customer support, education and healthcare.
Their ability to produce deceptive outputs, whether intentionally or inadvertently, poses significant safety concerns.
We investigate the extent to which LLMs engage in deception within dialogue, and propose the belief misalignment metric to quantify deception.
arXiv Detail & Related papers (2025-10-16T05:29:36Z)
- Reliable Decision Support with LLMs: A Framework for Evaluating Consistency in Binary Text Classification Applications [0.7124971549479361]
This study introduces a framework for evaluating consistency in large language model (LLM) binary text classification.
We determine sample size requirements, develop metrics for invalid responses, and evaluate intra- and inter-rater reliability.
arXiv Detail & Related papers (2025-05-20T21:12:58Z)
- SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning [99.645427839457]
Self-Play Critic (SPC) is a novel approach where a critic model evolves its ability to assess reasoning steps through adversarial self-play games.
SPC involves fine-tuning two copies of a base model to play two roles, namely a "sneaky generator" and a "critic".
arXiv Detail & Related papers (2025-04-27T08:45:06Z)
- Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks [0.0]
Chain-of-thought prompting, self-verification, and multi-agent debate are proposed to improve the reasoning and factual accuracy of large language models.
We find that multi-agent debate helps at any model scale, and that diversity of thought elicits stronger reasoning in debating LLMs.
arXiv Detail & Related papers (2024-10-10T21:59:01Z)
- How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments [83.78240828340681]
GAMA($\gamma$)-Bench is a new framework for evaluating Large Language Models' Gaming Ability in Multi-Agent environments.
$\gamma$-Bench includes eight classical game theory scenarios and a dynamic scoring scheme specially designed to assess LLMs' performance.
Our results indicate GPT-3.5 demonstrates strong robustness but limited generalizability, which can be enhanced using methods like Chain-of-Thought.
arXiv Detail & Related papers (2024-03-18T14:04:47Z)
- How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts [54.07541591018305]
We present MAD-Bench, a benchmark that contains 1000 test samples divided into 5 categories, such as non-existent objects, count of objects, and spatial relationship.
We provide a comprehensive analysis of popular MLLMs, ranging from GPT-4v, Reka, Gemini-Pro, to open-sourced models, such as LLaVA-NeXT and MiniCPM-Llama3.
While GPT-4o achieves 82.82% accuracy on MAD-Bench, the accuracy of any other model in our experiments ranges from 9% to 50%.
arXiv Detail & Related papers (2024-02-20T18:31:27Z)
- Evaluating Language Model Agency through Negotiations [39.87262815823634]
Negotiation games enable us to study multi-turn and cross-model interactions, modulate complexity, and side-step accidental evaluation data leakage.
We use our approach to test six widely used and publicly accessible LMs, evaluating performance and alignment in both self-play and cross-play settings.
arXiv Detail & Related papers (2024-01-09T13:19:37Z)
- Better Zero-Shot Reasoning with Role-Play Prompting [10.90357246745529]
Role-play prompting consistently surpasses the standard zero-shot approach across most datasets.
This highlights its potential to augment the reasoning capabilities of large language models.
arXiv Detail & Related papers (2023-08-15T11:08:30Z)