LLMs as Strategic Actors: Behavioral Alignment, Risk Calibration, and Argumentation Framing in Geopolitical Simulations
- URL: http://arxiv.org/abs/2603.02128v1
- Date: Mon, 02 Mar 2026 17:46:17 GMT
- Title: LLMs as Strategic Actors: Behavioral Alignment, Risk Calibration, and Argumentation Framing in Geopolitical Simulations
- Authors: Veronika Solopova, Viktoria Skorik, Maksym Tereshchenko, Alina Haidun, Ostap Vykhopen,
- Abstract summary: Large language models (LLMs) are increasingly proposed as agents in strategic decision environments. We evaluate six popular state-of-the-art LLMs alongside results from human participants across four real-world crisis simulation scenarios. We compare models to humans in action alignment, risk calibration through chosen actions' severity, and argumentative framing grounded in international relations theory.
- Score: 2.430361444826172
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly proposed as agents in strategic decision environments, yet their behavior in structured geopolitical simulations remains under-researched. We evaluate six popular state-of-the-art LLMs alongside results from human participants across four real-world crisis simulation scenarios, requiring models to select predefined actions and justify their decisions across multiple rounds. We compare models to humans in action alignment, risk calibration through chosen actions' severity, and argumentative framing grounded in international relations theory. Results show that models approximate human decision patterns in base simulation rounds but diverge over time, displaying distinct behavioural profiles and strategy updates. LLM explanations for chosen actions across all models exhibit a strong normative-cooperative framing centered on stability, coordination, and risk mitigation, with limited adversarial reasoning.
Related papers
- Strategy Executability in Mathematical Reasoning: Leveraging Human-Model Differences for Effective Guidance [86.46794021499511]
We show a previously underexplored gap between strategy usage and strategy executability. We propose Selective Strategy Retrieval (SSR), a test-time framework that explicitly models executability. SSR yields reliable and consistent improvements over direct solving, in-context learning, and single-source guidance.
arXiv Detail & Related papers (2026-02-26T03:34:23Z) - Evaluating from Benign to Dynamic Adversarial: A Squid Game for Large Language Models [57.33350664910483]
We introduce Squid Game, a dynamic and adversarial evaluation environment with resource-constrained and asymmetric information settings. We evaluate over 50 LLMs on Squid Game, presenting the largest behavioral evaluation study of general LLMs on dynamic adversarial scenarios.
arXiv Detail & Related papers (2025-11-12T06:06:29Z) - Plan before Solving: Problem-Aware Strategy Routing for Mathematical Reasoning with LLMs [49.995906301946]
Existing methods usually leverage a fixed strategy to guide Large Language Models (LLMs) to perform mathematical reasoning. Our analysis reveals that a single strategy cannot adapt to problem-specific requirements and thus overlooks the trade-off between effectiveness and efficiency. We propose Planning and Routing through Instance-Specific Modeling (PRISM), a novel framework that decouples mathematical reasoning into two stages: strategy planning and targeted execution.
arXiv Detail & Related papers (2025-09-29T07:22:41Z) - ToMPO: Training LLM Strategic Decision Making from a Multi-Agent Perspective [16.275962506416064]
Large Language Models (LLMs) have been used to make decisions in complex scenarios. We propose a ToMPO algorithm to optimize the perception of other individuals' strategies and game situation trends. The ToMPO algorithm outperforms the GRPO method by 35% in terms of model output compliance and cooperative outcomes.
arXiv Detail & Related papers (2025-09-25T13:25:15Z) - Noise, Adaptation, and Strategy: Assessing LLM Fidelity in Decision-Making [0.030586855806896043]
Large language models (LLMs) are increasingly used in social science simulations. We propose a process-oriented evaluation framework to examine how LLM agents adapt under different levels of external guidance and human-derived noise. We find that LLMs, by default, converge on stable and conservative strategies that diverge from observed human behaviors.
arXiv Detail & Related papers (2025-08-21T18:55:53Z) - Beyond Nash Equilibrium: Bounded Rationality of LLMs and humans in Strategic Decision-making [33.2843381902912]
Large language models are increasingly used in strategic decision-making settings. We compare LLMs and humans using experimental paradigms adapted from behavioral game-theory research.
arXiv Detail & Related papers (2025-06-11T04:43:54Z) - Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Experiments [5.1382713576243955]
Large language models (LLMs) are increasingly used to simulate or automate human behavior in sequential decision-making settings. We focus on the exploration-exploitation (E&E) tradeoff, a fundamental aspect of dynamic decision-making under uncertainty. We find that enabling thinking in LLMs shifts their behavior toward more human-like behavior, characterized by a mix of random and directed exploration.
arXiv Detail & Related papers (2025-05-15T02:09:18Z) - MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework [53.82097200295448]
Mean-Field LLM (MF-LLM) is the first framework to incorporate mean-field theory into social simulation. MF-LLM models bidirectional interactions between individuals and the population through an iterative process. IB-Tune is a novel fine-tuning method inspired by the Information Bottleneck principle.
arXiv Detail & Related papers (2025-04-30T12:41:51Z) - How Strategic Agents Respond: Comparing Analytical Models with LLM-Generated Responses in Strategic Classification [11.614944245315186]
We use Strategic Classification to study the interaction between agents and decision-makers. This shift prompts two questions: (i) Can LLMs generate effective and socially responsible strategies in SC settings? We show that even without access to the decision policy, LLMs can generate effective strategies that improve both agents' scores and qualification.
arXiv Detail & Related papers (2025-01-20T01:39:03Z) - A Large-Scale Simulation on Large Language Models for Decision-Making in Political Science [18.521101885334673]
We develop a theory-driven, multi-step reasoning framework to simulate voter decision-making at scale. We conduct large-scale simulations of recent U.S. presidential elections using synthetic personas calibrated to real-world voter data.
arXiv Detail & Related papers (2024-12-19T07:10:51Z) - Modeling Boundedly Rational Agents with Latent Inference Budgets [56.24971011281947]
We introduce a latent inference budget model (L-IBM) that models agents' computational constraints explicitly.
L-IBMs make it possible to learn agent models using data from diverse populations of suboptimal actors.
We show that L-IBMs match or outperform Boltzmann models of decision-making under uncertainty.
arXiv Detail & Related papers (2023-12-07T03:55:51Z) - CoMPosT: Characterizing and Evaluating Caricature in LLM Simulations [61.9212914612875]
We present a framework to characterize LLM simulations using four dimensions: Context, Model, Persona, and Topic.
We use this framework to measure open-ended LLM simulations' susceptibility to caricature, defined via two criteria: individuation and exaggeration.
We find that for GPT-4, simulations of certain demographics (political and marginalized groups) and topics (general, uncontroversial) are highly susceptible to caricature.
arXiv Detail & Related papers (2023-10-17T18:00:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.