What-If Analysis of Large Language Models: Explore the Game World Using Proactive Thinking
- URL: http://arxiv.org/abs/2509.04791v1
- Date: Fri, 05 Sep 2025 04:05:27 GMT
- Title: What-If Analysis of Large Language Models: Explore the Game World Using Proactive Thinking
- Authors: Yuan Sui, Yanming Zhang, Yi Liao, Yu Gu, Guohua Tang, Zhongqian Sun, Wei Yang, Bryan Hooi
- Abstract summary: Large language models (LLMs) excel at processing information reactively but lack the ability to systematically explore hypothetical futures. We propose WiA-LLM, a new paradigm that equips LLMs with proactive thinking capabilities. We validate WiA-LLM in Honor of Kings, a complex multiplayer game environment.
- Score: 50.72154186522052
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Large language models (LLMs) excel at processing information reactively but lack the ability to systematically explore hypothetical futures. They cannot ask, "What if we take this action? How will it affect the final outcome?" and forecast the potential consequences before acting. This critical gap limits their utility in dynamic, high-stakes scenarios such as strategic planning, risk assessment, and real-time decision making. To bridge this gap, we propose WiA-LLM, a new paradigm that equips LLMs with proactive thinking capabilities. Our approach integrates What-If Analysis (WIA), a systematic method for evaluating hypothetical scenarios by changing input variables. By leveraging environmental feedback via reinforcement learning, WiA-LLM moves beyond reactive thinking: it dynamically simulates the outcome of each potential action, enabling the model to anticipate future states rather than merely react to present conditions. We validate WiA-LLM in Honor of Kings (HoK), a complex multiplayer game environment characterized by rapid state changes and intricate interactions. The game's real-time state changes require precise multi-step consequence prediction, making it an ideal testbed for our approach. Experimental results show that WiA-LLM achieves 74.2% accuracy in forecasting game-state changes (up to a two-fold gain over baselines), with particularly significant gains in high-difficulty scenarios where accurate foresight is critical. To our knowledge, this is the first work to formally explore and integrate what-if analysis capabilities within LLMs. WiA-LLM represents a fundamental advance toward proactive reasoning in LLMs, providing a scalable framework for robust decision-making in dynamic environments, with broad implications for strategic applications.
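For intuition, below is a minimal, hypothetical sketch of the forecast-then-act loop the abstract describes: each candidate action is simulated by an LLM forecaster before any action is committed. The function names, prompt wording, and scoring hook are illustrative assumptions, not the paper's actual implementation (which additionally trains the forecaster with reinforcement learning on environmental feedback).

```python
# Hypothetical sketch of a what-if analysis loop: forecast the outcome of each
# candidate action with an LLM, then act on the best forecast. All names here
# (forecast_state, score_state, the prompt format) are illustrative assumptions,
# not WiA-LLM's actual implementation.
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class WhatIfResult:
    action: str
    predicted_state: str
    score: float


def forecast_state(llm: Callable[[str], str], state: str, action: str) -> str:
    """Ask the model to simulate the state that would follow a hypothetical action."""
    prompt = (
        f"Current game state:\n{state}\n\n"
        f"What if we take this action: {action}?\n"
        "Describe the resulting game state."
    )
    return llm(prompt)


def what_if_analysis(
    llm: Callable[[str], str],
    state: str,
    candidate_actions: Iterable[str],
    score_state: Callable[[str], float],
) -> WhatIfResult:
    """Forecast the outcome of every candidate action, then pick the best one."""
    results = []
    for action in candidate_actions:
        predicted = forecast_state(llm, state, action)
        results.append(WhatIfResult(action, predicted, score_state(predicted)))
    return max(results, key=lambda r: r.score)
```

This only illustrates the inference-time loop; per the abstract, the forecaster itself is trained via reinforcement learning against environment feedback so that its multi-step predictions stay accurate.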
Related papers
- Evaluating from Benign to Dynamic Adversarial: A Squid Game for Large Language Models [57.33350664910483]
We introduce Squid Game, a dynamic and adversarial evaluation environment with resource-constrained and asymmetric information settings.
We evaluate over 50 LLMs on Squid Game, presenting the largest behavioral evaluation study of general LLMs in dynamic adversarial scenarios.
arXiv Detail & Related papers (2025-11-12T06:06:29Z)
- See, Think, Act: Online Shopper Behavior Simulation with VLM Agents [58.92444959954643]
This paper investigates the integration of visual information, specifically webpage screenshots, into behavior simulation via VLMs.
We employ SFT for joint action prediction and rationale generation, conditioning on the full interaction context.
To further enhance reasoning capabilities, we integrate RL with a hierarchical reward structure scaled by a difficulty-aware factor.
arXiv Detail & Related papers (2025-10-22T05:07:14Z)
- DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios [57.327907850766785]
The characterization of deception across realistic, real-world scenarios remains underexplored.
We establish DeceptionBench, the first benchmark that systematically evaluates how deceptive tendencies manifest across different domains.
On the intrinsic dimension, we explore whether models exhibit self-interested egoistic tendencies or sycophantic behaviors that prioritize user appeasement.
We incorporate sustained multi-turn interaction loops to construct a more realistic simulation of real-world feedback dynamics.
arXiv Detail & Related papers (2025-10-17T10:14:26Z)
- Where to show Demos in Your Prompt: A Positional Bias of In-Context Learning [19.313795358097483]
In-context learning (ICL) is a critical emerging capability of large language models (LLMs).
This paper investigates a previously unexplored positional bias of ICL.
We observe that predictions and accuracy can drift drastically when the positions of the demos, the system prompt, and the user message are varied.
arXiv Detail & Related papers (2025-07-30T17:59:46Z)
- Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour [26.04296415316974]
We propose Agentic eXplanations via Interrogative Simulation (AXIS).
AXIS generates intelligible causal explanations for pre-trained multi-agent policies.
We evaluate AXIS on autonomous driving across 10 scenarios for 5 LLMs.
arXiv Detail & Related papers (2025-05-23T12:19:18Z)
- APEX: Empowering LLMs with Physics-Based Task Planning for Real-time Insight [3.5385022178794805]
APEX (Anticipatory Physics-Enhanced Execution) is a framework that equips Large Language Models with physics-driven foresight for real-time task planning.
APEX significantly outperforms standard LLMs and VLM-based models.
arXiv Detail & Related papers (2025-05-20T04:34:58Z)
- Belief Revision: The Adaptability of Large Language Models Reasoning [63.0281286287648]
We introduce Belief-R, a new dataset designed to test LMs' belief revision ability when presented with new evidence.
Inspired by how humans suppress prior inferences, this task assesses LMs within the newly proposed delta reasoning framework.
We evaluate ~30 LMs across diverse prompting strategies and find that LMs generally struggle to appropriately revise their beliefs in response to new information.
arXiv Detail & Related papers (2024-06-28T09:09:36Z)
- HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
- User Behavior Simulation with Large Language Model based Agents [116.74368915420065]
We propose an LLM-based agent framework and design a sandbox environment to simulate real user behaviors.
Based on extensive experiments, we find that the behaviors simulated by our method closely match those of real humans.
arXiv Detail & Related papers (2023-06-05T02:58:35Z)
- Learning the Effects of Physical Actions in a Multi-modal Environment [17.757831697284498]
Large Language Models (LLMs) handle physical commonsense information inadequately.
We introduce the multi-modal task of predicting the outcomes of actions solely from realistic sensory inputs.
We show that multi-modal models can capture physical commonsense when augmented with visual information.
arXiv Detail & Related papers (2023-01-27T16:49:52Z)
- Instance-Aware Predictive Navigation in Multi-Agent Environments [93.15055834395304]
We propose an Instance-Aware Predictive Control (IPC) approach, which forecasts interactions between agents as well as future scene structures.
We adopt a novel multi-instance event prediction module to estimate the possible interaction among agents in the ego-centric view.
We design a sequential action sampling strategy to better leverage predicted states at both the scene and instance levels.
arXiv Detail & Related papers (2021-01-14T22:21:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.