Inducing Causal World Models in LLMs for Zero-Shot Physical Reasoning
- URL: http://arxiv.org/abs/2507.19855v2
- Date: Tue, 29 Jul 2025 01:57:01 GMT
- Title: Inducing Causal World Models in LLMs for Zero-Shot Physical Reasoning
- Authors: Aditya Sharma, Linh Nguyen, Ananya Gupta, Chengyu Wang, Chiamaka Adebayo, Jakub Kowalski,
- Abstract summary: Causal World Model Induction (CWMI) is a framework designed to embed an explicit model of causal physics within an AI system.<n>CWMI significantly outperforms state-of-the-art AI systems on zero-shot physical reasoning tasks.
- Score: 8.647104927811135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs), despite their advanced linguistic capabilities, fundamentally lack an intuitive understanding of physical dynamics, which limits their effectiveness in real-world scenarios that require causal reasoning. In this paper, we introduce Causal World Model Induction (CWMI), a novel framework designed to embed an explicit model of causal physics within an LLM. Our approach incorporates a dedicated Causal Physics Module (CPM) and a new training objective called Causal Intervention Loss, encouraging the model to learn cause-and-effect relationships from multimodal data. By training the model to predict the outcomes of hypothetical interventions instead of merely capturing statistical correlations, CWMI develops a robust internal representation of physical laws. Experimental results show that CWMI significantly outperforms state-of-the-art LLMs on zero-shot physical reasoning tasks, including the PIQA benchmark and our newly proposed PhysiCa-Bench dataset. These findings demonstrate that inducing a causal world model is a critical step toward more reliable and generalizable AI systems.
Related papers
- Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models [43.76842321707181]
In this study, we shed light on how the interplay between these core paradigms influences Large Language Model (LLM) reasoning.<n>We first collect a new dataset of reasoning trajectories from symbolic tasks, each targeting one of the three fundamental paradigms.<n>We then investigate effective ways for inducing these skills into LLMs.
arXiv Detail & Related papers (2026-02-09T13:51:48Z) - Aligning Agentic World Models via Knowledgeable Experience Learning [68.85843641222186]
We introduce WorldMind, a framework that constructs a symbolic World Knowledge Repository by synthesizing environmental feedback.<n>WorldMind achieves superior performance compared to baselines with remarkable cross-model and cross-environment transferability.
arXiv Detail & Related papers (2026-01-19T17:33:31Z) - How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns [51.02752099869218]
Large Language Models (LLMs) display strikingly different generalization behaviors.<n>We introduce a novel benchmark that decomposes reasoning into atomic core skills.<n>We show that RL-tuned models maintain more stable behavioral profiles and resist collapse in reasoning skills, whereas SFT models exhibit sharper drift and overfit to surface patterns.
arXiv Detail & Related papers (2025-12-30T08:16:20Z) - Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction [53.745458605360675]
We explore world-model internalization through efficient interaction and active reasoning (WMAct)<n>WMAct liberates the model from structured reasoning, allowing the model to shape thinking directly through its doing.<n>Our experiments on Sokoban, Maze, and Taxi show that WMAct yields effective world model reasoning capable of resolving tasks in a single turn.
arXiv Detail & Related papers (2025-11-28T18:59:47Z) - Models of Heavy-Tailed Mechanistic Universality [62.107333654304014]
We propose a family of random matrix models to explore attributes that give rise to heavy-tailed behavior in trained neural networks.<n>Under this model, spectral densities with power laws on tails arise through a combination of three independent factors.<n> Implications of our model on other appearances of heavy tails, including neural scaling laws, trajectories, and the five-plus-one phases of neural network training, are discussed.
arXiv Detail & Related papers (2025-06-04T00:55:01Z) - Learning Local Causal World Models with State Space Models and Attention [1.5498250598583487]
We show that a SSM can model the dynamics of a simple environment and learn a causal model at the same time.<n>We pave the way for further experiments that lean into the strength of SSMs and further enhance them with causal awareness.
arXiv Detail & Related papers (2025-05-04T11:57:02Z) - Cognitive Activation and Chaotic Dynamics in Large Language Models: A Quasi-Lyapunov Analysis of Reasoning Mechanisms [6.375329734462518]
This paper proposes the "Cognitive Activation" theory, revealing the essence of Large Language Models' reasoning mechanisms.<n> Experiments show that the model's information accumulation follows a nonlinear exponential law, and the Multilayer Perceptron (MLP) accounts for a higher proportion in the final output.<n>This research provides a chaos theory framework for the interpretability of LLMs' reasoning and reveals potential pathways for balancing creativity and reliability in model design.
arXiv Detail & Related papers (2025-03-15T08:15:10Z) - Superficial Self-Improved Reasoners Benefit from Model Merging [49.09091498084467]
Self-improvement as a solution to synthesizing high-quality data corpus.<n>In particular, our analysis reveals that even when LMs show improved in-domain (ID) reasoning accuracy, they actually compromise their generalized reasoning capabilities.<n>We propose Iterative Model Merging (IMM), a method that strategically combines weights from original and self-improved models to preserve generalization.
arXiv Detail & Related papers (2025-03-03T22:41:25Z) - LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models [35.01842161084472]
We propose a new physical reasoning task and a dataset, dubbed TraySim.<n>Our task involves predicting the dynamics of several objects on a tray that is given an external impact.<n>We present LLMPhy, a zero-shot black-box optimization framework that leverages the physics knowledge and program synthesis abilities of LLMs.<n>Our results show that the combination of the LLM and the physics engine leads to state-of-the-art zero-shot physical reasoning performance.
arXiv Detail & Related papers (2024-11-12T18:56:58Z) - Failure Modes of LLMs for Causal Reasoning on Narratives [51.19592551510628]
We investigate the interaction between world knowledge and logical reasoning.<n>We find that state-of-the-art large language models (LLMs) often rely on superficial generalizations.<n>We show that simple reformulations of the task can elicit more robust reasoning behavior.
arXiv Detail & Related papers (2024-10-31T12:48:58Z) - Investigating the Impact of Model Complexity in Large Language Models [3.7919508292745676]
Large Language Models (LLMs) based on the pre-trained fine-tuning paradigm have become pivotal in solving natural language processing tasks.
In this paper, we focus on autoregressive LLMs and propose to employ Hidden Markov Models (HMMs) to model them.
arXiv Detail & Related papers (2024-10-01T13:53:44Z) - Making Large Language Models into World Models with Precondition and Effect Knowledge [1.8561812622368763]
We show that Large Language Models (LLMs) can be induced to perform two critical world model functions.
We validate that the precondition and effect knowledge generated by our models aligns with human understanding of world dynamics.
arXiv Detail & Related papers (2024-09-18T19:28:04Z) - The Essential Role of Causality in Foundation World Models for Embodied AI [102.75402420915965]
Embodied AI agents will require the ability to perform new tasks in many different real-world environments.
Current foundation models fail to accurately model physical interactions and are therefore insufficient for Embodied AI.
The study of causality lends itself to the construction of veridical world models.
arXiv Detail & Related papers (2024-02-06T17:15:33Z) - Targeted Reduction of Causal Models [55.11778726095353]
Causal Representation Learning offers a promising avenue to uncover interpretable causal patterns in simulations.
We introduce Targeted Causal Reduction (TCR), a method for condensing complex intervenable models into a concise set of causal factors.
Its ability to generate interpretable high-level explanations from complex models is demonstrated on toy and mechanical systems.
arXiv Detail & Related papers (2023-11-30T15:46:22Z) - Measuring Causal Effects of Data Statistics on Language Model's
`Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models.
We provide a language for describing how training data influences predictions, through a causal framework.
Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z) - Causal Dynamics Learning for Task-Independent State Abstraction [61.707048209272884]
We introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL)
CDL learns a theoretically proved causal dynamics model that removes unnecessary dependencies between state variables and the action.
A state abstraction can then be derived from the learned dynamics.
arXiv Detail & Related papers (2022-06-27T17:02:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.