Related papers: CausalARC: Abstract Reasoning with Causal World Models

URL: http://arxiv.org/abs/2509.03636v2
Date: Sat, 01 Nov 2025 23:22:34 GMT
Title: CausalARC: Abstract Reasoning with Causal World Models
Authors: Jacqueline Maasch, John Kalantari, Kia Khezeli,
Abstract summary: CausalARC is an experimental testbed for AI reasoning in low-data and out-of-distribution regimes. Each CausalARC reasoning task is sampled from a fully specified causal world model. Within- and between-model performance varied heavily across tasks, indicating room for significant improvement in language model reasoning.
Score: 0.8793721044482612
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: On-the-fly reasoning often requires adaptation to novel problems under limited data and distribution shift. This work introduces CausalARC: an experimental testbed for AI reasoning in low-data and out-of-distribution regimes, modeled after the Abstraction and Reasoning Corpus (ARC). Each CausalARC reasoning task is sampled from a fully specified causal world model, formally expressed as a structural causal model. Principled data augmentations provide observational, interventional, and counterfactual feedback about the world model in the form of few-shot, in-context learning demonstrations. As a proof-of-concept, we illustrate the use of CausalARC for four language model evaluation settings: (1) abstract reasoning with test-time training, (2) counterfactual reasoning with in-context learning, (3) program synthesis, and (4) causal discovery with logical reasoning. Within- and between-model performance varied heavily across tasks, indicating room for significant improvement in language model reasoning.
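Illustration: the three kinds of feedback named in the abstract (observational, interventional, counterfactual) can be sketched on a toy structural causal model. This is a minimal sketch and not the CausalARC implementation; the two-variable model, the noise probabilities, and the names `sample_noise` and `run_scm` are all assumptions made for illustration.

```python
import random


def sample_noise(rng):
    """Draw the exogenous noise terms (probabilities are arbitrary assumptions)."""
    return {"u_x": rng.random() < 0.5, "u_y": rng.random() < 0.2}


def run_scm(noise, do_x=None):
    """Evaluate a hypothetical toy SCM: X := U_x, Y := X xor U_y.

    Passing do_x overrides the mechanism for X, i.e. the intervention do(X = do_x).
    """
    x = noise["u_x"] if do_x is None else do_x
    y = x != noise["u_y"]  # boolean xor
    return {"x": x, "y": y}


rng = random.Random(0)

# Observational: sample noise and let every mechanism run unmodified.
noise = sample_noise(rng)
observational = run_scm(noise)

# Interventional: draw fresh noise, but force X to a chosen value.
interventional = run_scm(sample_noise(rng), do_x=True)

# Counterfactual: reuse the noise of the observed world and flip X,
# answering "what would Y have been had X been different?"
counterfactual = run_scm(noise, do_x=not observational["x"])

print("observational: ", observational)
print("interventional:", interventional)
print("counterfactual:", counterfactual)
```

The distinction the sketch draws is that the interventional sample uses new noise while the counterfactual sample keeps the noise of the observed world fixed. In this toy model, flipping X under fixed noise always flips Y.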

Related papers

Schoenfeld's Anatomy of Mathematical Reasoning by Language Models [56.656180566692946]
We adopt Schoenfeld's Episode Theory as an inductive, intermediate-scale lens and introduce ThinkARM (Anatomy of Reasoning in Models). ThinkARM explicitly abstracts reasoning traces into functional reasoning steps such as Analysis, Explore, Implement, and Verify. We show that episode-level representations make reasoning steps explicit, enabling systematic analysis of how reasoning is structured, stabilized, and altered in modern language models.
arXiv Detail & Related papers (2025-12-23T02:44:25Z)
ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction [70.53044880892196]
We introduce a novel task named Latent Reasoning Chain Extraction (ARCHE), in which models must decompose complex reasoning arguments into combinations of standard reasoning paradigms in the form of a Reasoning Logic Tree (RLT). To facilitate this task, we release ARCHE Bench, a new benchmark derived from 70 Nature Communications articles, including more than 1,900 references and 38,000 viewpoints. Evaluations on 10 leading LLMs on ARCHE Bench reveal that models exhibit a trade-off between REA and EC, and none are yet able to extract a complete and standard reasoning chain.
arXiv Detail & Related papers (2025-11-16T07:37:09Z)
Causal Distillation: Transferring Structured Explanations from Large to Compact Language Models [0.0]
Large proprietary language models exhibit strong causal reasoning abilities that smaller open-source models struggle to replicate. We introduce a novel framework for distilling causal explanations that transfers causal reasoning skills from a powerful teacher model to a compact open-source model. The key idea is to train the smaller model to develop causal reasoning abilities by generating structured cause-and-effect explanations consistent with those of the teacher model.
arXiv Detail & Related papers (2025-05-26T04:50:42Z)
Evaluating the Logical Reasoning Abilities of Large Reasoning Models [15.009205651973666]
We introduce LogiEval, a benchmark for evaluating logical reasoning in large reasoning models. LogiEval spans diverse reasoning types (deductive, inductive, analogical, and abductive) and task formats (e.g., logical sequence, argument analysis). Our experiments demonstrate that modern reasoning models excel at 4-choice argument analysis problems and analogical reasoning, surpassing human performance. Our analysis reveals that human performance does not mirror model failure distributions.
arXiv Detail & Related papers (2025-05-17T05:36:14Z)
Failure Modes of LLMs for Causal Reasoning on Narratives [51.19592551510628]
We investigate the interaction between world knowledge and logical reasoning. We find that state-of-the-art large language models (LLMs) often rely on superficial generalizations. We show that simple reformulations of the task can elicit more robust reasoning behavior.
arXiv Detail & Related papers (2024-10-31T12:48:58Z)
Targeted Reduction of Causal Models [55.11778726095353]
Causal Representation Learning offers a promising avenue to uncover interpretable causal patterns in simulations. We introduce Targeted Causal Reduction (TCR), a method for condensing complex intervenable models into a concise set of causal factors. Its ability to generate interpretable high-level explanations from complex models is demonstrated on toy and mechanical systems.
arXiv Detail & Related papers (2023-11-30T15:46:22Z)
Inducing Causal Structure for Abstractive Text Summarization [76.1000380429553]
We introduce a Structural Causal Model (SCM) to induce the underlying causal structure of the summarization data. We propose a Causality Inspired Sequence-to-Sequence model (CI-Seq2Seq) to learn the causal representations that can mimic the causal factors. Experimental results on two widely used text summarization datasets demonstrate the advantages of our approach.
arXiv Detail & Related papers (2023-08-24T16:06:36Z)
Learning a Structural Causal Model for Intuition Reasoning in Conversation [20.243323155177766]
Reasoning, a crucial aspect of NLP research, has not been adequately addressed by prevailing models. We develop a conversation cognitive model (CCM) that explains how each utterance receives and activates channels of information. By leveraging variational inference, it explores substitutes for implicit causes, addresses the issue of their unobservability, and reconstructs the causal representations of utterances through the evidence lower bounds.
arXiv Detail & Related papers (2023-05-28T13:54:09Z)
Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability [30.76910454663951]
Causal abstraction provides a theoretical foundation for mechanistic interpretability. Our contribution is to generalize the theory of causal abstraction from mechanism replacement to arbitrary mechanism transformation.
arXiv Detail & Related papers (2023-01-11T20:42:41Z)
MetaLogic: Logical Reasoning Explanations with Fine-Grained Structure [129.8481568648651]
We propose a benchmark to investigate models' logical reasoning capabilities in complex real-life scenarios. Based on the multi-hop chain of reasoning, the explanation form includes three main components. We evaluate the current best models' performance on this new explanation form.
arXiv Detail & Related papers (2022-10-22T16:01:13Z)
Systematic Evaluation of Causal Discovery in Visual Model Based Reinforcement Learning [76.00395335702572]
A central goal for AI and causality is the joint discovery of abstract representations and causal structure. Existing environments for studying causal induction are poorly suited for this objective because they have complicated task-specific causal graphs. In this work, our goal is to facilitate research in learning representations of high-level variables as well as causal structures among them.
arXiv Detail & Related papers (2021-07-02T05:44:56Z)
Towards Interpretable Reasoning over Paragraph Effects in Situation [126.65672196760345]
We focus on the task of reasoning over paragraph effects in situation, which requires a model to understand the cause and effect. We propose a sequential approach for this task which explicitly models each step of the reasoning process with neural network modules. In particular, five reasoning modules are designed and learned in an end-to-end manner, which leads to a more interpretable model.
arXiv Detail & Related papers (2020-10-03T04:03:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.