A Mechanistic Interpretation of Syllogistic Reasoning in Auto-Regressive Language Models
- URL: http://arxiv.org/abs/2408.08590v2
- Date: Mon, 17 Feb 2025 12:09:50 GMT
- Title: A Mechanistic Interpretation of Syllogistic Reasoning in Auto-Regressive Language Models
- Authors: Geonhee Kim, Marco Valentino, André Freitas
- Abstract summary: Recent studies on logical reasoning in Language Models (LMs) have sparked a debate on whether LMs can learn systematic reasoning principles during pre-training.
This paper presents a mechanistic interpretation of syllogistic reasoning in LMs to advance the understanding of internal dynamics.
- Score: 13.59675117792588
- Abstract: Recent studies on logical reasoning in Language Models (LMs) have sparked a debate on whether they can learn systematic reasoning principles during pre-training or merely exploit superficial patterns in the training data. This paper presents a mechanistic interpretation of syllogistic reasoning in LMs to advance the understanding of internal dynamics. Specifically, we present a methodology for circuit discovery aimed at interpreting content-independent reasoning mechanisms. Through two distinct intervention methods, we uncover a sufficient and necessary circuit involving middle-term suppression that elucidates how LMs transfer information to derive valid conclusions from premises. Furthermore, we investigate how belief biases manifest in syllogistic reasoning, finding evidence of partial contamination from additional attention heads responsible for encoding commonsense and contextualized knowledge. Finally, we explore the generalization of the discovered mechanisms across various syllogistic schemes, model sizes and architectures, finding that the identified circuit is sufficient and necessary for the schemes on which the models achieve high downstream accuracy (> 60%), and that the activation patterns apply to models of different families. Overall, our findings suggest that LMs indeed learn transferable content-independent reasoning mechanisms, but that, at the same time, such mechanisms do not involve generalizable and abstract logical primitives, being susceptible to contamination by the same world knowledge acquired during pre-training.
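To make the task in the abstract concrete, the following is a minimal sketch of the kind of content-independent syllogism it refers to (two premises that share a middle term, which then drops out of the conclusion). It is an illustration only, not the authors' code or their circuit-discovery method; the names `Premise`, `middle_term`, `conclude`, and `render` are hypothetical, and only the simple transitive form "All A are B. All B are C. Therefore, all A are C." is assumed.

```python
from typing import NamedTuple


class Premise(NamedTuple):
    """A universal affirmative statement: 'All <subject> are <predicate>.'"""
    subject: str
    predicate: str


def middle_term(p1: Premise, p2: Premise) -> str:
    """Return the single term shared by both premises (the middle term)."""
    shared = {p1.subject, p1.predicate} & {p2.subject, p2.predicate}
    if len(shared) != 1:
        raise ValueError("premises must share exactly one term")
    return shared.pop()


def conclude(p1: Premise, p2: Premise) -> Premise:
    """Chain 'All A are B' and 'All B are C' into 'All A are C'.

    The premises may be given in either order; the middle term does not
    appear in the returned conclusion.
    """
    if p1.predicate == p2.subject:
        return Premise(p1.subject, p2.predicate)
    if p2.predicate == p1.subject:
        return Premise(p2.subject, p1.predicate)
    raise ValueError("premises do not form a transitive chain")


def render(p: Premise) -> str:
    """Format a premise or conclusion as an English sentence."""
    return f"All {p.subject} are {p.predicate}."


# Symbolic terms keep the example free of world knowledge (hypothetical data).
first = Premise("A", "B")
second = Premise("B", "C")
conclusion = conclude(first, second)
```

Here `middle_term(first, second)` is `"B"` and `render(conclusion)` is `"All A are C."`. Replacing the symbols with real-world nouns would give the kind of prompt in which, according to the abstract, belief biases can interfere with the otherwise content-independent mechanism.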
Related papers
- LogiDynamics: Unraveling the Dynamics of Logical Inference in Large Language Model Reasoning [49.58786377307728]
This paper adopts an exploratory approach by introducing a controlled evaluation environment for analogical reasoning.
We analyze the comparative dynamics of inductive, abductive, and deductive inference pipelines.
We investigate advanced paradigms such as hypothesis selection, verification, and refinement, revealing their potential to scale up logical inference.
arXiv Detail & Related papers (2025-02-16T15:54:53Z) - Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning [9.795934690403374]
It is still unclear which multi-step reasoning mechanisms are used by language models to solve such tasks.
We employ circuit analysis and self-influence functions to evaluate the changing importance of each token throughout the reasoning process.
We demonstrate that the underlying circuits reveal a human-interpretable reasoning process used by the model.
arXiv Detail & Related papers (2025-02-13T07:19:05Z) - A Systematic Analysis of Large Language Models as Soft Reasoners: The Case of Syllogistic Inferences [5.141416267381492]
We consider the case of syllogistic reasoning, an area of deductive reasoning studied extensively in logic and cognitive psychology.
We investigate the effects of chain-of-thought reasoning, in-context learning, and supervised fine-tuning on syllogistic reasoning.
Our results suggest that the behavior of pre-trained LLMs can be explained by cognitive science.
arXiv Detail & Related papers (2024-06-17T08:59:04Z) - A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning.
Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z) - Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems.
LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning.
We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning techniques to assess model performance.
arXiv Detail & Related papers (2023-10-02T01:00:50Z) - Explainability for Large Language Models: A Survey [59.67574757137078]
Large language models (LLMs) have demonstrated impressive capabilities in natural language processing.
This paper introduces a taxonomy of explainability techniques and provides a structured overview of methods for explaining Transformer-based language models.
arXiv Detail & Related papers (2023-09-02T22:14:26Z) - Modeling Hierarchical Reasoning Chains by Linking Discourse Units and
Key Phrases for Reading Comprehension [80.99865844249106]
We propose a holistic graph network (HGN) which deals with context at both discourse level and word level, as the basis for logical reasoning.
Specifically, node-level and type-level relations, which can be interpreted as bridges in the reasoning process, are modeled by a hierarchical interaction mechanism.
arXiv Detail & Related papers (2023-06-21T07:34:27Z) - Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability [30.76910454663951]
Causal abstraction provides a theoretical foundation for mechanistic interpretability.
Our contribution is to generalize the theory of causal abstraction from mechanism replacement to arbitrary mechanism transformation.
arXiv Detail & Related papers (2023-01-11T20:42:41Z) - Abduction and Argumentation for Explainable Machine Learning: A Position Survey [2.28438857884398]
This paper presents Abduction and Argumentation as two principled forms for reasoning.
It fleshes out the fundamental role that they can play within Machine Learning.
arXiv Detail & Related papers (2020-10-24T13:23:44Z) - Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge [96.92252296244233]
Large pre-trained language models (LMs) acquire some reasoning capacity, but this ability is difficult to control.
We show that LMs can be trained to reliably perform systematic reasoning combining both implicit, pre-trained knowledge and explicit natural language statements.
Our work paves a path towards open-domain systems that constantly improve by interacting with users who can instantly correct a model by adding simple natural language statements.
arXiv Detail & Related papers (2020-06-11T17:02:20Z)