Related papers: Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models

Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models

URL: http://arxiv.org/abs/2602.08658v2
Date: Tue, 10 Feb 2026 15:47:40 GMT
Title: Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models
Authors: Mingzi Cao, Xingwei Tan, Mahmud Elahi Akhter, Marco Valentino, Maria Liakata, Xi Wang, Nikolaos Aletras,
Abstract summary: In this study, we shed light on how the interplay between these core paradigms influences Large Language Model (LLM) reasoning.<n>We first collect a new dataset of reasoning trajectories from symbolic tasks, each targeting one of the three fundamental paradigms.<n>We then investigate effective ways for inducing these skills into LLMs.
Score: 43.76842321707181
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deduction, induction, and abduction are fundamental reasoning paradigms, core for human logical thinking. Although improving Large Language Model (LLM) reasoning has attracted significant research efforts, the extent to which the fundamental paradigms induce generalization has yet to be systematically explored. In this study, we shed light on how the interplay between these core paradigms influences LLMs' reasoning behavior. To this end, we first collect a new dataset of reasoning trajectories from symbolic tasks, each targeting one of the three fundamental paradigms, to abstract from concrete world knowledge. Then, we investigate effective ways for inducing these skills into LLMs. We experiment with a battery of methods including simple fine-tuning, and more complex approaches to increase model depth, or transform a dense model to a mixture-of-experts. We comprehensively evaluate induced models on realistic out-of-domain tasks, that are entirely formulated in natural language and contain real-world knowledge. Our results reveal that our approach yields strong generalizability with substantial performance gains (up to $14.60$) across realistic tasks.

Related papers

LTD-Bench: Evaluating Large Language Models by Letting Them Draw [57.237152905238084]
LTD-Bench is a breakthrough benchmark for large language models (LLMs)<n>It transforms LLM evaluation from abstract scores to directly observable visual outputs by requiring models to generate drawings through dot matrices or executable code.<n> LTD-Bench's visual outputs enable powerful diagnostic analysis, offering a potential approach to investigate model similarity.
arXiv Detail & Related papers (2025-11-04T08:11:23Z)
Step-Aware Policy Optimization for Reasoning in Diffusion Large Language Models [57.42778606399764]
Diffusion language models (dLLMs) offer a promising, non-autoregressive paradigm for text generation.<n>Current reinforcement learning approaches often rely on sparse, outcome-based rewards.<n>We argue that this stems from a fundamental mismatch with the natural structure of reasoning.
arXiv Detail & Related papers (2025-10-02T00:34:15Z)
From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models [66.36007274540113]
Multimodal Large Language Models (MLLMs) strive to achieve a profound, human-like understanding of and interaction with the physical world.<n>They often exhibit a shallow and incoherent integration when acquiring information (Perception) and conducting reasoning (Cognition)<n>This survey introduces a novel and unified analytical framework: From Perception to Cognition"
arXiv Detail & Related papers (2025-09-29T18:25:40Z)
LLM-Guided Semantic Relational Reasoning for Multimodal Intent Recognition [14.683883775425821]
This paper proposes a novel method for understanding human intents from multimodal signals.<n>The method harnesses the expansive knowledge of large language models (LLMs) to establish semantic foundations.<n>Experiments on multimodal intent and dialogue act tasks demonstrate LGSRR's superiority over state-of-the-art methods.
arXiv Detail & Related papers (2025-09-01T10:18:47Z)
A Survey on Post-training of Large Language Models [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration.<n>These challenges necessitate advanced post-training language models (PoLMs) to address shortcomings, such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance.<n>This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; Integration and Adaptation, which
arXiv Detail & Related papers (2025-03-08T05:41:42Z)
LogiDynamics: Unraveling the Dynamics of Inductive, Abductive and Deductive Logical Inferences in LLM Reasoning [74.0242521818214]
This paper systematically investigates the comparative dynamics of inductive (System 1) versus abductive/deductive (System 2) inference in large language models (LLMs)<n>We utilize a controlled analogical reasoning environment, varying modality (textual, visual, symbolic), difficulty, and task format (MCQ / free-text)<n>Our analysis reveals System 2 pipelines generally excel, particularly in visual/symbolic modalities and harder tasks, while System 1 is competitive for textual and easier problems.
arXiv Detail & Related papers (2025-02-16T15:54:53Z)
An Analysis for Reasoning Bias of Language Models with Small Initialization [8.380004565348619]
Large Language Models (LLMs) have revolutionized Natural Language Processing by demonstrating exceptional performance across diverse tasks.<n>This study investigates the impact of the parameter initialization scale on the training behavior and task preferences of LLMs.
arXiv Detail & Related papers (2025-02-05T15:23:26Z)
Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning [25.732397636695882]
We show that large language models (LLMs) display reasoning patterns akin to those observed in humans. Our research demonstrates that the architecture and scale of the model significantly affect its preferred method of reasoning.
arXiv Detail & Related papers (2024-02-20T12:58:14Z)
From Understanding to Utilization: A Survey on Explainability for Large Language Models [27.295767173801426]
This survey underscores the imperative for increased explainability in Large Language Models (LLMs) Our focus is primarily on pre-trained Transformer-based LLMs, which pose distinctive interpretability challenges due to their scale and complexity. When considering the utilization of explainability, we explore several compelling methods that concentrate on model editing, control generation, and model enhancement.
arXiv Detail & Related papers (2024-01-23T16:09:53Z)
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems. LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning. We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning technique to assess the performance of model.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)
Re-Reading Improves Reasoning in Large Language Models [87.46256176508376]
We introduce a simple, yet general and effective prompting method, Re2, to enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs) Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process. We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
arXiv Detail & Related papers (2023-09-12T14:36:23Z)
Post Hoc Explanations of Language Models Can Improve Language Models [43.2109029463221]
We present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY) We leverage post hoc explanation methods which output attribution scores (explanations) capturing the influence of each of the input features on model predictions. Our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks.
arXiv Detail & Related papers (2023-05-19T04:46:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.