Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization
- URL: http://arxiv.org/abs/2510.06274v1
- Date: Mon, 06 Oct 2025 13:08:31 GMT
- Title: Bridging Reasoning to Learning: Unmasking Illusions using Complexity Out of Distribution Generalization
- Authors: Mohammad Mahdi Samiei Paqaleh, Arash Marioriyad, Arman Tahmasebi-Zadeh, Mohamadreza Fereydooni, Mahdi Ghaznavai, Mahdieh Soleymani Baghshah
- Abstract summary: We propose Complexity Out of Distribution (Complexity OoD) generalization as a framework to define and measure reasoning. A model exhibits Complexity OoD generalization when it maintains performance on test instances whose minimal required solution complexity exceeds that of all training examples. We translate this perspective into practice with recommendations for operationalizing Complexity OoD across the stack.
- Score: 8.236500918322138
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress has pushed AI frontiers from pattern recognition tasks toward problems that require step-by-step, System-2-style reasoning, especially with large language models. Yet, unlike learning, where generalization and out-of-distribution (OoD) evaluation concepts are well formalized, there is no clear, consistent definition or metric for reasoning ability. We propose Complexity Out of Distribution (Complexity OoD) generalization as a framework and problem setting to define and measure reasoning. A model exhibits Complexity OoD generalization when it maintains performance on test instances whose minimal required solution complexity, either representational (richer solution structure) or computational (more reasoning steps/program length), exceeds that of all training examples. We formalize complexity via the Kolmogorov complexity of the solution description and via operational proxies (e.g., object/relation counts; reasoning step counts), clarifying how Complexity OoD differs from length OoD and compositional OoD. This lens unifies learning and reasoning: many cases solvable with System-1-like processing at low complexity become System-2-like under complexity pressure, while System 2 can be viewed as generalization over solution structures. We translate this perspective into practice with recommendations for operationalizing Complexity OoD across the stack: incorporating complexity into benchmark and evaluation-metric design, rethinking supervision to target solution traces, seeking and designing inductive biases for Complexity OoD generalization, and addressing learning-to-reason spillovers such as spurious shortcuts, semantic robustness, catastrophic forgetting, and step-wise calibration. Because Complexity OoD cannot be solved by scaling data alone, progress toward robust reasoning will require architectures and training regimes that explicitly model and allocate computation with respect to complexity.
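The operational proxies named in the abstract suggest a concrete recipe for building a Complexity-OoD evaluation split. The sketch below is illustrative only (the paper publishes no reference code here): it uses reasoning-step count as the complexity proxy, and the data format and function names are assumptions made for this example.

```python
# Illustrative sketch, not the paper's implementation: reasoning-step count
# stands in for minimal solution complexity, and the example format is assumed.

def step_count(trace):
    """Operational complexity proxy: number of reasoning steps in a solution trace."""
    return len(trace)

def complexity_ood_split(examples, train_max_steps):
    """Partition examples so every test instance requires strictly more steps
    than any training instance, i.e., a Complexity-OoD split as defined above."""
    train = [ex for ex in examples if step_count(ex["trace"]) <= train_max_steps]
    test = [ex for ex in examples if step_count(ex["trace"]) > train_max_steps]
    return train, test

examples = [
    {"q": "2+3", "trace": ["add 2 and 3"]},
    {"q": "2+3*4", "trace": ["multiply 3 by 4", "add 2 and 12"]},
    {"q": "(2+3)*(4+5)", "trace": ["add 2 and 3", "add 4 and 5", "multiply 5 by 9"]},
]
train, test = complexity_ood_split(examples, train_max_steps=2)
# train holds the 1- and 2-step problems; test holds only the 3-step problem,
# so a model evaluated on `test` faces complexity it never saw in training.
```

A length-OoD split would instead threshold on input length; the point of the proxy above is that the threshold is on the solution's required steps, not the surface form of the question.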
Related papers
- Not All Code Is Equal: A Data-Centric Study of Code Complexity and LLM Reasoning [16.919028520729793]
Large Language Models (LLMs) increasingly exhibit strong reasoning abilities, often attributed to their capacity to generate chain-of-thought-style intermediate reasoning. Recent work suggests that exposure to code can further enhance these skills, but existing studies largely treat code as a generic training signal. We study the structural complexity of code, which captures control flow and compositional structure that may shape how models internalise multi-step reasoning during fine-tuning.
arXiv Detail & Related papers (2026-01-29T15:54:40Z)
- CoT-Seg: Rethinking Segmentation with Chain-of-Thought Reasoning and Self-Correction [50.67483317563736]
This paper aims to explore a system that can think step-by-step, look up information if needed, generate results, self-evaluate its own results, and refine the results. We introduce CoT-Seg, a training-free framework that rethinks reasoning segmentation by combining chain-of-thought reasoning with self-correction.
arXiv Detail & Related papers (2026-01-24T11:41:54Z)
- Unlocking Symbol-Level Precoding Efficiency Through Tensor Equivariant Neural Network [84.22115118596741]
We propose an end-to-end deep learning (DL) framework with low inference complexity for symbol-level precoding. We show that the proposed framework captures substantial performance gains of optimal SLP, while achieving an approximately 80-fold speedup over conventional methods.
arXiv Detail & Related papers (2025-10-02T15:15:50Z)
- A Quantum Computational Perspective on Spread Complexity [0.0]
We establish a direct connection between spread complexity and quantum circuit complexity by demonstrating that spread complexity emerges as a limiting case of a circuit complexity framework built from two fundamental operations: time-evolution and superposition. Our approach leverages a computational setup where unitary gates and beam-splitting operations generate target states, with the minimal cost of synthesis yielding a complexity measure that converges to spread complexity in the infinitesimal time-evolution limit.
arXiv Detail & Related papers (2025-06-08T19:04:42Z)
- PixelThink: Towards Efficient Chain-of-Pixel Reasoning [70.32510083790069]
PixelThink is a simple yet effective scheme that integrates externally estimated task difficulty and internally measured model uncertainty. It learns to compress reasoning length in accordance with scene complexity and predictive confidence. Experimental results demonstrate that the proposed approach improves both reasoning efficiency and overall segmentation performance.
arXiv Detail & Related papers (2025-05-29T17:55:49Z)
- FOL-Pretrain: A complexity annotated corpus of first-order logic [16.061040115094592]
Transformer-based large language models (LLMs) have demonstrated remarkable reasoning capabilities. Despite recent efforts to reverse-engineer LLM behavior, our understanding of how these models internalize and execute complex algorithms remains limited. We introduce a large-scale, fully open, complexity-annotated dataset of first-order logic reasoning traces.
arXiv Detail & Related papers (2025-05-20T21:38:28Z)
- Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution [59.39066657300045]
Chain-of-Thought (CoT) prompting enhances the reasoning of large language models (LLMs) by decomposing problems into sequential steps. We propose Syzygy of Thoughts (SoT), a novel framework that extends CoT by introducing auxiliary, interrelated reasoning paths. SoT captures deeper logical dependencies, enabling more robust and structured problem-solving.
arXiv Detail & Related papers (2025-04-13T13:35:41Z)
- Unveiling Hybrid Cyclomatic Complexity: A Comprehensive Analysis and Evaluation as an Integral Feature in Automatic Defect Prediction Models [0.5461938536945723]
This paper aims to analyse a novel complexity metric, Hybrid Cyclomatic Complexity (HCC), and its efficiency as a feature in a defect prediction model. We present a comparative study between the HCC metric and its two components, the inherited complexity and the actual complexity of a class in the object-oriented context.
arXiv Detail & Related papers (2025-04-01T07:07:17Z)
- Epistemic Logic Programs: Non-Ground and Counting Complexity [32.575043686973224]
Epistemic logic programs (ELP) extend ASP to reason about all or some answer sets. This paper establishes the complexity of non-ground ELPs.
arXiv Detail & Related papers (2025-01-31T20:08:52Z)
- When Do Program-of-Thoughts Work for Reasoning? [51.2699797837818]
We propose complexity-impacted reasoning score (CIRS) to measure correlation between code and reasoning abilities.
Specifically, we use the abstract syntax tree to encode the structural information and calculate logical complexity.
Code will be integrated into the EasyInstruct framework at https://github.com/zjunlp/EasyInstruct.
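The CIRS entry above describes its method only at a high level (encode structural information from the abstract syntax tree, then score logical complexity), so the following is a hypothetical sketch of an AST-based structural-complexity proxy in the same spirit; the node categories and the (branch count, nesting depth) score are assumptions for illustration, not the paper's actual formula.

```python
import ast

# Hypothetical proxy, not the CIRS formula: count branching/looping AST nodes
# and measure maximum nesting depth as a stand-in for structural complexity.

BRANCH_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.With, ast.FunctionDef)

def structural_complexity(source):
    """Return (branch-node count, max nesting depth) for a Python snippet."""
    tree = ast.parse(source)
    branches = sum(isinstance(n, BRANCH_NODES) for n in ast.walk(tree))

    def depth(node):
        # Depth of the AST rooted at `node`, counting this node as one level.
        children = list(ast.iter_child_nodes(node))
        return 1 + max((depth(c) for c in children), default=0)

    return branches, depth(tree)

flat = "x = 1\ny = 2\nz = x + y"
nested = "for i in range(3):\n    if i % 2:\n        print(i)"
# The nested snippet scores higher on both counts than the flat one, which is
# the kind of signal a complexity-aware training-data study could threshold on.
```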
arXiv Detail & Related papers (2023-08-29T17:22:39Z)
- Successive Prompting for Decomposing Complex Questions [50.00659445976735]
Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting.
We introduce "Successive Prompting", where we iteratively break down a complex task into a simple task, solve it, and then repeat the process until we get the final solution.
Our best model (with successive prompting) achieves an improvement of 5% absolute F1 on a few-shot version of the DROP dataset.
arXiv Detail & Related papers (2022-12-08T06:03:38Z)
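The successive-prompting loop described in the last entry can be sketched as follows. The language-model call is replaced by a stub (`decompose_and_solve`), an assumption for illustration; the paper's actual prompts and model are not reproduced here.

```python
# Illustrative sketch of the successive-prompting loop: decompose a complex
# question one simple sub-question at a time until the final answer is reached.
# `decompose_and_solve` is a stand-in for a language-model call.

def decompose_and_solve(question, history):
    """Stub LM: peel off the next sub-task and 'solve' it.
    Returns (sub_question, sub_answer, done)."""
    parts = [p.strip() for p in question.split(" and ")]
    step = len(history)  # how many sub-questions were already answered
    sub_q = parts[step]
    sub_a = f"answer({sub_q})"
    return sub_q, sub_a, step == len(parts) - 1

def successive_prompting(question):
    """Iteratively break a complex question into simple ones until done."""
    history = []
    done = False
    while not done:
        sub_q, sub_a, done = decompose_and_solve(question, history)
        history.append((sub_q, sub_a))  # prior answers condition the next step
    return history

trace = successive_prompting("find the tallest player and compute his age")
# `trace` holds one (sub-question, sub-answer) pair per decomposition step.
```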
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.