Reasoning Effort and Problem Complexity: A Scaling Analysis in LLMs
- URL: http://arxiv.org/abs/2503.15113v1
- Date: Wed, 19 Mar 2025 11:13:51 GMT
- Title: Reasoning Effort and Problem Complexity: A Scaling Analysis in LLMs
- Authors: Benjamin Estermann, Roger Wattenhofer
- Abstract summary: We investigate how the reasoning effort of large language models scales with problem complexity. Our results show that reasoning effort scales with problem size, but only up to a critical problem complexity.
- Score: 26.494798719138526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) have demonstrated remarkable text generation capabilities, and recent advances in training paradigms have led to breakthroughs in their reasoning performance. In this work, we investigate how the reasoning effort of such models scales with problem complexity. We use the infinitely scalable Tents puzzle, which has a known linear-time solution, to analyze this scaling behavior. Our results show that reasoning effort scales with problem size, but only up to a critical problem complexity. Beyond this threshold, the reasoning effort does not continue to increase, and may even decrease. This observation highlights a critical limitation in the logical coherence of current LLMs as problem complexity increases, and underscores the need for strategies to improve reasoning scalability. Furthermore, our results reveal significant performance differences between current state-of-the-art reasoning models when faced with increasingly complex logical puzzles.
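The scaling analysis described in the abstract can be pictured as a simple measurement loop: pose Tents puzzles of increasing grid size, record a proxy for reasoning effort (for example, the number of reasoning tokens the model emits), and look for the size at which effort stops growing. The following minimal Python sketch is only illustrative; the prompt builder, the mock model call, and the token-count proxy are assumptions for demonstration, not the authors' implementation.

```python
# Minimal sketch of the scaling analysis: for each puzzle size, measure a
# proxy for "reasoning effort" (here, whitespace-separated reasoning tokens)
# and inspect whether effort keeps growing with problem size or plateaus.
# The prompt builder and model call are illustrative stand-ins.

from dataclasses import dataclass
from typing import Callable


@dataclass
class EffortSample:
    grid_size: int          # side length n of the n x n Tents grid
    reasoning_tokens: int   # proxy for reasoning effort


def build_tents_prompt(n: int) -> str:
    """Hypothetical prompt for an n x n Tents puzzle (placeholder text)."""
    return f"Solve an {n}x{n} Tents puzzle and explain your reasoning step by step."


def mock_reasoning_model(prompt: str) -> str:
    """Stand-in for an LLM call; replace with a real API call in practice."""
    return "step " * 50


def measure_effort(sizes: list[int],
                   model: Callable[[str], str]) -> list[EffortSample]:
    samples = []
    for n in sizes:
        reasoning = model(build_tents_prompt(n))
        samples.append(EffortSample(n, len(reasoning.split())))
    return samples


if __name__ == "__main__":
    for s in measure_effort([4, 6, 8, 10, 12], mock_reasoning_model):
        print(f"n={s.grid_size:2d}  reasoning_tokens={s.reasoning_tokens}")
```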
Related papers
- Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT? [59.418994222096885]
We conduct a detailed analysis of model performance on the AIME24 dataset.
We categorize questions into four tiers (Easy, Medium, Hard, and Extremely Hard).
We find that progression from Easy to Medium tier requires adopting an R1 reasoning style with minimal SFT-1K instances.
Extremely Hard (Exh) level questions present a fundamentally different challenge; they require unconventional problem-solving skills.
arXiv Detail & Related papers (2025-04-16T03:39:38Z) - A Survey of Scaling in Large Language Model Reasoning [62.92861523305361]
We provide a comprehensive examination of scaling in large language model (LLM) reasoning.
We analyze scaling in reasoning steps that improves multi-step inference and logical consistency.
We discuss scaling in training-enabled reasoning, focusing on optimization through iterative model improvement.
arXiv Detail & Related papers (2025-04-02T23:51:27Z) - From Chaos to Order: The Atomic Reasoner Framework for Fine-grained Reasoning in Large Language Models [46.02816479205161]
We present Atomic Reasoner (AR), a cognitive inference strategy that enables fine-grained reasoning.
AR decomposes the reasoning process into atomic cognitive units, employing a cognitive routing mechanism.
Results show AR's superior reasoning capabilities without the computational burden of exhaustive solution searches.
arXiv Detail & Related papers (2025-03-20T08:34:53Z) - When More is Less: Understanding Chain-of-Thought Length in LLMs [53.77747102201451]
Chain-of-thought (CoT) reasoning enhances the multi-step reasoning capabilities of large language models (LLMs). However, for most models and tasks, does an increase in CoT length consistently lead to improved reasoning accuracy? In this paper, we observe a nuanced relationship: as the number of reasoning steps increases, performance initially improves but eventually decreases.
arXiv Detail & Related papers (2025-02-11T05:28:59Z) - ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning [92.76959707441954]
We introduce ZebraLogic, a comprehensive evaluation framework for assessing LLM reasoning performance. ZebraLogic enables the generation of puzzles with controllable and quantifiable complexity. Our results reveal a significant decline in accuracy as problem complexity grows.
arXiv Detail & Related papers (2025-02-03T06:44:49Z) - Causal Graphs Meet Thoughts: Enhancing Complex Reasoning in Graph-Augmented LLMs [4.701165676405066]
It is critical not only to retrieve relevant information but also to provide causal reasoning and explainability. This paper proposes a novel pipeline that filters large knowledge graphs to emphasize cause-effect edges. Experiments on medical question-answering tasks show consistent gains, with up to a 10% absolute improvement.
arXiv Detail & Related papers (2025-01-24T19:31:06Z) - On Memorization of Large Language Models in Logical Reasoning [70.94164038947078]
Large language models (LLMs) achieve good performance on challenging reasoning benchmarks, yet could also make basic reasoning mistakes. One hypothesis is that the increasingly high and nearly saturated performance could be due to the memorization of similar problems. We show that fine-tuning leads to heavy memorization, but it also consistently improves generalization performance.
arXiv Detail & Related papers (2024-10-30T15:31:54Z) - Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models [84.15513004135576]
Current research enhances the reasoning performance of Large Language Models (LLMs) by sampling multiple reasoning chains and ensembling based on the answer frequency.
This approach fails in scenarios where the correct answers are in the minority.
We introduce a hierarchical reasoning aggregation framework AoR, which selects answers based on the evaluation of reasoning chains.
arXiv Detail & Related papers (2024-05-21T17:12:19Z) - Faith and Fate: Limits of Transformers on Compositionality [109.79516190693415]
We investigate the limits of transformer large language models across three representative compositional tasks.
These tasks require breaking problems down into sub-steps and synthesizing these steps into a precise answer.
Our empirical findings suggest that transformer LLMs solve compositional tasks by reducing multi-step compositional reasoning into linearized subgraph matching.
arXiv Detail & Related papers (2023-05-29T23:24:14Z) - Pushing the Limits of Rule Reasoning in Transformers through Natural Language Satisfiability [30.01308882849197]
We propose a new methodology for creating challenging algorithmic reasoning datasets.
The key idea is to draw insights from empirical sampling of hard propositional SAT problems and from complexity-theoretic studies of language (a minimal illustrative sketch follows this entry).
We find that current transformers, given sufficient training data, are surprisingly robust at solving the resulting NLSat problems.
arXiv Detail & Related papers (2021-12-16T17:47:20Z)
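One way to picture the dataset-creation methodology in this last entry is to sample random 3-SAT formulas near the empirically hard clause-to-variable ratio (around 4.26) and verbalize them as English sentences. The sketch below is an illustrative assumption, not the paper's generator; the random_3sat and verbalize helpers are hypothetical names.

```python
# Illustrative sketch: sample a random 3-SAT instance near the empirically
# hard clause-to-variable ratio (~4.26) and render it in natural language,
# in the spirit of natural-language satisfiability (NLSat) problems.

import random

HARD_RATIO = 4.26  # clause/variable ratio near the 3-SAT phase transition


def random_3sat(num_vars: int, seed: int = 0) -> list[list[int]]:
    """Return clauses as lists of signed 1-based variable indices."""
    rng = random.Random(seed)
    num_clauses = round(HARD_RATIO * num_vars)
    clauses = []
    for _ in range(num_clauses):
        chosen = rng.sample(range(1, num_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in chosen])
    return clauses


def verbalize(clauses: list[list[int]]) -> str:
    """Render the formula as simple English, one sentence per clause."""
    def lit(l: int) -> str:
        return f"x{abs(l)} is {'true' if l > 0 else 'false'}"
    sentences = [
        "At least one of the following holds: "
        + ", ".join(lit(l) for l in clause) + "."
        for clause in clauses
    ]
    return " ".join(sentences)


if __name__ == "__main__":
    print(verbalize(random_3sat(num_vars=5)))
```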