Reasoning Beyond Limits: Advances and Open Problems for LLMs
- URL: http://arxiv.org/abs/2503.22732v1
- Date: Wed, 26 Mar 2025 12:29:40 GMT
- Title: Reasoning Beyond Limits: Advances and Open Problems for LLMs
- Authors: Mohamed Amine Ferrag, Norbert Tihanyi, Merouane Debbah,
- Abstract summary: Recent generative reasoning breakthroughs have transformed how large language models (LLMs) tackle complex problems.<n>We provide a comprehensive analysis of the top 27 LLM models released between 2023 and 2025.<n>We discuss the key challenges in advancing LLM capabilities, including improving multi-step reasoning without human supervision.
- Score: 1.4929298667651645
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent generative reasoning breakthroughs have transformed how large language models (LLMs) tackle complex problems by dynamically retrieving and refining information while generating coherent, multi-step thought processes. Techniques such as inference-time scaling, reinforcement learning, supervised fine-tuning, and distillation have been successfully applied to models like DeepSeek-R1, OpenAI's o1 & o3, GPT-4o, Qwen-32B, and various Llama variants, resulting in enhanced reasoning capabilities. In this paper, we provide a comprehensive analysis of the top 27 LLM models released between 2023 and 2025 (including models such as Mistral AI Small 3 24B, DeepSeek-R1, Search-o1, QwQ-32B, and phi-4). Then, we present an extensive overview of training methodologies that spans general training approaches, mixture-of-experts (MoE) and architectural innovations, retrieval-augmented generation (RAG), chain-of-thought and self-improvement techniques, as well as test-time compute scaling, distillation, and reinforcement learning (RL) methods. Finally, we discuss the key challenges in advancing LLM capabilities, including improving multi-step reasoning without human supervision, overcoming limitations in chained tasks, balancing structured prompts with flexibility, and enhancing long-context retrieval and external tool integration.
Related papers
- Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities [62.05713042908654]
This paper provides a review of advances in Large Language Models (LLMs) alignment through the lens of inverse reinforcement learning (IRL)<n>We highlight the necessity of constructing neural reward models from human data and discuss the formal and practical implications of this paradigm shift.
arXiv Detail & Related papers (2025-07-17T14:22:24Z) - Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training [121.5858973157225]
We investigate the effects of prolonged reinforcement learning on a small language model across a diverse set of reasoning domains.<n>We introduce controlled KL regularization, clipping ratio, and periodic reference policy resets as critical components for unlocking long-term performance gains.<n>Our model achieves significant improvements over strong baselines, including +14.7% on math, +13.9% on coding, and +54.8% on logic puzzle tasks.
arXiv Detail & Related papers (2025-07-16T17:59:24Z) - ASTRO: Teaching Language Models to Reason by Reflecting and Backtracking In-Context [66.15505423059234]
We introduce ASTRO, a framework for training language models to reason like search algorithms.<n>We apply ASTRO to the Llama 3 family of models and achieve absolute performance gains of 16.4% on MATH-500, 26.9% on AMC 2023, and 20.0% on AIME 2024.
arXiv Detail & Related papers (2025-07-01T04:10:15Z) - RAG-R1 : Incentivize the Search and Reasoning Capabilities of LLMs through Multi-query Parallelism [10.288667305064065]
Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks.<n>LLMs remain prone to generating hallucinated or outdated responses due to their static internal knowledge.<n>Recent advancements in Retrieval-Augmented Generation (RAG) methods have aimed to enhance models' search and reasoning capabilities.
arXiv Detail & Related papers (2025-06-30T09:02:45Z) - A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law [29.763080554625216]
This survey explores recent advancements in reasoning large language models (LLMs) designed to mimic "slow thinking"<n>LLMs focus on scaling computational resources dynamically during complex tasks, such as math reasoning, visual reasoning, medical diagnosis, and multi-agent debates.
arXiv Detail & Related papers (2025-05-05T14:14:59Z) - A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems [93.8285345915925]
Reasoning is a fundamental cognitive process that enables logical inference, problem-solving, and decision-making.
With the rapid advancement of large language models (LLMs), reasoning has emerged as a key capability that distinguishes advanced AI systems.
We categorize existing methods along two dimensions: (1) Regimes, which define the stage at which reasoning is achieved; and (2) Architectures, which determine the components involved in the reasoning process.
arXiv Detail & Related papers (2025-04-12T01:27:49Z) - OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement [91.88062410741833]
This study investigates whether similar reasoning capabilities can be successfully integrated into large vision-language models (LVLMs)<n>We consider an approach that iteratively leverages supervised fine-tuning (SFT) on lightweight training data and Reinforcement Learning (RL) to further improve model generalization.<n>OpenVLThinker, a LVLM exhibiting consistently improved reasoning performance on challenging benchmarks such as MathVista, MathVerse, and MathVision, demonstrates the potential of our strategy for robust vision-language reasoning.
arXiv Detail & Related papers (2025-03-21T17:52:43Z) - Evaluating Mathematical Reasoning Across Large Language Models: A Fine-Grained Approach [15.960271016276447]
We present a systematic evaluation of mathematical reasoning abilities across eight leading Large Language Models (LLMs)<n>Our analyses reveal several key findings: DeepSeek-R1 performs competitively with o1 across most domains and achieves the highest accuracy on the MMLU Formal Logic benchmark.<n>We explore how architectural choices, training paradigms, and optimization strategies contribute to variation in reasoning performance.
arXiv Detail & Related papers (2025-03-13T17:23:45Z) - R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning [87.30285670315334]
textbfR1-Searcher is a novel two-stage outcome-based RL approach designed to enhance the search capabilities of Large Language Models.<n>Our framework relies exclusively on RL, without requiring process rewards or distillation for a cold start.<n>Our experiments demonstrate that our method significantly outperforms previous strong RAG methods, even when compared to the closed-source GPT-4o-mini.
arXiv Detail & Related papers (2025-03-07T17:14:44Z) - LLM Post-Training: A Deep Dive into Reasoning Large Language Models [131.10969986056]
Large Language Models (LLMs) have transformed the natural language processing landscape and brought to life diverse applications.
Post-training methods enable LLMs to refine their knowledge, improve reasoning, enhance factual accuracy, and align more effectively with user intents and ethical considerations.
arXiv Detail & Related papers (2025-02-28T18:59:54Z) - Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains.<n>Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities.<n>We propose a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
arXiv Detail & Related papers (2025-02-04T17:26:58Z) - Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models [33.13238566815798]
Large Language Models (LLMs) have sparked significant research interest in leveraging them to tackle complex reasoning tasks.<n>Recent studies demonstrate that encouraging LLMs to "think" with more tokens during test-time inference can significantly boost reasoning accuracy.<n>The introduction of OpenAI's o1 series marks a significant milestone in this research direction.
arXiv Detail & Related papers (2025-01-16T17:37:58Z) - Diving into Self-Evolving Training for Multimodal Reasoning [36.70979791148913]
Self-evolving trainin has emerged as a key approach for complex reasoning tasks.<n>This paper reframes self-evolving training for multimodal reasoning through the lens of reinforcement learning.<n>We propose M-STAR, a framework that achieves consistent performance gains across models of varying sizes and diverse benchmarks.
arXiv Detail & Related papers (2024-12-23T10:18:41Z) - A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions [15.350973327319418]
Large language models (LLMs) are increasingly integrated into a wide range of everyday applications.
This raises concerns about the replicability and generalizability of insights gained from research on LLM behavior.
We tested GPT-3.5, GPT-4o, Gemini 1.5 Pro, Claude 3 Opus, Llama 3-8B, and Llama 3-70B, on the chain-of-thought, EmotionPrompting, ExpertPrompting, Sandbagging, as well as Re-Reading prompt engineering techniques.
arXiv Detail & Related papers (2024-09-30T14:00:34Z) - Recursive Introspection: Teaching Language Model Agents How to Self-Improve [30.086494067593268]
We develop RISE: Recursive IntroSpEction, an approach for fine-tuning large language models.
Our experiments show that RISE enables Llama2, Llama3, and Mistral models to improve themselves with more turns on math reasoning tasks.
arXiv Detail & Related papers (2024-07-25T17:35:59Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.<n>We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.<n>Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.