Related papers: Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models

URL: http://arxiv.org/abs/2501.09686v3
Date: Thu, 23 Jan 2025 08:44:44 GMT
Title: Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Authors: Fengli Xu, Qianyue Hao, Zefang Zong, Jingwei Wang, Yunke Zhang, Jingyi Wang, Xiaochong Lan, Jiahui Gong, Tianjian Ouyang, Fanjin Meng, Chenyang Shao, Yuwei Yan, Qinglong Yang, Yiwen Song, Sijian Ren, Xinyuan Hu, Yu Li, Jie Feng, Chen Gao, Yong Li,
Abstract summary: Large Language Models (LLMs) have sparked significant research interest in leveraging them to tackle complex reasoning tasks.<n>Recent studies demonstrate that encouraging LLMs to "think" with more tokens during test-time inference can significantly boost reasoning accuracy.<n>The introduction of OpenAI's o1 series marks a significant milestone in this research direction.
Score: 33.13238566815798
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Language has long been conceived as an essential tool for human reasoning. The breakthrough of Large Language Models (LLMs) has sparked significant research interest in leveraging these models to tackle complex reasoning tasks. Researchers have moved beyond simple autoregressive token generation by introducing the concept of "thought" -- a sequence of tokens representing intermediate steps in the reasoning process. This innovative paradigm enables LLMs' to mimic complex human reasoning processes, such as tree search and reflective thinking. Recently, an emerging trend of learning to reason has applied reinforcement learning (RL) to train LLMs to master reasoning processes. This approach enables the automatic generation of high-quality reasoning trajectories through trial-and-error search algorithms, significantly expanding LLMs' reasoning capacity by providing substantially more training data. Furthermore, recent studies demonstrate that encouraging LLMs to "think" with more tokens during test-time inference can further significantly boost reasoning accuracy. Therefore, the train-time and test-time scaling combined to show a new research frontier -- a path toward Large Reasoning Model. The introduction of OpenAI's o1 series marks a significant milestone in this research direction. In this survey, we present a comprehensive review of recent progress in LLM reasoning. We begin by introducing the foundational background of LLMs and then explore the key technical components driving the development of large reasoning models, with a focus on automated data construction, learning-to-reason techniques, and test-time scaling. We also analyze popular open-source projects at building large reasoning models, and conclude with open challenges and future research directions.

Related papers

Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities [62.05713042908654]
This paper provides a review of advances in Large Language Models (LLMs) alignment through the lens of inverse reinforcement learning (IRL)<n>We highlight the necessity of constructing neural reward models from human data and discuss the formal and practical implications of this paradigm shift.
arXiv Detail & Related papers (2025-07-17T14:22:24Z)
Large Language Models in Argument Mining: A Survey [15.041650203089057]
Argument Mining (AM) focuses on extracting argumentative structures from text.<n>The advent of Large Language Models (LLMs) has profoundly transformed AM, enabling advanced in-context learning.<n>This survey systematically synthesizes recent advancements in LLM-driven AM.
arXiv Detail & Related papers (2025-06-19T15:12:58Z)
A Survey on Large Language Models for Mathematical Reasoning [13.627895103752783]
This survey examines the development of mathematical reasoning abilities in large language models (LLMs)<n>We review methods for enhancing mathematical reasoning, ranging from training-free prompting to fine-tuning approaches such as supervised fine-tuning and reinforcement learning.<n>Despite notable progress, fundamental challenges remain in terms of capacity, efficiency, and generalization.
arXiv Detail & Related papers (2025-06-10T04:44:28Z)
Interleaved Reasoning for Large Language Models via Reinforcement Learning [22.403928213802036]
Long chain-of-thought (CoT) enhances large language models' (LLM) reasoning capabilities.<n>We propose a novel training paradigm that uses reinforcement learning (RL) to guide reasoning LLMs to interleave thinking and answering for multi-hop questions.
arXiv Detail & Related papers (2025-05-26T07:58:17Z)
A Survey of Scaling in Large Language Model Reasoning [62.92861523305361]
We provide a comprehensive examination of scaling in large Language models (LLMs) reasoning. We analyze scaling in reasoning steps that improves multi-step inference and logical consistency. We discuss scaling in training-enabled reasoning, focusing on optimization through iterative model improvement.
arXiv Detail & Related papers (2025-04-02T23:51:27Z)
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond [88.5807076505261]
Large Reasoning Models (LRMs) have demonstrated strong performance gains by scaling up the length of Chain-of-Thought (CoT) reasoning during inference. A growing concern lies in their tendency to produce excessively long reasoning traces. This inefficiency introduces significant challenges for training, inference, and real-world deployment.
arXiv Detail & Related papers (2025-03-27T15:36:30Z)
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models [54.04678363287392]
Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks. Recent advancements in OpenAI o1 and DeepSeek-R1 have further improved performance in System-2 reasoning domains.
arXiv Detail & Related papers (2025-03-20T17:59:38Z)
Advancing Reasoning in Large Language Models: Promising Methods and Approaches [0.0]
Large Language Models (LLMs) have succeeded remarkably in various natural language processing (NLP) tasks. Their ability to perform complex reasoning-spanning logical deduction, mathematical problem-solving, commonsense inference, and multi-step reasoning-often falls short of human expectations. This survey provides a comprehensive review of emerging techniques enhancing reasoning in LLMs.
arXiv Detail & Related papers (2025-02-05T23:31:39Z)
Enhancing LLM Reasoning with Reward-guided Tree Search [95.06503095273395]
o1-like reasoning approach is challenging, and researchers have been making various attempts to advance this open area of research.<n>We present a preliminary exploration into enhancing the reasoning abilities of LLMs through reward-guided tree search algorithms.
arXiv Detail & Related papers (2024-11-18T16:15:17Z)
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models [56.9134620424985]
Cross-modal reasoning (CMR) is increasingly recognized as a crucial capability in the progression toward more sophisticated artificial intelligence systems. The recent trend of deploying Large Language Models (LLMs) to tackle CMR tasks has marked a new mainstream of approaches for enhancing their effectiveness. This survey offers a nuanced exposition of current methodologies applied in CMR using LLMs, classifying these into a detailed three-tiered taxonomy.
arXiv Detail & Related papers (2024-09-19T02:51:54Z)
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey [25.732397636695882]
Large language models (LLMs) have recently shown impressive performance on tasks involving reasoning. Despite these successes, the depth of LLMs' reasoning abilities remains uncertain.
arXiv Detail & Related papers (2024-04-02T11:46:31Z)
Towards LogiGLUE: A Brief Survey and A Benchmark for Analyzing Logical Reasoning Capabilities of Language Models [56.34029644009297]
Large language models (LLMs) have demonstrated the ability to overcome various limitations of formal Knowledge Representation (KR) systems. LLMs excel most in abductive reasoning, followed by deductive reasoning, while they are least effective at inductive reasoning. We study single-task training, multi-task training, and "chain-of-thought" knowledge distillation fine-tuning technique to assess the performance of model.
arXiv Detail & Related papers (2023-10-02T01:00:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.