Related papers: A Short Survey on Small Reasoning Models: Training, Inference, Applications and Research Directions

A Short Survey on Small Reasoning Models: Training, Inference, Applications and Research Directions

URL: http://arxiv.org/abs/2504.09100v1
Date: Sat, 12 Apr 2025 06:45:57 GMT
Title: A Short Survey on Small Reasoning Models: Training, Inference, Applications and Research Directions
Authors: Chengyu Wang, Taolin Zhang, Richang Hong, Jun Huang,
Abstract summary: Reasoning capabilities of large reasoning models (LRMs) have seen significant advancements through the slow thinking process.<n>In contrast, small reasoning models (SRMs), often distilled from larger ones, offer greater efficiency and can exhibit distinct capabilities.
Score: 42.77077835885798
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recently, the reasoning capabilities of large reasoning models (LRMs), such as DeepSeek-R1, have seen significant advancements through the slow thinking process. Despite these achievements, the substantial computational demands of LRMs present considerable challenges. In contrast, small reasoning models (SRMs), often distilled from larger ones, offer greater efficiency and can exhibit distinct capabilities and cognitive trajectories compared to LRMs. This work surveys around 170 recently published papers on SRMs for tackling various complex reasoning tasks. We review the current landscape of SRMs and analyze diverse training and inference techniques related to SRMs. Furthermore, we provide a comprehensive review of SRMs for domain-specific applications and discuss possible future research directions. This survey serves as an essential reference for researchers to leverage or develop SRMs for advanced reasoning functionalities with high efficiency.

Related papers

Towards Concise and Adaptive Thinking in Large Reasoning Models: A Survey [8.736170026262279]
Large reasoning models (LRMs) like OpenAI o1 and DeepSeek R1 have demonstrated impressive performance on complex reasoning tasks.<n>These models also face a huge challenge that generating unnecessarily lengthy and redundant reasoning chains.
arXiv Detail & Related papers (2025-07-13T14:51:59Z)
Socratic-PRMBench: Benchmarking Process Reward Models with Systematic Reasoning Patterns [79.42805969325036]
Process Reward Models (PRMs) are crucial in complex reasoning and problem-solving tasks.<n>PRMs are required to identify errors under various reasoning patterns during the reasoning process.<n>Existing benchmarks mainly focus on evaluating PRMs with stepwise correctness.<n>We introduce Socratic-PRMBench, a new benchmark to evaluate PRMs systematically under six reasoning patterns.
arXiv Detail & Related papers (2025-05-29T14:26:53Z)
THINK-Bench: Evaluating Thinking Efficiency and Chain-of-Thought Quality of Large Reasoning Models [17.609493312457]
Large reasoning models (LRMs) have achieved impressive performance in complex tasks, often outperforming conventional large language models (LLMs)<n>Overthinking severely limits their computational efficiency.<n>We introduce Think-Bench, a benchmark designed to evaluate the reasoning efficiency of LRMs.
arXiv Detail & Related papers (2025-05-28T08:41:14Z)
Reward Reasoning Model [104.39256985858428]
Reward Reasoning Models (RRMs) are designed to execute a deliberate reasoning process before generating final rewards.<n>We implement a reinforcement learning framework that fosters self-evolved reward reasoning capabilities.<n> Notably, RRMs can adaptively exploit test-time compute to further improve reward accuracy.
arXiv Detail & Related papers (2025-05-20T17:58:03Z)
100 Days After DeepSeek-R1: A Survey on Replication Studies and More Directions for Reasoning Language Models [58.98176123850354]
The recent release of DeepSeek-R1 has generated widespread social impact and sparked enthusiasm in the research community for exploring the explicit reasoning paradigm of language models. The implementation details of the released models have not been fully open-sourced by DeepSeek, including DeepSeek-R1-Zero, DeepSeek-R1, and the distilled small models. Many replication studies have emerged aiming to reproduce the strong performance achieved by DeepSeek-R1, reaching comparable performance through similar training procedures and fully open-source data resources.
arXiv Detail & Related papers (2025-05-01T14:28:35Z)
Revisiting Prompt Optimization with Large Reasoning Models-A Case Study on Event Extraction [8.88001387249786]
Large Reasoning Models (LRMs) such as DeepSeek-R1 and OpenAI o1 have demonstrated remarkable capabilities in various reasoning tasks.<n>Their strong capability to generate and reason over intermediate thoughts has led to arguments that they may no longer require extensive prompt engineering or optimization to interpret human instructions.<n>In this work, we aim to systematically study this open question, using the structured task of event extraction for a case study.
arXiv Detail & Related papers (2025-04-10T00:53:59Z)
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation [38.64751082999587]
Large Reasoning Models (LRMs) exhibit remarkable reasoning abilities but rely primarily on parametric knowledge, limiting factual accuracy.<n>We propose ReaRAG, a factuality-enhanced reasoning model that explores diverse queries without excessive iterations.<n>Our study enhances LRMs' factuality while effectively integrating robust reasoning for Retrieval-Augmented Generation (RAG)
arXiv Detail & Related papers (2025-03-27T17:44:18Z)
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond [88.5807076505261]
Large Reasoning Models (LRMs) have demonstrated strong performance gains by scaling up the length of Chain-of-Thought (CoT) reasoning during inference.<n>A growing concern lies in their tendency to produce excessively long reasoning traces.<n>This inefficiency introduces significant challenges for training, inference, and real-world deployment.
arXiv Detail & Related papers (2025-03-27T15:36:30Z)
Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities [101.77467538102924]
Recent advancements in Large Reasoning Models (LRMs) have demonstrated remarkable performance in specialized reasoning tasks.<n>We show that acquiring deliberative reasoning capabilities significantly reduces the foundational capabilities of LRMs.<n>We demonstrate that adaptive reasoning -- employing modes like Zero-Thinking, Less-Thinking, and Summary-Thinking -- can effectively alleviate these drawbacks.
arXiv Detail & Related papers (2025-03-23T08:18:51Z)
Large Reasoning Models in Agent Scenarios: Exploring the Necessity of Reasoning Capabilities [74.35956310688164]
We propose the LaRMA framework, encompassing nine tasks across Tool Usage, Plan Design, and Problem Solving.<n>Our findings address four research questions: LRMs surpass LLMs in reasoning-intensive tasks like Plan Design, leveraging iterative reflection for superior outcomes.<n>LRMs' enhanced reasoning incurs higher computational costs, prolonged processing, and behavioral challenges, including overthinking and fact-ignoring tendencies.
arXiv Detail & Related papers (2025-03-14T04:34:31Z)
Retrieval-Augmented Process Reward Model for Generalizable Mathematical Reasoning [32.850036320802474]
We introduce Retrieval-Augmented Process Reward Model (RetrievalPRM), a novel framework designed to tackle OOD issues.<n>By utilizing a two-stage retrieval-enhanced mechanism, RetrievalPRM retrieves semantically similar questions and steps as a warmup.<n>Our experiments demonstrate that RetrievalPRM outperforms existing baselines across multiple real-world datasets.
arXiv Detail & Related papers (2025-02-20T08:40:09Z)
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models [56.9134620424985]
Cross-modal reasoning (CMR) is increasingly recognized as a crucial capability in the progression toward more sophisticated artificial intelligence systems. The recent trend of deploying Large Language Models (LLMs) to tackle CMR tasks has marked a new mainstream of approaches for enhancing their effectiveness. This survey offers a nuanced exposition of current methodologies applied in CMR using LLMs, classifying these into a detailed three-tiered taxonomy.
arXiv Detail & Related papers (2024-09-19T02:51:54Z)
Hierarchies of Reward Machines [75.55324974788475]
Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine. We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs.
arXiv Detail & Related papers (2022-05-31T12:39:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.