Related papers: Recursive Think-Answer Process for LLMs and VLMs

URL: http://arxiv.org/abs/2603.02099v2
Date: Tue, 03 Mar 2026 09:00:50 GMT
Title: Recursive Think-Answer Process for LLMs and VLMs
Authors: Byung-Kwan Lee, Youngchae Chee, Yong Man Ro
Abstract summary: We propose an efficient Recursive Think-Answer Process (R-TAP). R-TAP enables models to engage in iterative reasoning cycles and generate more accurate answers. We show that R-TAP-enhanced models consistently outperform conventional single-pass methods.
Score: 54.52289112197118
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Think-Answer reasoners such as DeepSeek-R1 have made notable progress by leveraging interpretable internal reasoning. However, despite the frequent presence of self-reflective cues like "Oops!", they remain vulnerable to output errors during single-pass inference. To address this limitation, we propose an efficient Recursive Think-Answer Process (R-TAP) that enables models to engage in iterative reasoning cycles and generate more accurate answers, going beyond conventional single-pass approaches. Central to this approach is a confidence generator that evaluates the certainty of model responses and guides subsequent improvements. By incorporating two complementary rewards (the Recursively Confidence Increase Reward and the Final Answer Confidence Reward), we show that R-TAP-enhanced models consistently outperform conventional single-pass methods for both large language models (LLMs) and vision-language models (VLMs). Moreover, by analyzing the frequency of "Oops"-like expressions in model responses, we find that R-TAP-applied models exhibit significantly fewer self-reflective patterns, resulting in more stable and faster inference-time reasoning. We hope R-TAP paves the way toward efficient and elaborate methods for refining the reasoning processes of future AI.
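The abstract describes an iterative think-answer cycle guided by a confidence generator, plus two confidence-based rewards, without giving implementation details. The following is only a minimal sketch of how such a loop could look, not the authors' method. All names (`recursive_think_answer`, `confidence_rewards`, `generate`, `confidence`), the stopping threshold, the round limit, and the reward formulas are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# One (answer, confidence) pair per reasoning round.
History = List[Tuple[str, float]]


def recursive_think_answer(
    question: str,
    generate: Callable[[str, History], str],   # hypothetical model call
    confidence: Callable[[str, str], float],   # hypothetical confidence generator
    threshold: float = 0.9,                    # assumed stopping threshold
    max_rounds: int = 4,                       # assumed cap on reasoning cycles
) -> History:
    """Repeat think-answer rounds until the confidence score is high enough."""
    history: History = []
    for _ in range(max_rounds):
        # Each round sees the earlier answers and their confidence scores.
        answer = generate(question, history)
        score = confidence(question, answer)
        history.append((answer, score))
        if score >= threshold:
            break
    return history


def confidence_rewards(scores: List[float]) -> Tuple[float, float]:
    """Illustrative stand-ins for the two rewards named in the abstract.

    Returns (fraction of rounds whose confidence rose, final confidence).
    The exact formulas in the paper are not given here; these are assumptions.
    """
    rises = sum(1.0 for prev, cur in zip(scores, scores[1:]) if cur > prev)
    steps = max(len(scores) - 1, 1)
    return rises / steps, scores[-1]
```

With stub callables standing in for a model and a confidence generator, the loop stops at the first round whose score reaches the threshold, and the final entry of the returned history holds the answer that would be reported.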

Related papers

Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models [72.4149653187766]
We propose a Reasoner-Verifier framework named Adversarial Reasoning RAG (ARR). The Reasoner and Verifier engage in reasoning on retrieved evidence and critiquing each other's logic while being guided by process-aware advantage. Experiments on multiple benchmarks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2026-01-08T06:57:03Z)
Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning [137.33138614095435]
Retrieval-augmented generation (RAG) has proven to be effective in mitigating hallucinations in large language models. Recent efforts have incorporated search-based interactions into RAG, enabling iterative reasoning with real-time retrieval. We propose Bi-RAR, a novel retrieval-augmented reasoning framework that evaluates each intermediate step jointly in both forward and backward directions.
arXiv Detail & Related papers (2025-11-12T08:29:39Z)
First Try Matters: Revisiting the Role of Reflection in Reasoning Models [66.39546876232512]
We focus on reflective behaviours where the model has already produced an answer but continues reflecting before finalizing its output. Our analysis reveals that reflections are predominantly confirmatory and rarely alter the model's initial answer. We propose a question-aware early-stopping method that enhances inference-time token efficiency by stopping the reasoning process once a few plausible candidate answers are generated.
arXiv Detail & Related papers (2025-10-09T14:57:10Z)
ConciseHint: Boosting Efficient Reasoning via Continuous Concise Hints during Generation [74.37307916314407]
We propose a framework dubbed ConciseHint, which continuously encourages the reasoning model to speak concisely. Experiments on the state-of-the-art LRMs, including DeepSeek-R1 and Qwen-3 series, demonstrate that our method can effectively produce concise reasoning.
arXiv Detail & Related papers (2025-06-23T16:20:44Z)
Efficient Reasoning Through Suppression of Self-Affirmation Reflections in Large Reasoning Models [29.615519143908998]
Self-affirmation reflections are redundant reflective steps that affirm prior content and often occur after already correct reasoning steps. We show that suppressing self-affirmation reflections reduces output length without degrading accuracy across multiple models. We also improve the current train-based method by explicitly suppressing such reflections.
arXiv Detail & Related papers (2025-06-14T05:30:09Z)
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards [67.86091419220816]
Large Language Models (LLMs) show great promise in complex reasoning. A prevalent issue is "superficial self-reflection", where models fail to robustly verify their own outputs. We introduce RISE (Reinforcing Reasoning with Self-Verification), a novel online RL framework designed to tackle this.
arXiv Detail & Related papers (2025-05-19T17:59:31Z)
Think Deep, Think Fast: Investigating Efficiency of Verifier-free Inference-time-scaling Methods [39.89239733570008]
This work conducts a comprehensive analysis of inference-time scaling methods for both reasoning and non-reasoning models. We find that non-reasoning models, even with an extremely high inference budget, still fall substantially behind reasoning models. For reasoning models, majority voting proves to be a robust inference strategy, generally competitive with or outperforming other more sophisticated ITC methods.
arXiv Detail & Related papers (2025-04-18T19:32:55Z)
Vision-Language Models Can Self-Improve Reasoning via Reflection [20.196406628954303]
Chain-of-thought (CoT) has proven to improve the reasoning capability of large language models (LLMs). We propose a self-training framework, R3V, which iteratively enhances the model's Vision-language Reasoning by Reflecting on CoT Rationales. Our approach supports self-reflection on generated solutions, further boosting performance through test-time computation.
arXiv Detail & Related papers (2024-10-30T14:45:00Z)
PRefLexOR: Preference-based Recursive Language Modeling for Exploratory Optimization of Reasoning and Agentic Thinking [0.0]
PRefLexOR combines preference optimization with concepts from Reinforcement Learning to enable models to self-teach. We focus on applications in biological materials science and demonstrate the method in a variety of case studies.
arXiv Detail & Related papers (2024-10-16T08:46:26Z)
Recursive Introspection: Teaching Language Model Agents How to Self-Improve [30.086494067593268]
We develop RISE: Recursive IntroSpEction, an approach for fine-tuning large language models. Our experiments show that RISE enables Llama2, Llama3, and Mistral models to improve themselves with more turns on math reasoning tasks.
arXiv Detail & Related papers (2024-07-25T17:35:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.