Self-Reflective Generation at Test Time
- URL: http://arxiv.org/abs/2510.02919v1
- Date: Fri, 03 Oct 2025 11:46:04 GMT
- Title: Self-Reflective Generation at Test Time
- Authors: Jian Mu, Qixin Zhang, Zhiyong Wang, Menglin Yang, Shuang Qiu, Chengwei Qin, Zhongxiang Dai, Yao Shu
- Abstract summary: Large language models (LLMs) increasingly solve complex reasoning tasks via long chain-of-thought. Existing self-reflection either performs revisions over full drafts or learns self-correction via expensive training. We propose Self-Reflective Generation at Test Time (SRGen), a lightweight test-time framework that reflects before generating at uncertain points.
- Score: 42.02273611421918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models (LLMs) increasingly solve complex reasoning tasks via long chain-of-thought, but their forward-only autoregressive generation process is fragile: early token errors can cascade, which creates a clear need for self-reflection mechanisms. However, existing self-reflection either performs revisions over full drafts or learns self-correction via expensive training, both of which are fundamentally reactive and inefficient. To address this, we propose Self-Reflective Generation at Test Time (SRGen), a lightweight test-time framework that reflects before generating at uncertain points. During token generation, SRGen uses dynamic entropy thresholding to identify high-uncertainty tokens. For each identified token, it trains a specific corrective vector, which fully exploits the already generated context in a self-reflective pass to correct that token's probability distribution. By retrospectively analyzing the partial output, this self-reflection enables more trustworthy decisions, thereby significantly reducing the probability of errors at highly uncertain points. Evaluated on challenging mathematical reasoning benchmarks and a diverse set of LLMs, SRGen consistently strengthens model reasoning: improvements in single-pass quality also translate into stronger self-consistency voting. In particular, on AIME2024 with DeepSeek-R1-Distill-Qwen-7B, SRGen yields absolute improvements of +12.0% on Pass@1 and +13.3% on Cons@5. Moreover, our findings position SRGen as a plug-and-play method that integrates reflection into the generation process for reliable LLM reasoning, achieving consistent gains with bounded overhead and broad composability with other training-time (e.g., RLHF) and test-time (e.g., SLOT) techniques.
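The abstract is concrete enough to sketch the decoding loop. Below is a minimal, hedged reading: entropy gating plus a per-token, test-time optimization of an additive vector on the last hidden state. The quantile rule, the entropy-minimization surrogate objective, and the `lm_head` re-decode are all our illustrative assumptions over a Hugging Face-style causal LM, not SRGen's actual implementation.

```python
import torch
import torch.nn.functional as F

# Sketch assumes an HF-style causal LM with frozen weights, e.g.
# model.requires_grad_(False), so only `delta` below receives gradients.

def token_entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the next-token distribution."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

def is_uncertain(logits: torch.Tensor, history: list, q: float = 0.9) -> bool:
    """Dynamic threshold (our illustrative rule): flag the current token if
    its entropy exceeds the running q-quantile of this generation so far."""
    h = token_entropy(logits).item()
    history.append(h)
    return h > torch.tensor(history).quantile(q).item()

def corrected_logits(model, hidden: torch.Tensor, n_steps: int = 3, lr: float = 1e-2):
    """Train a small additive vector on the last hidden state and re-decode.
    Entropy minimization stands in for SRGen's context-derived objective."""
    hidden = hidden.detach()
    delta = torch.zeros_like(hidden, requires_grad=True)
    opt = torch.optim.SGD([delta], lr=lr)
    for _ in range(n_steps):
        loss = token_entropy(model.lm_head(hidden + delta)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model.lm_head(hidden + delta).detach()

@torch.no_grad()
def next_token(model, input_ids: torch.Tensor, ent_history: list) -> torch.Tensor:
    out = model(input_ids, output_hidden_states=True)
    hidden = out.hidden_states[-1][:, -1]   # final-layer state at last position
    logits = out.logits[:, -1]
    if is_uncertain(logits, ent_history):
        with torch.enable_grad():           # re-enable grad for the tiny update
            logits = corrected_logits(model, hidden)
    return torch.argmax(logits, dim=-1)     # greedy step, for the sketch only
```

The greedy step is only for brevity; the corrected distribution could equally feed sampling, which is how single-pass gains would compound under self-consistency voting.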
Related papers
- Reflective Confidence: Correcting Reasoning Flaws via Online Self-Correction [14.164508061248775]
Large language models (LLMs) have achieved strong performance on complex reasoning tasks using techniques such as chain-of-thought and self-consistency. We propose reflective confidence, a novel reasoning framework that transforms low-confidence signals from termination indicators into reflection triggers. Experiments on mathematical reasoning benchmarks, including AIME 2025, demonstrate significant accuracy improvements over advanced early-stopping baselines at comparable computational cost.
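Read at face value, the core move is to treat a low-confidence signal as a trigger for reflection rather than for termination. A schematic sketch of that control flow, where `generate_fn`, `confidence_fn`, the threshold, and the reflection prompt are all hypothetical stand-ins rather than the paper's interfaces:

```python
REFLECT_PROMPT = "\nWait, let me re-examine the last step before continuing.\n"

def generate_with_reflection(generate_fn, confidence_fn, prompt: str,
                             threshold: float = 0.35, max_reflections: int = 2) -> str:
    """generate_fn(text) -> next reasoning segment; confidence_fn(text) -> [0, 1].
    A real loop would also test an end-of-answer condition; omitted here."""
    text, used = prompt, 0
    while True:
        text += generate_fn(text)        # extend the reasoning chain
        if confidence_fn(text) < threshold and used < max_reflections:
            text += REFLECT_PROMPT       # low confidence: reflect, do not stop
            used += 1
            continue
        return text                      # confident enough, or budget spent
```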
arXiv Detail & Related papers (2025-12-21T05:35:07Z) - Incorporating Self-Rewriting into Large Language Model Reasoning Reinforcement [54.63337314382886]
We introduce a self-rewriting framework in which a model rewrites its own reasoning texts and subsequently learns from the rewritten reasoning to improve the quality of its internal thought process. For algorithm design, we propose a selective rewriting approach wherein only "simple" samples, defined by the model's consistent correctness, are rewritten. Experiments on diverse tasks with different model sizes validate the effectiveness of self-rewriting.
arXiv Detail & Related papers (2025-11-20T13:10:52Z) - SSR: Socratic Self-Refine for Large Language Model Reasoning [78.62319252287938]
Socratic Self-Refine (SSR) is a novel framework for fine-grained evaluation and precise refinement of Large Language Models (LLMs). Our proposed SSR decomposes model responses into verifiable (sub-question, sub-answer) pairs, enabling step-level confidence estimation. Empirical results across five reasoning benchmarks and three LLMs show that SSR consistently outperforms state-of-the-art iterative self-refinement baselines.
arXiv Detail & Related papers (2025-11-13T18:47:07Z) - LaSeR: Reinforcement Learning with Last-Token Self-Rewarding [54.72617309922891]
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a core paradigm for enhancing the reasoning capabilities of Large Language Models (LLMs). Previous practice requires the LLM to sequentially generate solutions and self-verifications using two separate prompt templates, which significantly reduces efficiency. We propose LaSeR (Reinforcement Learning with Last-Token Self-Rewarding), an algorithm that simply augments the original RLVR loss with an MSE loss.
arXiv Detail & Related papers (2025-10-16T17:55:11Z) - Self-Verifying Reflection Helps Transformers with CoT Reasoning [17.52238613831439]
Advanced large language models (LLMs) frequently reflect within their chain-of-thought (CoT) reasoning. We present a minimalistic reasoning framework that supports basic self-verifying reflection for small transformers without natural language. We show that tiny transformers, with only a few million parameters, benefit from self-verification in both training and reflective execution.
arXiv Detail & Related papers (2025-10-14T05:22:50Z) - PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier [18.771754895027616]
Policy as Generative Verifier (PAG) is a framework that empowers Large Language Models to self-correct by alternating between policy and verifier roles. It alleviates model collapse and jointly enhances both reasoning and verification abilities.
arXiv Detail & Related papers (2025-06-12T06:59:35Z) - Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards [67.86091419220816]
Large Language Models (LLMs) show great promise in complex reasoning. A prevalent issue is "superficial self-reflection", where models fail to robustly verify their own outputs. We introduce RISE (Reinforcing Reasoning with Self-Verification), a novel online RL framework designed to tackle this.
arXiv Detail & Related papers (2025-05-19T17:59:31Z) - Scalable Best-of-N Selection for Large Language Models via Self-Certainty [65.31658824274894]
Best-of-N selection is a key technique for improving the reasoning performance of Large Language Models. We propose self-certainty, a novel and efficient metric that estimates response quality without requiring external reward models. Our findings establish self-certainty as a practical and efficient way to improve LLM reasoning capabilities.
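The summary is specific enough for a small sketch. One plausible formulation (our assumption, not a quote from the paper) scores a response by how far the model's own next-token distributions sit from uniform, so no external reward model is needed:

```python
import math

def self_certainty(token_log_dists: list) -> float:
    """Average KL(Uniform || p_t) over generated tokens; larger means the
    model's next-token distributions were consistently far from uniform.
    Each element of token_log_dists is the full log-prob vector for one step."""
    scores = []
    for dist in token_log_dists:
        V = len(dist)
        # KL(U || p) = -log V - (1/V) * sum_v log p(v)
        scores.append(-math.log(V) - sum(dist) / V)
    return sum(scores) / len(scores)

def best_of_n(candidates: list) -> str:
    """candidates: list of (answer_text, per-token log-prob vectors)."""
    return max(candidates, key=lambda c: self_certainty(c[1]))[0]
```

Usage: `best_of_n([(ans1, dists1), (ans2, dists2)])` returns the answer whose generation the model was most certain about.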
arXiv Detail & Related papers (2025-02-25T19:08:07Z) - SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales [29.33581578047835]
SaySelf is a training framework that teaches large language models to express more accurate fine-grained confidence estimates.
In addition, SaySelf directs LLMs to produce self-reflective rationales that clearly identify gaps in their parametric knowledge.
We show that the generated self-reflective rationales are reasonable and can further contribute to calibration.
arXiv Detail & Related papers (2024-05-31T16:21:16Z) - Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [74.51523859064802]
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG).
Self-RAG enhances an LM's quality and factuality through retrieval and self-reflection.
It significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks.
arXiv Detail & Related papers (2023-10-17T18:18:32Z) - Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably predict, at an early stage of training, whether training will diverge.
arXiv Detail & Related papers (2023-10-06T17:57:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.