Unveiling the Latent Directions of Reflection in Large Language Models
- URL: http://arxiv.org/abs/2508.16989v1
- Date: Sat, 23 Aug 2025 11:05:15 GMT
- Title: Unveiling the Latent Directions of Reflection in Large Language Models
- Authors: Fu-Chieh Chang, Yu-Ting Lee, Pei-Yuan Wu
- Abstract summary: We investigate reflection through the lens of latent directions in model activations. New reflection-inducing instructions can be systematically identified, and reflective behavior can be directly enhanced or suppressed. This work opens a path toward mechanistic understanding of reflective reasoning in large language models.
- Score: 3.396557052704669
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reflection, the ability of large language models (LLMs) to evaluate and revise their own reasoning, has been widely used to improve performance on complex reasoning tasks. Yet, most prior work emphasizes designing reflective prompting strategies or reinforcement learning objectives, leaving the inner mechanisms of reflection underexplored. In this paper, we investigate reflection through the lens of latent directions in model activations. We propose a methodology based on activation steering to characterize instructions with three different reflective intentions: no reflection, intrinsic reflection, and triggered reflection. By constructing steering vectors between these reflection levels, we demonstrate that (1) new reflection-inducing instructions can be systematically identified, (2) reflective behavior can be directly enhanced or suppressed through activation interventions, and (3) suppressing reflection is considerably easier than stimulating it. Experiments on GSM8k-adv with Qwen2.5-3B and Gemma3-4B reveal clear stratification across reflection levels, and steering interventions confirm the controllability of reflection. Our findings highlight both opportunities (e.g., reflection-enhancing defenses) and risks (e.g., adversarial inhibition of reflection in jailbreak attacks). This work opens a path toward mechanistic understanding of reflective reasoning in LLMs.
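The steering-vector construction described in the abstract can be illustrated with a minimal difference-of-means sketch. This is an assumption about the general technique (activation steering), not the paper's exact procedure; the array shapes, layer choice, and scaling factor are illustrative, and toy random data stands in for real model activations.

```python
import numpy as np

def steering_vector(acts_a, acts_b):
    """Difference-of-means steering vector between two reflection levels,
    e.g. triggered reflection (acts_a) vs. no reflection (acts_b)."""
    return acts_a.mean(axis=0) - acts_b.mean(axis=0)

def steer(hidden, v, alpha):
    """Intervene on a hidden state: alpha > 0 pushes toward the first
    reflection level (enhancing), alpha < 0 pushes away (suppressing)."""
    return hidden + alpha * v

# Toy activations standing in for hidden states collected at one layer
# over prompts with different reflective intentions.
rng = np.random.default_rng(0)
acts_triggered = rng.normal(1.0, 0.1, size=(32, 8))
acts_none = rng.normal(0.0, 0.1, size=(32, 8))

v = steering_vector(acts_triggered, acts_none)
h = rng.normal(size=8)
h_enhanced = steer(h, v, alpha=2.0)    # amplify reflective behavior
h_suppressed = steer(h, v, alpha=-2.0) # inhibit reflective behavior
```

In practice such vectors would be injected into a transformer's residual stream during generation; the sketch only shows the vector arithmetic.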
Related papers
- Learning from Trials and Errors: Reflective Test-Time Planning for Embodied LLMs [63.88783817420284]
Embodied robots cannot reflect on what went wrong or why, turning deployment into a sequence of independent trials. We introduce Reflective Test-Time Planning, which integrates two modes of reflection: reflection-in-action and reflection-on-action. We also include retrospective reflection, allowing the agent to re-evaluate earlier decisions and perform model updates with hindsight.
arXiv Detail & Related papers (2026-02-24T18:55:18Z) - ReflCtrl: Controlling LLM Reflection via Representation Engineering [6.828302913581854]
We study self-reflection through the lens of representation engineering. We propose a stepwise steering method that can control reflection frequency. In experiments, we can save up to 33.6 percent of reasoning tokens while preserving performance.
arXiv Detail & Related papers (2025-12-16T00:38:34Z) - First Try Matters: Revisiting the Role of Reflection in Reasoning Models [66.39546876232512]
We focus on reflective behaviours where the model has already produced an answer but continues reflecting before finalizing its output. Our analysis reveals that reflections are predominantly confirmatory and rarely alter the model's initial answer. We propose a question-aware early-stopping method that enhances inference-time token efficiency by stopping the reasoning process once a few plausible candidate answers are generated.
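The early-stopping idea above can be sketched as a simple agreement check over candidate answers. The threshold `k` and the way candidates are extracted from the reasoning trace are illustrative assumptions, not the paper's exact criterion.

```python
from collections import Counter

def early_stop_answer(candidate_stream, k=3):
    """Scan candidate answers in generation order and return the first one
    seen k times, halting the (simulated) reasoning process early."""
    counts = Counter()
    for ans in candidate_stream:
        counts[ans] += 1
        if counts[ans] >= k:
            return ans  # enough agreement: stop reasoning here
    # No answer reached the threshold: fall back to the most common one.
    return counts.most_common(1)[0][0] if counts else None

print(early_stop_answer(["42", "41", "42", "42", "17"], k=3))  # prints "42"
```

Note that the final candidate ("17") is never examined: generation stops as soon as agreement is reached, which is where the token savings come from.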
arXiv Detail & Related papers (2025-10-09T14:57:10Z) - ReflectEvo: Improving Meta Introspection of Small LLMs by Learning Self-Reflection [60.75785864719726]
We present a novel pipeline, ReflectEvo, to demonstrate that small language models (SLMs) can enhance meta introspection through reflection learning. We construct ReflectEvo-460k, a large-scale, comprehensive, self-generated reflection dataset with broadened instructions and diverse multi-domain tasks.
arXiv Detail & Related papers (2025-05-22T10:03:05Z) - Perception in Reflection [39.33505560810175]
We present a perception in reflection paradigm designed to transcend the limitations of current large vision-language models. We propose Reflective Perception (RePer), a dual-model reflection mechanism that systematically alternates between policy and critic models.
arXiv Detail & Related papers (2025-04-09T17:59:02Z) - Instruct-of-Reflection: Enhancing Large Language Models Iterative Reflection Capabilities via Dynamic-Meta Instruction [11.838351314880736]
Instruct-of-Reflection (IoRT) is a novel and general reflection framework that leverages dynamic-meta instruction to enhance the iterative reflection capability of Large Language Models (LLMs). Our experiments demonstrate that IoRT achieves an average improvement of 10.1% over established baselines in mathematical and commonsense reasoning tasks.
arXiv Detail & Related papers (2025-03-02T14:02:03Z) - Meta-Reflection: A Feedback-Free Reflection Learning Framework [57.14485943991588]
We propose Meta-Reflection, a feedback-free reflection mechanism that requires only a single inference pass without external feedback. Motivated by the human ability to remember and retrieve reflections from past experiences, Meta-Reflection integrates reflective insights into a codebook. To thoroughly investigate and evaluate the practicality of Meta-Reflection in real-world scenarios, we introduce an industrial e-commerce benchmark named E-commerce Customer Intent Detection.
arXiv Detail & Related papers (2024-12-18T12:20:04Z) - FIRM: Flexible Interactive Reflection reMoval [75.38207315080624]
This paper presents FIRM, a novel framework for Flexible Interactive image Reflection reMoval. The proposed framework requires only 10% of the guidance time needed by previous interactive methods. Results on public real-world reflection removal datasets validate that our method demonstrates state-of-the-art reflection removal performance.
arXiv Detail & Related papers (2024-06-03T17:34:37Z) - Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning [18.5717357875955]
Large language models (LLMs) struggle with knowledge-rich problems without access to external resources.
We propose Mirror, a Multiple-perspective self-reflection method for knowledge-rich reasoning.
arXiv Detail & Related papers (2024-02-22T20:57:17Z) - Pinning "Reflection" on the Agenda: Investigating Reflection in Human-LLM Co-Creation for Creative Coding [20.58817370147299]
This study investigates situated, moment-to-moment reflection in creative coding under two prompting strategies. Our mixed-method results reveal three distinct reflection types and show that T2 encourages more frequent, strategic, and generative reflection.
arXiv Detail & Related papers (2024-02-15T07:00:06Z) - Revisiting Single Image Reflection Removal In the Wild [83.42368937164473]
This research focuses on the issue of single-image reflection removal (SIRR) in real-world conditions.
We devise an advanced reflection collection pipeline that is highly adaptable to a wide range of real-world reflection scenarios.
We develop a large-scale, high-quality reflection dataset named Reflection Removal in the Wild (RRW)
arXiv Detail & Related papers (2023-11-29T02:31:10Z) - Location-aware Single Image Reflection Removal [54.93808224890273]
This paper proposes a novel location-aware deep learning-based single image reflection removal method.
We use a reflection confidence map as the cues for the network to learn how to encode the reflection information adaptively.
The integration of location information into the network significantly improves the quality of reflection removal results.
arXiv Detail & Related papers (2020-12-13T19:34:35Z) - Two-Stage Single Image Reflection Removal with Reflection-Aware Guidance [78.34235841168031]
We present a novel two-stage network with reflection-aware guidance (RAGNet) for single image reflection removal (SIRR).
The guidance can be used (i) to mitigate the effect of reflection from the observation, and (ii) to generate a mask for partial convolution, mitigating the effect of deviating from the linear combination hypothesis.
Experiments on five commonly used datasets demonstrate the quantitative and qualitative superiority of our RAGNet in comparison to the state-of-the-art SIRR methods.
arXiv Detail & Related papers (2020-12-02T03:14:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.