Related papers: When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models

URL: http://arxiv.org/abs/2404.09129v1
Date: Sun, 14 Apr 2024 02:47:32 GMT
Title: When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models
Authors: Yanhong Li, Chenghao Yang, Allyson Ettinger
Abstract summary: Self-reflection enhances performance in TruthfulQA, but adversely affects results in HotpotQA. We find that self-reflection shows the most benefit when models are less likely to be correct initially, and when overall question difficulty is higher. Based on our findings, we propose guidelines for decisions on when to implement self-reflection.
Score: 15.781930031346105
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent studies suggest that self-reflective prompting can significantly enhance the reasoning capabilities of Large Language Models (LLMs). However, the use of external feedback as a stop criterion raises doubts about the true extent of LLMs' ability to emulate human-like self-reflection. In this paper, we set out to clarify these capabilities under a more stringent evaluation setting in which we disallow any kind of external feedback. Our findings under this setting show a split: while self-reflection enhances performance in TruthfulQA, it adversely affects results in HotpotQA. We conduct follow-up analyses to clarify the contributing factors in these patterns, and find that the influence of self-reflection is impacted both by reliability of accuracy in models' initial responses, and by overall question difficulty: specifically, self-reflection shows the most benefit when models are less likely to be correct initially, and when overall question difficulty is higher. We also find that self-reflection reduces tendency toward majority voting. Based on our findings, we propose guidelines for decisions on when to implement self-reflection. We release the codebase for reproducing our experiments at https://github.com/yanhong-lbh/LLM-SelfReflection-Eval.
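One way to see why self-reflection could help more when initial answers are less reliable is a toy two-rate model. This is only an illustrative sketch, not the analysis used in the paper: it assumes reflection flips a correct answer to incorrect with a fixed rate and repairs an incorrect answer with another fixed rate. The function names and the rates `flip_correct` and `fix_incorrect` are hypothetical values chosen for illustration, not measurements from the paper.

```python
def accuracy_after_reflection(p_initial, flip_correct, fix_incorrect):
    """Expected accuracy after one reflection round in the toy model.

    p_initial     : probability the initial answer is correct
    flip_correct  : assumed rate at which reflection breaks a correct answer
    fix_incorrect : assumed rate at which reflection repairs a wrong answer
    """
    return p_initial * (1.0 - flip_correct) + (1.0 - p_initial) * fix_incorrect


def break_even_accuracy(flip_correct, fix_incorrect):
    """Initial accuracy below which reflection raises expected accuracy."""
    return fix_incorrect / (flip_correct + fix_incorrect)


if __name__ == "__main__":
    # Hypothetical rates, chosen only to illustrate the trade-off.
    flip_correct, fix_incorrect = 0.10, 0.20
    threshold = break_even_accuracy(flip_correct, fix_incorrect)
    print(f"break-even initial accuracy: {threshold:.3f}")
    for p in (0.3, 0.6, 0.9):
        after = accuracy_after_reflection(p, flip_correct, fix_incorrect)
        print(f"initial={p:.2f} after={after:.3f} change={after - p:+.3f}")
```

Under these assumed rates the change in accuracy is (1 - p) * fix_incorrect - p * flip_correct, so reflection helps only while the initial accuracy p stays below fix_incorrect / (flip_correct + fix_incorrect). That is consistent in spirit with the abstract's observation, although in practice both rates likely depend on the task and the model.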

Related papers

Are LLM Evaluators Really Narcissists? Sanity Checking Self-Preference Evaluations [3.262230127283452]
We show that evaluators may deliver self-preferring verdicts when the judge responds to queries which they completed incorrectly themselves. We introduce an Evaluator Quality Baseline, which compares the probability that a judge incorrectly votes for itself against the probability that it votes for an incorrect response from another model.
arXiv Detail & Related papers (2026-01-30T04:38:18Z)
Teaching Large Reasoning Models Effective Reflection [62.73646680747003]
Large Reasoning Models (LRMs) have recently shown impressive performance on complex reasoning tasks. However, not all reflections are beneficial; many are superficial, offering little to no improvement over the original answer. We first propose Self-Critique Fine-Tuning (SCFT), a training framework that enhances the model's reflective reasoning ability using only self-generated critiques.
arXiv Detail & Related papers (2026-01-19T04:51:53Z)
Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection [71.8243083897721]
Vision-language models often hallucinate details, generating non-existent objects or inaccurate attributes that compromise output reliability. We present a novel framework that leverages the model's self-consistency between long responses and short answers to generate preference pairs for training.
arXiv Detail & Related papers (2025-09-27T10:37:11Z)
Do Retrieval Augmented Language Models Know When They Don't Know? [55.72375712577378]
We ask the fundamental question: Do RALMs know when they don't know? Contrary to expectations, we find that LLMs exhibit significant over-refusal behavior. We develop a simple yet effective refusal method for refusal post-trained models to improve their overall answer quality.
arXiv Detail & Related papers (2025-09-01T13:44:15Z)
From Emergence to Control: Probing and Modulating Self-Reflection in Language Models [23.176641726866105]
Self-reflection is a powerful behavior enabled by reinforcement learning with verifiable rewards. We show that self-reflection is not exclusive to fine-tuned models.
arXiv Detail & Related papers (2025-06-13T20:40:13Z)
Internal Bias in Reasoning Models leads to Overthinking [58.817405319722596]
We show for the first time that overthinking in reasoning models may stem from their internal bias towards input texts. By masking out the original input section, the effect of internal bias can be effectively alleviated and the reasoning length can be reduced by 31%-53%.
arXiv Detail & Related papers (2025-05-22T09:35:52Z)
Do LLM Evaluators Prefer Themselves for a Reason? [21.730128682888168]
Large language models (LLMs) are increasingly used as automatic evaluators in applications such as benchmarking, reward modeling, and self-refinement. Prior work highlights a potential self-preference bias where LLMs favor their own generated responses. This raises a critical question: Is self-preference detrimental, or does it simply reflect objectively superior outputs from more capable models?
arXiv Detail & Related papers (2025-04-04T18:09:23Z)
Intent-Aware Self-Correction for Mitigating Social Biases in Large Language Models [38.1620443730172]
Self-Correction based on feedback improves the output quality of Large Language Models (LLMs). In this study, we demonstrate that clarifying intentions is essential for effectively reducing biases in LLMs through Self-Correction.
arXiv Detail & Related papers (2025-03-08T02:20:43Z)
Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision [120.40788744292739]
We propose a two-player paradigm that separates the roles of reasoning and critique models. We first propose AutoMathCritique, an automated and scalable framework for collecting critique data. We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time.
arXiv Detail & Related papers (2024-11-25T17:11:54Z)
Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks [43.96835245022083]
Self-correction that instructs models to refine their outputs presents a promising solution to this issue. This study investigates the self-correction capabilities of Vision-Language Models during both inference and fine-tuning stages.
arXiv Detail & Related papers (2024-10-05T06:28:54Z)
Self-Reflection Outcome is Sensitive to Prompt Construction [1.3899663412994456]
We show that the outcome of self-reflection is sensitive to prompt wording. We propose different ways of constructing prompts that are conservative in identifying mistakes. Our findings highlight the importance of prompt engineering in self-reflection tasks.
arXiv Detail & Related papers (2024-06-14T20:07:11Z)
A Theoretical Understanding of Self-Correction through In-context Alignment [51.622068973630796]
Large language models (LLMs) are capable of improving their abilities purely by self-correction. We show that when LLMs give relatively accurate self-examinations as rewards, they are capable of refining responses in an in-context way. Inspired by these findings, we also illustrate applications of self-correction, such as defending against LLM jailbreaks.
arXiv Detail & Related papers (2024-05-28T22:33:02Z)
Small Language Models Need Strong Verifiers to Self-Correct Reasoning [69.94251699982388]
Self-correction has emerged as a promising solution to boost the reasoning performance of large language models (LLMs). This work explores whether small (≤ 13B) language models (LMs) have the ability of self-correction on reasoning tasks with minimal inputs from stronger LMs.
arXiv Detail & Related papers (2024-04-26T03:41:28Z)
Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement [75.7148545929689]
Large language models (LLMs) improve their performance through self-feedback on certain tasks while degrading on others. We formally define LLM's self-bias - the tendency to favor its own generation. We analyze six LLMs on translation, constrained text generation, and mathematical reasoning tasks.
arXiv Detail & Related papers (2024-02-18T03:10:39Z)
On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks [17.329365493094542]
We present a principled empirical study of the performance of GPT-4 in three domains: Game of 24, Graph Coloring, and STRIPS planning. We observe significant performance collapse with self-critique and significant performance gains with sound external verification.
arXiv Detail & Related papers (2024-02-12T23:11:01Z)
Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives [45.87069217634753]
Research indicates that, without external feedback, a Large Language Model's intrinsic reflection is unstable. Our investigation unveils that the key bottleneck is the quality of the self-evaluated feedback. We advocate Self-Contrast: It adaptively explores diverse solving perspectives tailored to the request, contrasts the differences, and summarizes these discrepancies into a checklist that can be used to re-examine and eliminate discrepancies.
arXiv Detail & Related papers (2024-01-04T00:32:33Z)
Large Language Models Cannot Self-Correct Reasoning Yet [78.16697476530994]
Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities. Concerns persist regarding the accuracy and appropriateness of their generated content. A contemporary methodology, self-correction, has been proposed as a remedy to these issues.
arXiv Detail & Related papers (2023-10-03T04:56:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.