Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
- URL: http://arxiv.org/abs/2401.02009v3
- Date: Thu, 6 Jun 2024 18:46:03 GMT
- Title: Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
- Authors: Wenqi Zhang, Yongliang Shen, Linjuan Wu, Qiuying Peng, Jun Wang, Yueting Zhuang, Weiming Lu
- Abstract summary: Research indicates that, without external feedback, a Large Language Model's intrinsic reflection is unstable.
Our investigation unveils that the key bottleneck is the quality of the self-evaluated feedback.
We advocate Self-Contrast: it adaptively explores diverse solving perspectives tailored to the request, contrasts their differences, and summarizes these discrepancies into a checklist that can be used to re-examine and eliminate them.
- Score: 45.87069217634753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The reflection capacity of Large Language Models (LLMs) has garnered extensive attention. A post-hoc prompting strategy, e.g., Reflexion and Self-Refine, refines an LLM's response based on self-evaluated or external feedback. However, recent research indicates that, without external feedback, an LLM's intrinsic reflection is unstable. Our investigation unveils that the key bottleneck is the quality of the self-evaluated feedback. We find that LLMs often exhibit overconfidence or high randomness when self-evaluating, offering stubborn or inconsistent feedback, which causes poor reflection. To remedy this, we advocate Self-Contrast: it adaptively explores diverse solving perspectives tailored to the request, contrasts their differences, and summarizes these discrepancies into a checklist that can be used to re-examine and eliminate them. Our method endows the LLM with diverse perspectives to alleviate stubborn biases. Moreover, the discrepancies indicate potential errors or inherent uncertainties that LLMs often overlook. Reflecting upon these can catalyze more accurate and stable reflection. Experiments conducted on a series of reasoning and translation tasks with different LLMs underscore the effectiveness and generality of our strategy.
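As a concrete (if simplified) picture of the pipeline the abstract describes, the sketch below walks through the four moves: explore perspectives, solve under each, contrast the solutions into a checklist, and revise. It is a minimal sketch, assuming a hypothetical `llm` completion call; the prompts and the perspective count are illustrative, not taken from the paper.

```python
# Minimal sketch of the Self-Contrast loop described in the abstract.
# `llm` is a hypothetical stand-in for an LLM completion call; the
# prompts and perspective count are illustrative, not the paper's.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API here")

def self_contrast(request: str, n_perspectives: int = 3) -> str:
    # 1. Adaptively create diverse solving perspectives for this request.
    personas = llm(
        f"Propose {n_perspectives} distinct solving perspectives for:\n{request}"
    ).splitlines()[:n_perspectives]

    # 2. Solve the request once per perspective.
    solutions = [
        llm(f"Perspective: {p}\nSolve step by step:\n{request}") for p in personas
    ]

    # 3. Contrast the solutions and summarize discrepancies into a checklist.
    joined = "\n---\n".join(solutions)
    checklist = llm(
        "Compare these solutions, list every discrepancy, and turn the "
        f"discrepancies into a re-examination checklist:\n{joined}"
    )

    # 4. Re-examine with the checklist and produce a revised answer.
    return llm(
        f"Request:\n{request}\nCandidate solutions:\n{joined}\n"
        f"Checklist of discrepancies:\n{checklist}\n"
        "Resolve each checklist item and give the final answer."
    )
```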
Related papers
- Understanding the Dark Side of LLMs' Intrinsic Self-Correction [55.51468462722138]
Intrinsic self-correction was proposed to improve LLMs' responses via feedback prompts solely based on their inherent capability.
Recent works show that LLMs' intrinsic self-correction fails without oracle labels as feedback prompts.
We identify that intrinsic self-correction can cause LLMs to waver in both intermediate and final answers and can introduce prompt bias on simple factual questions.
arXiv Detail & Related papers (2024-12-19T15:39:31Z)
- Meta-Reflection: A Feedback-Free Reflection Learning Framework [57.14485943991588]
We propose Meta-Reflection, a feedback-free reflection mechanism that requires only a single inference pass without external feedback.
Motivated by the human ability to remember and retrieve reflections from past experiences, Meta-Reflection integrates reflective insights into a codebook.
To thoroughly investigate and evaluate the practicality of Meta-Reflection in real-world scenarios, we introduce an industrial e-commerce benchmark named E-commerce Customer Intent Detection.
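Read loosely, the codebook acts like a memory of past reflections that can be looked up in a single pass. The sketch below is a toy reading under that assumption: `embed` and `llm` are hypothetical stand-ins, and plain nearest-neighbour retrieval substitutes for whatever mechanism the paper actually learns.

```python
# Toy sketch of retrieving a stored reflection from a codebook at
# inference time. `embed` and `llm` are hypothetical stand-ins, and
# nearest-neighbour lookup is an assumption, not the paper's mechanism.
import math

def embed(text: str) -> list[float]:
    raise NotImplementedError("plug in an embedding model")

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API")

codebook: list[tuple[list[float], str]] = []  # (embedding, reflection)

def remember(problem: str, reflection: str) -> None:
    # Offline: store a reflective insight keyed by the problem embedding.
    codebook.append((embed(problem), reflection))

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb or 1.0)

def answer(problem: str) -> str:
    q = embed(problem)
    # Retrieve the most relevant past reflection, if any.
    hint = max(codebook, key=lambda e: cosine(e[0], q), default=(None, ""))[1]
    # Single inference pass: the reflection is prepended, no feedback loop.
    return llm(f"Relevant past reflection: {hint}\nProblem: {problem}")
```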
arXiv Detail & Related papers (2024-12-18T12:20:04Z)
- Mirror-Consistency: Harnessing Inconsistency in Majority Voting [54.30719306011487]
We present Mirror-Consistency, an enhancement of the standard Self-Consistency approach.
Mirror-Consistency incorporates a 'reflective mirror' into the self-ensemble decoding process.
We show that Mirror-Consistency yields superior performance in both reasoning accuracy and confidence calibration compared to Self-Consistency.
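The contrast with plain majority voting is easiest to see side by side: Self-Consistency only counts votes, while a 'reflective mirror' step shows the model its own disagreements before deciding. A minimal sketch, assuming a hypothetical `sample`/`llm` interface; this is not the paper's exact procedure.

```python
# Hypothetical sketch contrasting plain Self-Consistency (majority vote)
# with a "reflective mirror" step that shows the model its own
# disagreements before the final decision. `sample` and `llm` are stand-ins.
from collections import Counter

def sample(prompt: str, n: int) -> list[str]:
    raise NotImplementedError("sample n answers at temperature > 0")

def llm(prompt: str) -> str:
    raise NotImplementedError("single deterministic call")

def self_consistency(question: str, n: int = 8) -> str:
    # Baseline: take the most frequent sampled answer.
    return Counter(sample(question, n)).most_common(1)[0][0]

def mirror_consistency(question: str, n: int = 8) -> str:
    votes = Counter(sample(question, n))
    if len(votes) == 1:  # no inconsistency to reflect on
        return next(iter(votes))
    tally = ", ".join(f"{a!r} x{c}" for a, c in votes.items())
    # Reflect on the disagreement instead of blindly taking the majority.
    return llm(
        f"Question: {question}\nSampled answers disagree: {tally}\n"
        "Examine why they differ and give the best-supported answer."
    )
```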
arXiv Detail & Related papers (2024-10-07T03:41:08Z)
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism [39.392450788666814]
Current evaluations of large language models (LLMs) often overlook non-determinism.
Greedy decoding generally outperforms sampling methods for most evaluated tasks.
Smaller LLMs can match or surpass larger models such as GPT-4-Turbo.
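To make the non-determinism visible, an evaluation harness can report greedy accuracy next to the spread across sampled runs. A minimal sketch, assuming a hypothetical `generate` call with a temperature knob and exact-match grading.

```python
# Minimal sketch of reporting greedy accuracy next to the spread over
# sampled runs, so evaluation does not hide non-determinism.
# `generate` is a hypothetical stand-in with a temperature knob.
from statistics import mean, stdev

def generate(prompt: str, temperature: float) -> str:
    raise NotImplementedError("plug in your model API")

def accuracy(dataset: list[tuple[str, str]], temperature: float) -> float:
    hits = sum(generate(q, temperature).strip() == gold for q, gold in dataset)
    return hits / len(dataset)

def evaluate(dataset: list[tuple[str, str]], runs: int = 5) -> None:
    greedy = accuracy(dataset, temperature=0.0)  # deterministic decoding
    sampled = [accuracy(dataset, temperature=0.7) for _ in range(runs)]
    print(f"greedy:  {greedy:.3f}")
    print(f"sampled: {mean(sampled):.3f} +/- {stdev(sampled):.3f} over {runs} runs")
```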
arXiv Detail & Related papers (2024-07-15T06:12:17Z)
- When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models [15.781930031346105]
Self-reflection enhances performance in TruthfulQA, but adversely affects results in HotpotQA.
We find that self-reflection shows the most benefit when models are less likely to be correct initially, and when overall question difficulty is higher.
Based on our findings, we propose guidelines for decisions on when to implement self-reflection.
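One plausible way to operationalize such a guideline is to gate reflection on the model's confidence in its first answer. The sketch below is an illustration under that assumption: the `llm` call, the self-rated confidence probe, and the 0.6 threshold are all hypothetical choices, not the paper's rules.

```python
# Illustrative gating rule: only trigger self-reflection when the model
# is unlikely to be correct initially. `llm` and the confidence probe
# are hypothetical stand-ins; the 0.6 threshold is an arbitrary choice.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API")

def confidence(question: str, answer: str) -> float:
    # Crude self-rated probe; real systems might use token logprobs instead.
    reply = llm(f"Q: {question}\nA: {answer}\nRate confidence from 0 to 1:")
    try:
        return float(reply.strip())
    except ValueError:
        return 0.5

def answer_with_gated_reflection(question: str, threshold: float = 0.6) -> str:
    first = llm(f"Answer the question:\n{question}")
    if confidence(question, first) >= threshold:
        return first  # likely correct; reflection may hurt (cf. HotpotQA)
    # Low confidence: reflection is most likely to help here.
    return llm(
        f"Question: {question}\nDraft answer: {first}\n"
        "Reflect on possible errors in the draft, then give a final answer."
    )
```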
arXiv Detail & Related papers (2024-04-14T02:47:32Z)
- Think Twice Before Trusting: Self-Detection for Large Language Models through Comprehensive Answer Reflection [90.71323430635593]
We propose a novel self-detection paradigm that considers the comprehensive answer space beyond LLM-generated answers.
Building upon this paradigm, we introduce a two-step framework that first instructs the LLM to reflect on and provide justifications for each candidate answer.
This framework can be seamlessly integrated with existing approaches for superior self-detection.
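The two-step shape is roughly: justify every candidate in the answer space, then judge the target answer in light of those justifications. A minimal sketch, assuming a hypothetical `llm` call; the prompts and the trust/doubt output format are illustrative.

```python
# Rough sketch of comprehensive-answer-space self-detection: justify
# every candidate answer, then judge the target answer in that light.
# `llm` and the prompts are hypothetical stand-ins for the framework.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API")

def self_detect(question: str, target: str, alternatives: list[str]) -> str:
    candidates = [target] + alternatives  # answer space beyond the LLM's own output

    # Step 1: reflect on and justify each candidate answer.
    justifications = {
        c: llm(f"Question: {question}\nCandidate answer: {c}\n"
               "Give your best justification for this candidate.")
        for c in candidates
    }

    # Step 2: weigh the justifications to judge trust in the target answer.
    briefing = "\n".join(f"- {c}: {j}" for c, j in justifications.items())
    return llm(
        f"Question: {question}\nJustified candidates:\n{briefing}\n"
        f"Should the answer {target!r} be trusted? Reply 'trust' or 'doubt'."
    )
```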
arXiv Detail & Related papers (2024-03-15T02:38:26Z)
- Mirror: A Multiple-perspective Self-Reflection Method for Knowledge-rich Reasoning [18.5717357875955]
Large language models (LLMs) struggle with knowledge-rich problems without access to external resources.
We propose Mirror, a Multiple-perspective self-reflection method for knowledge-rich reasoning.
arXiv Detail & Related papers (2024-02-22T20:57:17Z)
- Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate [85.3444184685235]
We propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in a "tit for tat" fashion and a judge manages the debate process to obtain a final solution.
Our framework encourages divergent thinking in LLMs, which is helpful for tasks that require deep levels of contemplation.
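The debate loop is straightforward to picture in code: two debaters rebut each other for a few rounds, and a judge reads the transcript to settle the outcome. A minimal sketch, assuming a hypothetical `llm` call; round counts and prompts are illustrative, not the MAD paper's.

```python
# Minimal sketch of the tit-for-tat debate loop: two agents rebut each
# other for a few rounds, then a judge extracts the final solution.
# `llm` is a hypothetical stand-in; prompts and rounds are illustrative.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model API")

def multi_agent_debate(task: str, rounds: int = 3) -> str:
    transcript: list[str] = []
    position = llm(f"Propose a solution:\n{task}")
    transcript.append(f"Affirmative: {position}")

    for _ in range(rounds):
        # The negative agent argues against the current position.
        rebuttal = llm(
            f"Task: {task}\nOpponent said: {position}\n"
            "Disagree and argue for a different or corrected solution."
        )
        transcript.append(f"Negative: {rebuttal}")
        # The affirmative agent defends or revises in response.
        position = llm(
            f"Task: {task}\nOpponent said: {rebuttal}\n"
            "Defend or revise your solution in response."
        )
        transcript.append(f"Affirmative: {position}")

    # The judge manages the debate outcome and picks the final solution.
    return llm("Task: " + task + "\nDebate transcript:\n" +
               "\n".join(transcript) + "\nAs judge, state the best final solution.")
```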
arXiv Detail & Related papers (2023-05-30T15:25:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.