Recursive Chain-of-Feedback Prevents Performance Degradation from
Redundant Prompting
- URL: http://arxiv.org/abs/2402.02648v2
- Date: Fri, 1 Mar 2024 10:46:01 GMT
- Title: Recursive Chain-of-Feedback Prevents Performance Degradation from
Redundant Prompting
- Authors: Jinwoo Ahn, Kyuseung Shin
- Abstract summary: This paper studies such repetitive behavior and its effect by defining a novel setting, Chain-of-Feedback (CoF).
To alleviate these troubles, we propose a novel method, Recursive Chain-of-Feedback (R-CoF).
- Score: 0.4662017507844857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) frequently struggle with complex reasoning
tasks, failing to construct logically sound steps towards the solution. In
response to this behavior, users often try prompting the LLMs repeatedly in
hopes of reaching a better response. This paper studies such repetitive
behavior and its effect by defining a novel setting, Chain-of-Feedback (CoF).
The setting takes questions that require multi-step reasoning as input. After
each response, we repeatedly prompt the model with meaningless feedback (e.g.,
'make another attempt'), requesting additional trials. Surprisingly, our preliminary results
show that repeated meaningless feedback gradually decreases the quality of the
responses, eventually leading to a larger deviation from the intended outcome.
To alleviate these troubles, we propose a novel method, Recursive
Chain-of-Feedback (R-CoF). Following the logic of recursion in computer
science, R-CoF recursively revises the initially incorrect response by breaking
down each incorrect reasoning step into smaller individual problems. Our
preliminary results show that the majority of questions that LLMs initially fail
to answer correctly can be resolved using R-CoF, without any sample data
outlining the logical process.
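
To make the abstract's two settings concrete, the short Python sketch below mocks up a Chain-of-Feedback loop (repeated meaningless feedback) and a recursive R-CoF revision procedure. It is a minimal illustration only: the callbacks query_llm and verify_step, the prompt wording, and the max_depth cutoff are hypothetical placeholders, not the paper's implementation.

# Minimal sketch of Chain-of-Feedback (CoF) and Recursive Chain-of-Feedback
# (R-CoF), following the abstract above. query_llm and verify_step are
# hypothetical placeholders standing in for an LLM call and a correctness
# check; they are not from the paper.

from typing import Callable, List

LLM = Callable[[str], str]  # prompt -> response


def chain_of_feedback(query_llm: LLM, question: str, n_trials: int) -> List[str]:
    """CoF setting: after the first answer, keep sending meaningless
    feedback ('make another attempt') and collect every retry. The
    abstract reports that answer quality tends to degrade across trials."""
    responses = [query_llm(question)]
    for _ in range(n_trials):
        followup = (
            f"Question: {question}\n"
            f"Your previous answer: {responses[-1]}\n"
            "Make another attempt."  # meaningless feedback
        )
        responses.append(query_llm(followup))
    return responses


def recursive_cof(query_llm: LLM,
                  verify_step: Callable[[str], bool],
                  question: str,
                  max_depth: int = 3) -> str:
    """R-CoF sketch: ask for step-by-step reasoning, locate an incorrect
    step, re-pose that step as a smaller standalone problem, solve it
    recursively, then ask the model to revise the original solution with
    the corrected step substituted in."""
    answer = query_llm(f"Solve step by step:\n{question}")
    if max_depth == 0:
        return answer
    steps = [s for s in answer.split("\n") if s.strip()]
    for i, step in enumerate(steps):
        if not verify_step(step):
            # Treat the faulty step as a smaller sub-problem and recurse on it.
            sub_question = f"Solve this single step correctly:\n{step}"
            fixed_step = recursive_cof(query_llm, verify_step,
                                       sub_question, max_depth - 1)
            revision_prompt = (
                f"Original question:\n{question}\n\n"
                f"Your earlier solution:\n{answer}\n\n"
                f"Step {i + 1} was wrong. A corrected version is:\n{fixed_step}\n\n"
                "Revise the full solution using this corrected step."
            )
            return query_llm(revision_prompt)
    return answer  # no incorrect step found

The essential idea mirrored here is the one the abstract states: an incorrect reasoning step is isolated, re-posed as a smaller problem, solved recursively, and substituted back into a revised solution. How the faulty step is actually identified is left abstract behind verify_step, since the abstract does not specify the mechanism.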
Related papers
- Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think [51.0691253204425]
We analyze intermediate reasoning steps, termed subthoughts, asking among other questions whether the final answer reliably represents the model's optimal conclusion.
Our approach involves segmenting a reasoning trace into sequential subthoughts based on linguistic cues.
We find that aggregating these answers by selecting the most frequent one (the mode) often yields significantly higher accuracy compared to relying solely on the answer derived from the original complete trace.
arXiv Detail & Related papers (2025-04-29T12:39:07Z)
- Right Answer, Wrong Score: Uncovering the Inconsistencies of LLM Evaluation in Multiple-Choice Question Answering [78.89231943329885]
One of the most widely used tasks to evaluate Large Language Models (LLMs) is Multiple-Choice Question Answering (MCQA)
In this work, we shed light on the inconsistencies of MCQA evaluation strategies, which can lead to inaccurate and misleading model comparisons.
arXiv Detail & Related papers (2025-03-19T08:45:03Z)
- Toward Adaptive Reasoning in Large Language Models with Thought Rollback [33.714789952452094]
This paper proposes a new reasoning framework, called Thought Rollback (TR)
TR allows large language models (LLMs) to adaptively build a thought structure while maintaining effective reasoning toward problem-solving under "hallucinations".
arXiv Detail & Related papers (2024-12-27T16:02:34Z)
- Sequence to Sequence Reward Modeling: Improving RLHF by Language Feedback [8.601283886845664]
Reinforcement learning from human feedback (RLHF) aligns Large language models (LLMs) with human intentions and values.
Despite its effectiveness and popularity, RLHF is prone to biased local optimization.
We propose a novel sequence-to-sequence (seq2seq) reward modeling method.
arXiv Detail & Related papers (2024-08-30T16:14:35Z)
- FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering [26.398873686905063]
Large Language Models (LLMs) with chain-of-thought (CoT) prompting have demonstrated impressive abilities on simple natural language inference tasks.
We propose a prompting method, Finite State Machine (FSM), to enhance the reasoning capabilities of LLMs for complex tasks.
arXiv Detail & Related papers (2024-07-03T10:01:01Z)
- Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models [84.15513004135576]
Current research enhances the reasoning performance of Large Language Models (LLMs) by sampling multiple reasoning chains and ensembling based on the answer frequency.
This approach fails in scenarios where the correct answers are in the minority.
We introduce AoR, a hierarchical reasoning aggregation framework that selects answers based on the evaluation of reasoning chains.
arXiv Detail & Related papers (2024-05-21T17:12:19Z)
- Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses [9.956253757863145]
We propose Re-Ex, a method for post-editing responses generated by large language models (LLMs).
Re-Ex introduces a novel reasoning step dubbed the factual error explanation step.
In addition to the explanation step, Re-Ex also incorporates new prompting techniques to reduce the token count and inference time required for the response revision process.
arXiv Detail & Related papers (2024-02-27T00:22:18Z)
- Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named 'Rephrase and Respond' (RaR), which allows Large Language Models to rephrase and expand questions posed by humans.
RaR serves as a simple yet effective prompting method for improving performance.
We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
- Re-Reading Improves Reasoning in Large Language Models [87.46256176508376]
We introduce a simple, yet general and effective prompting method, Re2, to enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs)
Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process.
We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
arXiv Detail & Related papers (2023-09-12T14:36:23Z)
- RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought [56.558892336235914]
Reversing Chain-of-Thought (RCoT) is a novel method to improve large language models' reasoning abilities.
RCoT automatically detects and rectifies factual inconsistencies in generated solutions.
We show that manually written fine-grained feedback can dramatically improve LLMs' reasoning abilities.
arXiv Detail & Related papers (2023-05-19T08:02:52Z)
- Answering Questions by Meta-Reasoning over Multiple Chains of Thought [53.55653437903948]
We introduce Multi-Chain Reasoning (MCR), an approach which prompts large language models to meta-reason over multiple chains of thought.
MCR examines different reasoning chains, mixes information between them and selects the most relevant facts in generating an explanation and predicting the answer.
arXiv Detail & Related papers (2023-04-25T17:27:37Z)
- Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models [81.01397924280612]
Large language models (LLMs) can achieve highly effective performance on various reasoning tasks by incorporating step-by-step chain-of-thought (CoT) prompting as demonstrations.
We introduce Iter-CoT (Iterative bootstrapping in Chain-of-Thoughts Prompting), an iterative bootstrapping approach for selecting exemplars and generating reasoning chains.
arXiv Detail & Related papers (2023-04-23T13:54:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.