R$^3$ Prompting: Review, Rephrase and Resolve for Chain-of-Thought
Reasoning in Large Language Models under Noisy Context
- URL: http://arxiv.org/abs/2310.16535v1
- Date: Wed, 25 Oct 2023 10:34:02 GMT
- Authors: Qingyuan Tian, Hanlun Zhu, Lei Wang, Yang Li, Yunshi Lan
- Abstract summary: We propose a novel prompting method, namely R$^3$ prompting, for Chain-of-Thought (CoT) reasoning under noisy context.
Our experiments show that R$^3$ prompting significantly outperforms existing CoT prompting methods on five reasoning tasks under noisy context.
- Score: 12.475979274233458
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the help of Chain-of-Thought (CoT) prompting, Large Language Models
(LLMs) have achieved remarkable performance on various reasoning tasks.
However, most of them have been evaluated under noise-free context, and the
tendency of LLMs to produce inaccurate results under noisy context has not
been fully investigated. Existing studies use trigger sentences to
encourage LLMs to concentrate on the relevant information, but the trigger has
limited effect on final answer prediction. Inspired by the interactive CoT method,
where intermediate reasoning steps are promoted by multiple rounds of
interaction between users and LLMs, we propose a novel prompting method, namely
R$^3$ prompting, for CoT reasoning under noisy context. Specifically, R$^3$
prompting interacts with LLMs to perform key sentence extraction, variable
declaration and answer prediction, which correspond to a thought process of
reviewing, rephrasing and resolving. The response generated in each
interaction serves as a hint that guides the response of the next
interaction. Our experiments show that R$^3$ prompting significantly
outperforms existing CoT prompting methods on five reasoning tasks under noisy
context. With GPT-3.5-turbo, we observe a 3.7% accuracy improvement on average on
the reasoning tasks under noisy context compared to the most competitive
prompting baseline. More analyses and ablation studies show the robustness and
generalization of the R$^3$ prompting method in solving reasoning tasks with LLMs
under noisy context.
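The three-stage interaction described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `LLM` callable, the function name `r3_prompting`, and all prompt wording are hypothetical stand-ins, and the paper's actual prompt templates are not reproduced here. The sketch only shows the chaining idea, where each stage's response is passed to the next stage as a hint.

```python
from typing import Callable, Dict

# Hypothetical interface: any function mapping a prompt string to a reply string.
LLM = Callable[[str], str]


def r3_prompting(problem: str, llm: LLM) -> Dict[str, str]:
    """Chain three interactions; each reply is passed on as a hint to the next."""
    # Review: extract the sentences that matter for the question.
    # (Prompt wording is an assumption, not the paper's template.)
    key_sentences = llm(
        f"Problem: {problem}\n"
        "Review: list the key sentences needed to answer the question."
    )
    # Rephrase: declare variables, using the key sentences as the hint.
    variables = llm(
        f"Problem: {problem}\n"
        f"Hint (key sentences): {key_sentences}\n"
        "Rephrase: declare each quantity in the hint as a variable."
    )
    # Resolve: predict the answer, using the variable declarations as the hint.
    answer = llm(
        f"Problem: {problem}\n"
        f"Hint (variables): {variables}\n"
        "Resolve: reason step by step and give the final answer."
    )
    return {"review": key_sentences, "rephrase": variables, "resolve": answer}


if __name__ == "__main__":
    # Stand-in model for demonstration only: it returns canned replies in order.
    canned = iter(["Tom has 3 apples. He buys 2 more.", "a = 3, b = 2", "a + b = 5"])
    result = r3_prompting(
        "Tom has 3 apples. His sister is 9 years old. He buys 2 more. "
        "How many apples does Tom have?",
        lambda prompt: next(canned),
    )
    print(result["resolve"])
```

In practice the `llm` argument would wrap a real model call; keeping it as a plain callable makes the chaining logic easy to test with a stub.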
Related papers
- FLARE: Faithful Logic-Aided Reasoning and Exploration [50.9814063216852]
We introduce a novel approach for traversing the problem space using task decompositions.
We use Large Language Models to plan a solution and soft-formalise the query into facts and predicates using logic programming code.
Our method allows us to compute the faithfulness of the reasoning process w.r.t. the generated code and analyse the steps of the multi-hop search without relying on external solvers.
arXiv Detail & Related papers (2024-10-14T19:39:11Z)
- Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning [11.758019716526459]
Chain-of-Thought (CoT) prompting has been shown to enhance the multi-step reasoning capabilities of Large Language Models (LLMs).
We show that CoT prompting performance reflects both memorization and a probabilistic version of genuine reasoning.
arXiv Detail & Related papers (2024-07-01T18:01:07Z)
- Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models [84.15513004135576]
Current research enhances the reasoning performance of Large Language Models (LLMs) by sampling multiple reasoning chains and ensembling based on the answer frequency.
This approach fails in scenarios where the correct answers are in the minority.
We introduce a hierarchical reasoning aggregation framework AoR, which selects answers based on the evaluation of reasoning chains.
arXiv Detail & Related papers (2024-05-21T17:12:19Z)
- Can large language models explore in-context? [87.49311128190143]
We deploy Large Language Models as agents in simple multi-armed bandit environments.
We find that the models do not robustly engage in exploration without substantial interventions.
arXiv Detail & Related papers (2024-03-22T17:50:43Z)
- Generating Chain-of-Thoughts with a Pairwise-Comparison Approach to Searching for the Most Promising Intermediate Thought [70.30423016640749]
Chain-of-thoughts (CoT) methods were proposed to guide large language models to reason step-by-step, enabling problem solving from simple to complex.
The evaluation from large language models (LLMs) is typically noisy and unreliable, potentially misleading the generation process in selecting promising intermediate thoughts.
In this paper, motivated by Vapnik's principle, we use pairwise-comparison evaluation instead of point-wise scoring to search for promising intermediate thoughts.
arXiv Detail & Related papers (2024-02-10T09:51:03Z)
- Graph Elicitation for Guiding Multi-Step Reasoning in Large Language Models [16.432208223793666]
Chain-of-Thought prompting along with sub-question generation and answering has enhanced multi-step reasoning capabilities.
We propose a GE-Reasoning method, which directs Large Language Models to generate proper sub-questions and corresponding answers.
Our approach outperforms previous CoT prompting methods and their variants on multi-hop question answering benchmark datasets.
arXiv Detail & Related papers (2023-11-16T10:36:08Z)
- LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers [60.009969929857704]
Logical reasoning is an important task for artificial intelligence with potential impacts on science, mathematics, and society.
In this work, we reformulate such tasks as modular neurosymbolic programming, which we call LINC.
We observe significant performance gains on FOLIO and a balanced subset of ProofWriter for three different models in nearly all experimental conditions we evaluate.
arXiv Detail & Related papers (2023-10-23T17:58:40Z)
- Revisiting Large Language Models as Zero-shot Relation Extractors [8.953462875381888]
Relation extraction (RE) consistently involves a certain degree of labeled or unlabeled data, even under the zero-shot setting.
Recent studies have shown that large language models (LLMs) transfer well to new tasks out-of-the-box simply given a natural language prompt.
This work focuses on the study of exploring LLMs as zero-shot relation extractors.
arXiv Detail & Related papers (2023-10-08T06:17:39Z)
- Learning to Ask Conversational Questions by Optimizing Levenshtein Distance [83.53855889592734]
We introduce a Reinforcement Iterative Sequence Editing (RISE) framework that optimizes the minimum Levenshtein distance (MLD) through explicit editing actions.
RISE is able to pay attention to tokens that are related to conversational characteristics.
Experimental results on two benchmark datasets show that RISE significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-06-30T08:44:19Z)