Evidence to Generate (E2G): A Single-agent Two-step Prompting for
Context Grounded and Retrieval Augmented Reasoning
- URL: http://arxiv.org/abs/2401.05787v1
- Date: Thu, 11 Jan 2024 09:49:15 GMT
- Title: Evidence to Generate (E2G): A Single-agent Two-step Prompting for
Context Grounded and Retrieval Augmented Reasoning
- Authors: Md Rizwan Parvez
- Abstract summary: We introduce Evidence to Generate (E2G), a novel single-agent, two-step prompting framework.
Instead of unverified reasoning claims, E2G focuses exclusively on the thought sequences explicitly mentioned in the context.
E2G achieves remarkable results robustly across a wide range of knowledge-intensive reasoning and generation tasks.
- Score: 3.117335706912261
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While chain-of-thought (CoT) prompting has revolutionized how LLMs perform
reasoning tasks, its current methods and variations (e.g., Self-consistency,
ReACT, Reflexion, Tree-of-Thoughts (ToT), Cumulative Reasoning (CR)) suffer
from limitations like slowness, limited context grounding, hallucination and
inconsistent outputs. To overcome these challenges, we introduce Evidence to
Generate (E2G), a novel single-agent, two-step prompting framework. Instead of
unverified reasoning claims, this innovative approach leverages the power of
"evidence for decision making" by first focusing exclusively on the thought
sequences (the series of intermediate steps) explicitly mentioned in the
context which then serve as extracted evidence, guiding the LLM's output
generation process with greater precision and efficiency. This simple yet
powerful approach unlocks the true potential of chain-of-thought like
prompting, paving the way for faster, more reliable, and more contextually
aware reasoning in LLMs. E2G achieves remarkable results robustly across a
wide range of knowledge-intensive reasoning and generation tasks, surpassing
baseline approaches with state-of-the-art LLMs. For example, (i) on the LogiQA
benchmark using GPT-4 as the backbone model, E2G achieves a new state-of-the-art
accuracy of 53.8%, exceeding CoT by 18%, ToT by 11%, and CR by 9%; (ii) a variant of
E2G with PaLM2 outperforms the variable-shot performance of Gemini Ultra by 0.9
F1 points, reaching an F1 score of 83.3 on a subset of DROP.
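
The two-step flow described in the abstract (first extract evidence from the context, then generate the answer from that evidence) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the prompt wording, the function name `evidence_to_generate`, and the `llm` callable are all hypothetical, and passing only the extracted evidence (rather than the full context) to the second step is an assumption made here for simplicity.

```python
from typing import Callable, Tuple

# Hypothetical prompt templates; the abstract does not give the actual wording.
EVIDENCE_PROMPT = (
    "Context:\n{context}\n\nQuestion: {question}\n\n"
    "Quote only the statements from the context that are needed to answer "
    "the question. Do not add claims that are not in the context."
)
ANSWER_PROMPT = (
    "Evidence:\n{evidence}\n\nQuestion: {question}\n\n"
    "Answer the question using only the evidence above."
)


def evidence_to_generate(
    llm: Callable[[str], str], context: str, question: str
) -> Tuple[str, str]:
    """Run two sequential calls to the same model (single agent, two steps).

    `llm` is any function mapping a prompt string to a completion string;
    it stands in for whatever model client is actually used.
    """
    # Step 1: extract the evidence explicitly stated in the context.
    evidence = llm(EVIDENCE_PROMPT.format(context=context, question=question))
    # Step 2: generate the final answer, grounded in the extracted evidence.
    answer = llm(ANSWER_PROMPT.format(evidence=evidence, question=question))
    return evidence, answer
```

Any model client can be plugged in by wrapping it as a `str -> str` function, which keeps the sketch independent of a particular provider's API.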
Related papers
- An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking [50.81324768683995]
FIRST is a novel approach that integrates a learning-to-rank objective and leverages the logits of only the first generated token.
We extend the evaluation of FIRST to the TREC Deep Learning datasets (DL19-22), validating its robustness across diverse domains.
Our experiments confirm that fast reranking with single-token logits does not compromise out-of-domain reranking quality.
arXiv Detail & Related papers (2024-11-08T12:08:17Z) - Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding [74.31981011985681]
Large language models (LLMs) have shown impressive capabilities, but still struggle with complex reasoning tasks requiring multiple steps.
We introduce LaTent Reasoning Optimization (LaTRO), a principled framework that formulates reasoning as sampling from a latent distribution.
We validate LaTRO through experiments on GSM8K and ARC-Challenge datasets using multiple model architectures.
arXiv Detail & Related papers (2024-11-06T22:02:30Z) - Can Language Models Perform Robust Reasoning in Chain-of-thought Prompting with Noisy Rationales? [19.13886382791074]
This paper investigates an under-explored challenge in large language models (LLMs): chain-of-thought prompting with noisy rationales.
We construct the NoRa dataset, which is tailored to evaluate the robustness of reasoning in the presence of noisy rationales.
We propose the method of contrastive denoising with noisy chain-of-thought (CD-CoT).
arXiv Detail & Related papers (2024-10-31T12:07:44Z) - Textualized Agent-Style Reasoning for Complex Tasks by Multiple Round LLM Generation [49.27250832754313]
We present AgentCOT, an LLM-based autonomous agent framework.
At each step, AgentCOT selects an action and executes it to yield an intermediate result with supporting evidence.
We introduce two new strategies to enhance the performance of AgentCOT.
arXiv Detail & Related papers (2024-09-19T02:20:06Z) - Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation [16.350747493026432]
The Chain-of-Thought (CoT) paradigm has emerged as a critical approach for enhancing the reasoning capabilities of large language models (LLMs).
We propose the Strategic Chain-of-Thought (SCoT) to refine LLM performance by integrating strategic knowledge prior to generating intermediate reasoning steps.
SCoT employs a two-stage approach within a single prompt: first eliciting an effective problem-solving strategy, which is then used to guide the generation of high-quality CoT paths and final answers.
arXiv Detail & Related papers (2024-09-05T06:28:05Z) - Beyond Numeric Awards: In-Context Dueling Bandits with LLM Agents [25.825941077332182]
This paper is the first to investigate the performance of Large Language Models (LLMs) as decision-makers in the context of Dueling Bandits (DB).
Our results reveal that LLMs, particularly GPT-4 Turbo, quickly identify the Condorcet winner, thus outperforming existing state-of-the-art algorithms in terms of weak regret.
To overcome these issues, we introduce a hybrid algorithm: LLM-Enhanced Adaptive Dueling (LEAD), which takes advantage of both in-context decision-making capabilities of LLMs and theoretical guarantees inherited from classic DB algorithms.
arXiv Detail & Related papers (2024-07-02T02:18:14Z) - MR-Ben: A Meta-Reasoning Benchmark for Evaluating System-2 Thinking in LLMs [55.20845457594977]
Large language models (LLMs) have shown increasing capability in problem-solving and decision-making.
We present a process-based benchmark MR-Ben that demands a meta-reasoning skill.
Our meta-reasoning paradigm is especially suited for system-2 slow thinking.
arXiv Detail & Related papers (2024-06-20T03:50:23Z) - DQ-LoRe: Dual Queries with Low Rank Approximation Re-ranking for In-Context Learning [66.85379279041128]
In this study, we introduce a framework that leverages Dual Queries and Low-rank approximation Re-ranking to automatically select exemplars for in-context learning.
DQ-LoRe significantly outperforms prior state-of-the-art methods in the automatic selection of exemplars for GPT-4, enhancing performance from 92.5% to 94.2%.
arXiv Detail & Related papers (2023-10-04T16:44:37Z) - Re-Reading Improves Reasoning in Large Language Models [87.46256176508376]
We introduce a simple yet general and effective prompting method, Re2, to enhance the reasoning capabilities of off-the-shelf Large Language Models (LLMs).
Unlike most thought-eliciting prompting methods, such as Chain-of-Thought (CoT), Re2 shifts the focus to the input by processing questions twice, thereby enhancing the understanding process.
We evaluate Re2 on extensive reasoning benchmarks across 14 datasets, spanning 112 experiments, to validate its effectiveness and generality.
arXiv Detail & Related papers (2023-09-12T14:36:23Z)