GenSco: Can Question Decomposition based Passage Alignment improve Question Answering?
- URL: http://arxiv.org/abs/2407.10245v1
- Date: Sun, 14 Jul 2024 15:25:08 GMT
- Title: GenSco: Can Question Decomposition based Passage Alignment improve Question Answering?
- Authors: Barah Fazili, Koustava Goswami, Natwar Modani, Inderjeet Nair,
- Abstract summary: "GenSco" is a novel approach for selecting passages based on the predicted decomposition of multi-hop questions.
We evaluate on three broadly established multi-hop question answering datasets.
- Score: 1.5776201492893507
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Retrieval augmented generation (RAG) with large language models (LLMs) for Question Answering (QA) entails furnishing relevant context within the prompt to facilitate the LLM in answer generation. During generation, inaccuracies or hallucinations frequently occur due to two primary factors: inadequate or distracting context in the prompt, and the inability of LLMs to reason effectively through the facts. In this paper, we investigate whether providing aligned context via a carefully selected passage sequence leads to better answer generation by the LLM for multi-hop QA. We introduce "GenSco", a novel approach for selecting passages based on the predicted decomposition of the multi-hop question. The framework consists of two distinct LLMs: (i) a Generator LLM, which is used for question decomposition and final answer generation; and (ii) an auxiliary open-source LLM, used as the scorer, to semantically guide the Generator for passage selection. The Generator is invoked only once for answer generation, resulting in a cost-effective and efficient approach. We evaluate on three broadly established multi-hop question answering datasets: 2WikiMultiHop, Adversarial HotPotQA and MuSiQue, and achieve absolute gains of $15.1$ and $5.9$ points in Exact Match score with respect to the best-performing baselines on MuSiQue and 2WikiMultiHop, respectively.
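A minimal sketch of how such a pipeline might be wired together, assuming a greedy, per-sub-question passage selection; the function names and prompt wording are illustrative, not the authors' implementation:

```python
from typing import Callable, List

# Hypothetical interfaces, kept as plain callables so the sketch is self-contained.
# In practice `generate` would wrap the Generator LLM and `score` a smaller
# open-source scorer LLM that returns a passage-relevance score.
GeneratorLLM = Callable[[str], str]       # prompt -> generated text
ScorerLLM = Callable[[str, str], float]   # (sub_question, passage) -> relevance score


def decompose(question: str, generate: GeneratorLLM) -> List[str]:
    """Use the Generator LLM to split a multi-hop question into sub-questions."""
    prompt = (
        "Decompose the following multi-hop question into simpler sub-questions, "
        f"one per line:\n{question}"
    )
    return [line.strip() for line in generate(prompt).splitlines() if line.strip()]


def align_passages(sub_questions: List[str], passages: List[str],
                   score: ScorerLLM) -> List[str]:
    """Greedily pick, for each sub-question in order, the best-scoring unused passage."""
    remaining = list(passages)
    aligned: List[str] = []
    for sub_q in sub_questions:
        if not remaining:
            break
        best = max(remaining, key=lambda p: score(sub_q, p))
        aligned.append(best)
        remaining.remove(best)
    return aligned


def answer(question: str, passages: List[str],
           generate: GeneratorLLM, score: ScorerLLM) -> str:
    """Decompose, align passages with the scorer, then call the Generator once for the answer."""
    sub_questions = decompose(question, generate)
    context = "\n\n".join(align_passages(sub_questions, passages, score))
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return generate(prompt)  # single Generator call for answer generation
```

Here the `score` callable stands in for the auxiliary open-source scorer LLM; any passage-relevance signal (for example, the conditional log-likelihood of a passage given the sub-question) could be plugged in without changing the control flow.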
Related papers
- Learning When to Retrieve, What to Rewrite, and How to Respond in Conversational QA [16.1357049130957]
We build on the single-turn SELF-RAG framework and propose SELF-multi-RAG for conversational settings.
SELF-multi-RAG demonstrates improved capabilities over single-turn variants with respect to retrieving relevant passages.
arXiv Detail & Related papers (2024-09-23T20:05:12Z)
- Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering [45.82437926569949]
Multi-Hop Question Answering tasks present a significant challenge for large language models.
We introduce a novel generate-then-ground (GenGround) framework to solve a multi-hop question.
arXiv Detail & Related papers (2024-06-21T06:26:38Z)
- Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs [52.42505579545893]
Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought explanations alongside answers.
We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT.
arXiv Detail & Related papers (2024-02-17T05:22:56Z)
- GenDec: A robust generative Question-decomposition method for Multi-hop reasoning [32.12904215053187]
Multi-hop QA involves step-by-step reasoning to answer complex questions.
The reasoning ability of existing large language models (LLMs) in multi-hop question answering remains under-explored.
It is unclear whether LLMs follow a desired reasoning chain to reach the right final answer.
arXiv Detail & Related papers (2024-02-17T02:21:44Z)
- Graph Elicitation for Guiding Multi-Step Reasoning in Large Language Models [16.432208223793666]
Chain-of-Thought prompting along with sub-question generation and answering has enhanced multi-step reasoning capabilities.
We propose GE-Reasoning, a method that directs Large Language Models to generate proper sub-questions and corresponding answers.
Our approach outperforms previous CoT prompting methods and their variants on multi-hop question answering benchmark datasets.
arXiv Detail & Related papers (2023-11-16T10:36:08Z)
- Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves [57.974103113675795]
We present a method named 'Rephrase and Respond' (RaR), which allows Large Language Models to rephrase and expand questions posed by humans.
RaR serves as a simple yet effective prompting method for improving performance (a minimal sketch of this pattern follows the related papers list below).
We show that RaR is complementary to the popular Chain-of-Thought (CoT) methods, both theoretically and empirically.
arXiv Detail & Related papers (2023-11-07T18:43:34Z)
- Improving Question Generation with Multi-level Content Planning [70.37285816596527]
This paper addresses the problem of generating questions from a given context and an answer, specifically focusing on questions that require multi-hop reasoning across an extended context.
We propose MultiFactor, a novel QG framework based on multi-level content planning. Specifically, MultiFactor includes two components: FA-model, which simultaneously selects key phrases and generates full answers, and Q-model, which takes the generated full answer as an additional input to generate questions.
arXiv Detail & Related papers (2023-10-20T13:57:01Z)
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation [92.43001160060376]
We study the factuality of large language models (LLMs) in the context of answering questions that test current world knowledge.
We introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types.
We benchmark a diverse array of both closed and open-source LLMs under a two-mode evaluation procedure that allows us to measure both correctness and hallucination.
Motivated by these results, we present FreshPrompt, a simple few-shot prompting method that substantially boosts the performance of an LLM on FreshQA.
arXiv Detail & Related papers (2023-10-05T00:04:12Z)
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback [127.75419038610455]
Large language models (LLMs) are able to generate human-like, fluent responses for many downstream tasks.
This paper proposes LLM-Augmenter, a system that augments a black-box LLM with a set of plug-and-play modules.
arXiv Detail & Related papers (2023-02-24T18:48:43Z)
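As noted in the 'Rephrase and Respond' entry above, RaR is a prompting pattern rather than a model change. A minimal two-step sketch, assuming a generic `llm` wrapper; the prompt wording is illustrative, not the paper's exact prompts:

```python
from typing import Callable

LLM = Callable[[str], str]  # prompt -> generated text; any chat/completion wrapper


def rephrase_and_respond(question: str, llm: LLM) -> str:
    """Two-step RaR-style prompting: rephrase/expand the question, then answer it."""
    rephrased = llm(
        "Rephrase and expand the following question so that it is unambiguous "
        f"and self-contained:\n{question}"
    )
    # Answer with both the original and the rephrased question in the prompt.
    return llm(
        f"Original question: {question}\n"
        f"Rephrased question: {rephrased}\n"
        "Answer:"
    )
```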
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.