A Step Closer to Comprehensive Answers: Constrained Multi-Stage Question
Decomposition with Large Language Models
- URL: http://arxiv.org/abs/2311.07491v1
- Date: Mon, 13 Nov 2023 17:28:03 GMT
- Title: A Step Closer to Comprehensive Answers: Constrained Multi-Stage Question
Decomposition with Large Language Models
- Authors: Hejing Cao and Zhenwei An and Jiazhan Feng and Kun Xu and Liwei Chen
and Dongyan Zhao
- Abstract summary: We introduce the "Decompose-and-Query" framework (D&Q).
This framework guides the model to think and utilize external knowledge similar to ReAct.
On our ChitChatQA dataset, D&Q does not lose to ChatGPT in 67% of cases.
- Score: 43.10340493000934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large language models exhibit remarkable performance in the Question
Answering task, they are susceptible to hallucinations. Challenges arise when
these models grapple with understanding multi-hop relations in complex
questions or lack the necessary knowledge for a comprehensive response. To
address this issue, we introduce the "Decompose-and-Query" framework (D&Q).
This framework guides the model to think and utilize external knowledge similar
to ReAct, while also restricting its thinking to reliable information,
effectively mitigating the risk of hallucinations. Experiments confirm the
effectiveness of D&Q: On our ChitChatQA dataset, D&Q does not lose to ChatGPT
in 67% of cases; on the HotPotQA question-only setting, D&Q achieved an F1
score of 59.6%. Our code is available at
https://github.com/alkaidpku/DQ-ToolQA.
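The abstract describes D&Q only at a high level, so the following is a minimal sketch of what a constrained decompose-then-query loop could look like. `call_llm` and `search_reliable_kb` are hypothetical stand-ins, not functions from the paper's repository; the authors' actual implementation is at the GitHub URL above.

```python
# Minimal sketch of a Decompose-and-Query (D&Q) style loop, based only on the
# abstract above. `call_llm` and `search_reliable_kb` are hypothetical
# stand-ins for an LLM API and a trusted external knowledge source.

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    raise NotImplementedError

def search_reliable_kb(query: str) -> str:
    """Hypothetical lookup restricted to a vetted knowledge source,
    mirroring D&Q's constraint that the model only reasons over
    reliable information."""
    raise NotImplementedError

def decompose_and_query(question: str, max_steps: int = 5) -> str:
    # Stage 1: decompose the complex question into simpler sub-questions.
    plan = call_llm(
        "Decompose the question into simpler sub-questions, one per line:\n"
        f"{question}"
    )
    sub_questions = [q.strip() for q in plan.splitlines() if q.strip()]

    # Stage 2: answer each sub-question only with retrieved evidence,
    # rather than the model's unconstrained parametric knowledge.
    evidence = []
    for sub_q in sub_questions[:max_steps]:
        evidence.append(f"Q: {sub_q}\nA: {search_reliable_kb(sub_q)}")

    # Stage 3: compose a final answer grounded in the collected evidence.
    return call_llm(
        "Answer the original question using ONLY the evidence below.\n"
        f"Question: {question}\n\nEvidence:\n" + "\n\n".join(evidence)
    )
```

The constraint the abstract emphasizes is that intermediate reasoning is restricted to reliable retrieved information; the evidence-only final prompt above approximates that restriction.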
Related papers
- Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents [22.023543164141504]
We present Researchy Questions, a dataset of search engine queries tediously filtered to be non-factoid, "decompositional", and multi-perspective.
We show that users spend "a lot of effort" on these questions in terms of signals like clicks and session length.
We also show that "slow thinking" answering techniques, like decomposition into sub-questions, show a benefit over answering directly.
arXiv Detail & Related papers (2024-02-27T21:27:16Z)
- GenDec: A robust generative Question-decomposition method for Multi-hop reasoning [32.12904215053187]
Multi-hop QA involves step-by-step reasoning to answer complex questions.
The reasoning ability of existing large language models (LLMs) in multi-hop question answering remains underexplored.
It is unclear whether LLMs follow a desired reasoning chain to reach the right final answer.
arXiv Detail & Related papers (2024-02-17T02:21:44Z)
- AGent: A Novel Pipeline for Automatically Creating Unanswerable Questions [10.272000561545331]
We propose AGent, a novel pipeline that creates new unanswerable questions by re-matching a question with a context that lacks the necessary information for a correct answer.
In this paper, we demonstrate the usefulness of this AGent pipeline by creating two sets of unanswerable questions from answerable questions in SQuAD and HotpotQA.
arXiv Detail & Related papers (2023-09-10T18:13:11Z)
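Based only on the AGent summary above, a toy version of the re-matching idea might look like the following; the word-overlap similarity heuristic is an assumption for illustration, not the paper's actual matching procedure.

```python
# Toy sketch of AGent-style re-matching: pair a question with a
# plausible-looking context that does NOT contain the information needed
# to answer it. The overlap heuristic is illustrative only.

def word_overlap(a: str, b: str) -> int:
    """Crude lexical similarity: number of shared lowercase tokens."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def make_unanswerable(question: str, answer: str, contexts: list[str]) -> str | None:
    # Keep only contexts that lack the gold answer, so the question
    # becomes unanswerable against them ...
    candidates = [c for c in contexts if answer.lower() not in c.lower()]
    # ... then pick the one most similar to the question, so the
    # mismatch is hard rather than trivial.
    return max(candidates, key=lambda c: word_overlap(question, c), default=None)

contexts = [
    "Marie Curie won the Nobel Prize in Physics in 1903.",
    "Pierre Curie was a French physicist and a pioneer in crystallography.",
]
print(make_unanswerable("When did Marie Curie win the Nobel Prize?", "1903", contexts))
# -> the Pierre Curie sentence: related, but missing the needed fact.
```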
- RoMQA: A Benchmark for Robust, Multi-evidence, Multi-answer Question Answering [87.18962441714976]
We introduce RoMQA, the first benchmark for robust, multi-evidence, multi-answer question answering (QA).
We evaluate state-of-the-art large language models in zero-shot, few-shot, and fine-tuning settings, and find that RoMQA is challenging in all of them.
RoMQA thus provides a quantifiable test for building more robust QA methods.
arXiv Detail & Related papers (2022-10-25T21:39:36Z)
- "John is 50 years old, can his son be 65?" Evaluating NLP Models' Understanding of Feasibility [19.47954905054217]
This work focuses on a simple commonsense ability, reasoning about when an action (or its effect) is feasible.
We show that even state-of-the-art models such as GPT-3 struggle to answer the feasibility questions correctly.
arXiv Detail & Related papers (2022-10-14T02:46:06Z)
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering [124.16250115608604]
We present Science Question Answering (SQA), a new benchmark that consists of 21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations.
We show that chain-of-thought prompting with lectures and explanations improves question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA.
Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data.
arXiv Detail & Related papers (2022-09-20T07:04:24Z)
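To make the "thought chains" idea in the entry above concrete, here is a minimal prompting sketch; `call_llm` is a hypothetical LLM client and the prompt template is illustrative, not the exact format used in the SQA paper.

```python
# Sketch of chain-of-thought prompting for multiple-choice science QA:
# elicit a lecture and an explanation before the final answer.

def call_llm(prompt: str) -> str:
    raise NotImplementedError  # replace with a real LLM API call

def answer_with_thought_chain(question: str, choices: list[str]) -> str:
    options = "\n".join(f"({chr(65 + i)}) {c}" for i, c in enumerate(choices))
    prompt = (
        f"Question: {question}\n{options}\n\n"
        "First write a short LECTURE with the background knowledge,\n"
        "then an EXPLANATION of the reasoning,\n"
        "then a final line 'ANSWER: <letter>'."
    )
    return call_llm(prompt)
```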
- QA4QG: Using Question Answering to Constrain Multi-Hop Question Generation [54.136509061542775]
Multi-hop question generation (MQG) aims to generate complex questions which require reasoning over multiple pieces of information of the input passage.
We propose QA4QG, a novel QA-augmented BART-based framework for MQG.
Our results on the HotpotQA dataset show that QA4QG outperforms all state-of-the-art models.
arXiv Detail & Related papers (2022-02-14T08:16:47Z)
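The entry above describes using QA to constrain question generation. A simple way to picture this is a generate-then-filter loop, sketched below with hypothetical `generate_questions` and `qa_model` stand-ins; QA4QG itself is a QA-augmented BART architecture, so treating the QA model as a post-hoc filter is a simplification.

```python
# Sketch of QA-as-a-constraint for question generation: keep only
# generated questions that a QA model answers with the intended answer.

def generate_questions(passage: str, answer: str, n: int = 5) -> list[str]:
    raise NotImplementedError  # e.g., a seq2seq question generator

def qa_model(question: str, passage: str) -> str:
    raise NotImplementedError  # e.g., an extractive QA model

def qa_constrained_generation(passage: str, answer: str) -> list[str]:
    candidates = generate_questions(passage, answer)
    # The QA model acts as a filter: a generated question survives only
    # if answering it against the passage recovers the target answer.
    return [
        q for q in candidates
        if qa_model(q, passage).strip().lower() == answer.strip().lower()
    ]
```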
- Learn to Resolve Conversational Dependency: A Consistency Training Framework for Conversational Question Answering [14.382513103948897]
We propose ExCorD (Explicit guidance on how to resolve Conversational Dependency) to enhance the abilities of QA models in comprehending conversational context.
In our experiments, we demonstrate that ExCorD significantly improves the QA models' performance by up to 1.2 F1 on QuAC, and 5.2 F1 on CANARD.
arXiv Detail & Related papers (2021-06-22T07:16:45Z)
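One way to picture the consistency-training idea in the entry above: supervise the model on the original conversational question while pulling its prediction toward the one it makes for a self-contained rewrite. This PyTorch sketch is illustrative; the loss weighting and exact formulation are assumptions, not the authors' implementation.

```python
# Sketch of a consistency loss between predictions on an original
# conversational question and its self-contained rewrite.

import torch
import torch.nn.functional as F

def consistency_loss(logits_original: torch.Tensor,
                     logits_rewritten: torch.Tensor,
                     labels: torch.Tensor,
                     alpha: float = 1.0) -> torch.Tensor:
    # Standard supervised loss on the original (context-dependent) question.
    ce = F.cross_entropy(logits_original, labels)
    # KL term pulling the original-question prediction toward the
    # (typically easier) self-contained rewrite's prediction.
    kl = F.kl_div(F.log_softmax(logits_original, dim=-1),
                  F.softmax(logits_rewritten, dim=-1),
                  reduction="batchmean")
    return ce + alpha * kl  # alpha is a hypothetical weighting knob
```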
- Multi-hop Question Generation with Graph Convolutional Network [58.31752179830959]
Multi-hop Question Generation (QG) aims to generate answer-related questions by aggregating and reasoning over multiple pieces of scattered evidence from different paragraphs.
We propose the Multi-Hop Convolution Fusion Network for Question Generation (MulQG), which does context encoding in multiple hops.
Our proposed model is able to generate fluent questions with high completeness and outperforms the strongest baseline by 20.8% in the multi-hop evaluation.
arXiv Detail & Related papers (2020-10-19T06:15:36Z)
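MulQG's multi-hop encoding builds on graph convolution. As background, here is a single standard GCN propagation step (Kipf-Welling style) in NumPy; this is the generic operation, not MulQG's specific fusion architecture.

```python
# One standard GCN layer: normalized neighborhood aggregation over an
# adjacency matrix, followed by a linear map and ReLU.

import numpy as np

def gcn_layer(H, A, W):
    # A_hat = A + I adds self-loops so each node keeps its own features.
    A_hat = A + np.eye(A.shape[0])
    # Symmetric degree normalization D^{-1/2} A_hat D^{-1/2}.
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)  # ReLU

H = np.eye(3)                      # one-hot features for 3 evidence nodes
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], float)   # a path graph: node 1 bridges 0 and 2
W = np.random.randn(3, 2)
print(gcn_layer(H, A, W).shape)    # (3, 2)
```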
- SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT).
We show that SQuINT improves model consistency by 5% and marginally improves performance on the Reasoning questions in VQA, while also yielding better attention maps.
arXiv Detail & Related papers (2020-01-20T01:02:36Z)
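The consistency problem described in the entry above can be made concrete with a small check: a model's answer to a reasoning question should be compatible with its answer to the underlying perception sub-question. Everything in this sketch (`vqa_model`, the entailment map) is hypothetical; SQuINT itself is a tuning method and is not implemented here.

```python
# Toy consistency check between a main VQA question and its sub-question.

def vqa_model(image, question: str) -> str:
    raise NotImplementedError  # stand-in for any VQA model

def is_consistent(image, main_q: str, sub_q: str, implied: dict[str, str]) -> bool:
    # `implied` maps a main-question answer to the sub-answer it entails,
    # e.g. {"yes": "snow"} for main_q "Is it cold outside?" and
    # sub_q "What is on the ground?".
    main_ans = vqa_model(image, main_q).lower()
    sub_ans = vqa_model(image, sub_q).lower()
    return implied.get(main_ans) is None or implied[main_ans] == sub_ans
```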
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.