Mastering the ABCDs of Complex Questions: Answer-Based Claim Decomposition for Fine-grained Self-Evaluation
- URL: http://arxiv.org/abs/2305.14750v1
- Date: Wed, 24 May 2023 05:53:11 GMT
- Title: Mastering the ABCDs of Complex Questions: Answer-Based Claim Decomposition for Fine-grained Self-Evaluation
- Authors: Nishant Balepur, Jie Huang, Samraj Moorjani, Hari Sundaram, Kevin Chen-Chuan Chang
- Abstract summary: We propose answer-based claim decomposition (ABCD), a prompting strategy that decomposes questions into true/false claims.
Using the decomposed ABCD claims, we perform fine-grained self-evaluation.
We find that GPT-3.5 has some ability to determine to what extent its answer satisfies the criteria of the input question.
- Score: 9.776667356119352
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When answering complex questions, large language models (LLMs) may produce
answers that do not satisfy all criteria of the question. While existing
self-evaluation techniques aim to detect if such answers are correct, these
techniques are unable to determine which criteria of the question are satisfied
by the generated answers. To address this issue, we propose answer-based claim
decomposition (ABCD), a prompting strategy that decomposes questions into a
series of true/false claims that can be used to verify which criteria of the
input question an answer satisfies. Using the decomposed ABCD claims, we
perform fine-grained self-evaluation. Through preliminary experiments on three
datasets, including a newly-collected challenge dataset ObscureQA, we find that
GPT-3.5 has some ability to determine to what extent its answer satisfies the
criteria of the input question, and can give insights into the errors and
knowledge gaps of the model.
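To make the strategy concrete, below is a minimal sketch of ABCD-style fine-grained self-evaluation, assuming an OpenAI-style chat client. The prompt wording and the helper names (ask, decompose, self_evaluate) are illustrative assumptions, not the authors' exact prompts or code.

```python
# Minimal sketch of answer-based claim decomposition (ABCD) for
# fine-grained self-evaluation. Assumes the OpenAI Python SDK and an
# OPENAI_API_KEY in the environment; prompts are illustrative, not the
# paper's exact wording.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Send a single-turn prompt and return the model's reply text."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()

def decompose(question: str, answer: str) -> list[str]:
    """Decompose the question's criteria into true/false claims about the answer."""
    prompt = (
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Rewrite each criterion of the question as a separate true/false "
        "claim about the answer. Output one claim per line."
    )
    return [line.lstrip("- ").strip() for line in ask(prompt).splitlines() if line.strip()]

def self_evaluate(question: str, answer: str) -> dict[str, bool]:
    """Verify each decomposed claim to see which criteria the answer satisfies."""
    verdicts = {}
    for claim in decompose(question, answer):
        reply = ask(f"Claim: {claim}\nIs this claim true or false? Reply 'true' or 'false'.")
        verdicts[claim] = reply.lower().startswith("true")
    return verdicts

if __name__ == "__main__":
    question = "Name a US state that borders Canada and has a Pacific coastline."
    answer = "Washington"
    for claim, satisfied in self_evaluate(question, answer).items():
        print("PASS" if satisfied else "FAIL", claim)
```

Tallying the per-claim verdicts yields the fine-grained picture the abstract describes: which criteria the generated answer satisfies, and where the model's errors and knowledge gaps lie.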
Related papers
- Don't Just Say "I don't know"! Self-aligning Large Language Models for Responding to Unknown Questions with Explanations [70.6395572287422]
The proposed self-alignment method is capable of not only refusing to answer but also providing an explanation of why unknown questions are unanswerable.
We conduct disparity-driven self-curation to select qualified data for fine-tuning the LLM itself, aligning its responses to unknown questions as desired.
arXiv Detail & Related papers (2024-02-23T02:24:36Z)
- Alexpaca: Learning Factual Clarification Question Generation Without Examples [19.663171923249283]
We present a new task that focuses on the ability to elicit missing information in multi-hop reasoning tasks.
Humans outperform GPT-4 by a large margin, while Llama 3 8B Instruct does not even beat the dummy baseline in some metrics.
arXiv Detail & Related papers (2023-10-17T20:40:59Z)
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm for KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process (a toy path-ranking sketch follows this list).
arXiv Detail & Related papers (2023-10-12T09:12:50Z)
- Asking Clarification Questions to Handle Ambiguity in Open-Domain QA [25.80369529145732]
We propose to ask a clarification question, where the user's response will help identify the interpretation that best aligns with the user's intention.
We first present CAMBIGNQ, a dataset consisting of 5,654 ambiguous questions.
We then define a pipeline of tasks and design appropriate evaluation metrics.
arXiv Detail & Related papers (2023-05-23T08:20:01Z)
- Do I have the Knowledge to Answer? Investigating Answerability of Knowledge Base Questions [25.13991044303459]
We create GrailQAbility, a new benchmark KBQA dataset with unanswerability.
Experimenting with three state-of-the-art KBQA models, we find that all three suffer a drop in performance when unanswerable questions are introduced.
This underscores the need for further research in making KBQA systems robust to unanswerability.
arXiv Detail & Related papers (2022-12-20T12:00:26Z)
- CREPE: Open-Domain Question Answering with False Presuppositions [92.20501870319765]
We introduce CREPE, a QA dataset containing a natural distribution of presupposition failures from online information-seeking forums.
We find that 25% of questions contain false presuppositions, and provide annotations for these presuppositions and their corrections.
We show that adaptations of existing open-domain QA models can find presuppositions moderately well, but struggle when predicting whether a presupposition is factually correct.
arXiv Detail & Related papers (2022-11-30T18:54:49Z)
- Double Retrieval and Ranking for Accurate Question Answering [120.69820139008138]
We show that an answer verification step introduced in Transformer-based answer selection models can significantly improve the state of the art in Question Answering.
The results on three well-known datasets for answer sentence selection (AS2) show consistent and significant improvement of the state of the art.
arXiv Detail & Related papers (2022-01-16T06:20:07Z)
- ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers [93.55268936974971]
We describe a Question Answering dataset that contains complex questions with conditional answers.
We call this dataset ConditionalQA.
We show that ConditionalQA is challenging for many of the existing QA models, especially in selecting answer conditions.
arXiv Detail & Related papers (2021-10-13T17:16:46Z)
- GooAQ: Open Question Answering with Diverse Answer Types [63.06454855313667]
We present GooAQ, a large-scale dataset with a variety of answer types.
This dataset contains over 5 million questions and 3 million answers collected from Google.
arXiv Detail & Related papers (2021-04-18T05:40:39Z)
- Determining Question-Answer Plausibility in Crowdsourced Datasets Using Multi-Task Learning [10.742152224470317]
We propose a novel task for automated quality analysis and data cleaning: question-answer (QA) plausibility.
Given a machine or user-generated question and a crowd-sourced response from a social media user, we determine if the question and response are valid.
We evaluate the ability of our models to generate a clean, usable question-answer dataset.
arXiv Detail & Related papers (2020-11-10T04:11:44Z)
- A Wrong Answer or a Wrong Question? An Intricate Relationship between Question Reformulation and Answer Selection in Conversational Question Answering [15.355557454305776]
We show that question rewriting (QR) of the conversational context allows us to shed more light on this phenomenon.
We present the results of this analysis on the TREC CAsT and QuAC (CANARD) datasets.
arXiv Detail & Related papers (2020-10-13T06:29:51Z)
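For a concrete picture of the path-level ranking named in the GATHER summary above, here is a toy sketch. The graph, entities, and scoring heuristic are all invented for illustration; this is not the GATHER architecture, and it shows only graph construction and ranking, not the paper's pruning step.

```python
# Toy path-level ranking over a hand-built knowledge graph, in the spirit
# of retriever-ranker KB-VQA systems such as GATHER. Everything here is an
# illustrative assumption, not the paper's method.

# Knowledge graph: subject -> list of (relation, object) edges.
GRAPH = {
    "Eiffel Tower": [("located_in", "Paris"), ("designed_by", "Gustave Eiffel")],
    "Paris": [("capital_of", "France")],
    "France": [("currency", "Euro")],
}

def enumerate_paths(start: str, max_hops: int = 2):
    """Yield paths (lists of (subject, relation, object) triples) from start."""
    frontier = [[]]
    for _ in range(max_hops):
        next_frontier = []
        for path in frontier:
            tail = path[-1][2] if path else start
            for relation, obj in GRAPH.get(tail, []):
                new_path = path + [(tail, relation, obj)]
                yield new_path
                next_frontier.append(new_path)
        frontier = next_frontier

def score_path(path, question_terms):
    """Crude ranker: overlap between path tokens and question tokens."""
    tokens = set()
    for _, relation, obj in path:
        tokens.update(relation.split("_"))
        tokens.add(obj.lower())
    return len(tokens & question_terms)

question = "what country is the eiffel tower city the capital of"
terms = set(question.split())
ranked = sorted(
    enumerate_paths("Eiffel Tower"),
    key=lambda p: score_path(p, terms),
    reverse=True,
)
best = ranked[0]
print("Inference path:", " -> ".join(f"{s} -[{r}]-> {o}" for s, r, o in best))
print("Answer:", best[-1][2])  # prints: France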
This list is automatically generated from the titles and abstracts of the papers on this site.