Won't Get Fooled Again: Answering Questions with False Premises
- URL: http://arxiv.org/abs/2307.02394v1
- Date: Wed, 5 Jul 2023 16:09:21 GMT
- Title: Won't Get Fooled Again: Answering Questions with False Premises
- Authors: Shengding Hu, Yifan Luo, Huadong Wang, Xingyi Cheng, Zhiyuan Liu,
Maosong Sun
- Abstract summary: Pre-trained language models (PLMs) have shown unprecedented potential in various fields.
PLMs tend to be easily deceived by tricky questions such as "How many eyes does the sun have?"
We find that the PLMs already possess the knowledge required to rebut such questions.
- Score: 79.8761549830075
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pre-trained language models (PLMs) have shown unprecedented potential in
various fields, especially as the backbones for question-answering (QA)
systems. However, they tend to be easily deceived by tricky questions such as
"How many eyes does the sun have?". Such frailties of PLMs often allude to the
lack of knowledge within them. In this paper, we find that the PLMs already
possess the knowledge required to rebut such questions, and the key is how to
activate the knowledge. To systematize this observation, we investigate the
PLMs' responses to one kind of tricky questions, i.e., the false premises
questions (FPQs). We annotate a FalseQA dataset containing 2365 human-written
FPQs, with the corresponding explanations for the false premises and the
revised true premise questions. Using FalseQA, we discover that PLMs are
capable of discriminating FPQs by fine-tuning on a moderate number (e.g., 256)
of examples. PLMs also generate reasonable explanations for the false premises,
which serve as rebuttals. Further replaying a few general questions during
training allows PLMs to excel on FPQs and general questions simultaneously. Our
work suggests that once the rebuttal ability is stimulated, knowledge inside
the PLMs can be effectively utilized to handle FPQs, which encourages further
research on PLM-based QA systems.
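As a rough illustration of the recipe above (fine-tune on a moderate number of FPQ rebuttal pairs while replaying a few general questions so ordinary QA ability is preserved), here is a minimal data-mixing sketch in Python. The toy records, field names, and the replay ratio are illustrative assumptions, not the paper's exact setup.

```python
import random

# Toy records standing in for FalseQA data: each FPQ is paired with a
# rebuttal that explains the false premise.
falseqa_examples = [
    {"question": "How many eyes does the sun have?",
     "target": "The sun is a star; it does not have eyes."},
    # ... in the paper, a moderate number (e.g., 256) of such pairs suffices
]
# A small pool of ordinary questions to "replay" during training.
general_qa_examples = [
    {"question": "What is the capital of France?", "target": "Paris."},
]

def build_training_mixture(fpq_data, general_data, replay_ratio=0.25, seed=0):
    """Mix FPQ rebuttal pairs with a replayed slice of general QA pairs.

    `replay_ratio` is an assumed knob controlling how many general questions
    are interleaved so the model keeps answering normal questions instead of
    rebutting everything; it is not the paper's exact setting.
    """
    rng = random.Random(seed)
    n_replay = max(1, int(len(fpq_data) * replay_ratio))
    replayed = rng.sample(general_data, min(n_replay, len(general_data)))
    mixture = fpq_data + replayed
    rng.shuffle(mixture)
    # Each record can then be fed to any text-to-text fine-tuning loop
    # as (input=question, label=target).
    return mixture

if __name__ == "__main__":
    for example in build_training_mixture(falseqa_examples, general_qa_examples):
        print(example["question"], "->", example["target"])
```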
Related papers
- Reverse Question Answering: Can an LLM Write a Question so Hard (or Bad) that it Can't Answer? [24.614521528699093]
Past work tests QA and RQA separately, but we test them jointly, comparing their difficulty, aiding benchmark design, and assessing reasoning consistency.
We run 16 LLMs on QA and RQA with trivia questions/answers, showing that, versus QA, LLMs are much less accurate in RQA for numerical answers but slightly more accurate in RQA for textual answers.
arXiv Detail & Related papers (2024-10-20T21:17:49Z) - Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning [68.57166425493283]
Refusal-Aware Instruction Tuning (RAIT) enables Large Language Models (LLMs) to refuse to answer unknown questions.
RAIT modifies training samples based on the correctness of the initial LLM's response.
However, this crude approach can cause LLMs to excessively refuse to answer questions that they could have answered correctly.
arXiv Detail & Related papers (2024-10-09T14:12:51Z) - Are LLMs Aware that Some Questions are not Open-ended? [58.93124686141781]
- Are LLMs Aware that Some Questions are not Open-ended? [58.93124686141781]
We study whether Large Language Models are aware that some questions have limited answers and need to be answered more deterministically.
The lack of question awareness in LLMs leads to two phenomena: (1) answering non-open-ended questions too casually and (2) answering open-ended questions too blandly.
arXiv Detail & Related papers (2024-10-01T06:07:00Z) - KG-FPQ: Evaluating Factuality Hallucination in LLMs with Knowledge Graph-based False Premise Questions [19.246385485678104]
Large language models (LLMs) are susceptible to being misled by false premise questions (FPQs).
We introduce an automated, scalable pipeline to create FPQs based on knowledge graphs (KGs).
We present a benchmark, the Knowledge Graph-based False Premise Questions (KG-FPQ), which contains approximately 178k FPQs across three knowledge domains, at six levels of confusability, and in two task formats.
arXiv Detail & Related papers (2024-07-08T12:31:03Z) - FreshLLMs: Refreshing Large Language Models with Search Engine
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation [92.43001160060376]
We study the factuality of large language models (LLMs) in the context of answering questions that test current world knowledge.
We introduce FreshQA, a novel dynamic QA benchmark encompassing a diverse range of question and answer types.
We benchmark a diverse array of both closed and open-source LLMs under a two-mode evaluation procedure that allows us to measure both correctness and hallucination.
Motivated by these results, we present FreshPrompt, a simple few-shot prompting method that substantially boosts the performance of an LLM on FreshQA.
arXiv Detail & Related papers (2023-10-05T00:04:12Z) - TSGP: Two-Stage Generative Prompting for Unsupervised Commonsense
- TSGP: Two-Stage Generative Prompting for Unsupervised Commonsense Question Answering [4.965306353273393]
Unsupervised commonsense question answering requires mining effective commonsense knowledge without relying on labeled task data.
We propose a two-stage prompt-based unsupervised commonsense question answering framework (TSGP).
Experimental results and analysis on three different commonsense reasoning tasks, CommonsenseQA, OpenBookQA, and SocialIQA, demonstrate that TSGP significantly improves the reasoning ability of language models in unsupervised settings.
arXiv Detail & Related papers (2022-11-24T10:19:24Z) - Multifaceted Improvements for Conversational Open-Domain Question
- Multifaceted Improvements for Conversational Open-Domain Question Answering [54.913313912927045]
We propose a framework with Multifaceted Improvements for Conversational open-domain Question Answering (MICQA).
First, the proposed KL-divergence-based regularization leads to better question understanding for retrieval and answer reading.
Second, the added post-ranker module pushes more relevant passages to the top positions so that they are selected for the reader, under a two-aspect constraint.
Third, the well-designed curriculum learning strategy effectively narrows the gap between the golden-passage settings of training and inference, and encourages the reader to find the true answer without golden-passage assistance.
arXiv Detail & Related papers (2022-04-01T07:54:27Z) - Unsupervised Question Decomposition for Question Answering [102.56966847404287]
- Unsupervised Question Decomposition for Question Answering [102.56966847404287]
We propose One-to-N Unsupervised Sequence transduction (ONUS), an algorithm that learns to map one hard, multi-hop question to many simpler, single-hop sub-questions.
We show large QA improvements on HotpotQA over a strong baseline on the original, out-of-domain, and multi-hop dev sets.
arXiv Detail & Related papers (2020-02-22T19:40:35Z)