Towards Reliable and Factual Response Generation: Detecting Unanswerable
Questions in Information-Seeking Conversations
- URL: http://arxiv.org/abs/2401.11452v1
- Date: Sun, 21 Jan 2024 10:15:36 GMT
- Title: Towards Reliable and Factual Response Generation: Detecting Unanswerable
Questions in Information-Seeking Conversations
- Authors: Weronika Łajewska, Krisztian Balog
- Abstract summary: Generative AI models face the challenge of hallucinations that can undermine users' trust in such systems.
We approach the problem of conversational information seeking as a two-step process, where relevant passages in a corpus are identified first and then summarized into a final system response.
Specifically, our proposed method employs a sentence-level classifier to detect if the answer is present, then aggregates these predictions on the passage level, and eventually across the top-ranked passages to arrive at a final answerability estimate.
- Score: 16.99952884041096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative AI models face the challenge of hallucinations that can undermine
users' trust in such systems. We approach the problem of conversational
information seeking as a two-step process, where relevant passages in a corpus
are identified first and then summarized into a final system response. This way
we can automatically assess if the answer to the user's question is present in
the corpus. Specifically, our proposed method employs a sentence-level
classifier to detect if the answer is present, then aggregates these
predictions on the passage level, and eventually across the top-ranked passages
to arrive at a final answerability estimate. For training and evaluation, we
develop a dataset based on the TREC CAsT benchmark that includes answerability
labels on the sentence, passage, and ranking levels. We demonstrate that our
proposed method represents a strong baseline and outperforms a state-of-the-art
LLM on the answerability prediction task.
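The aggregation described in the abstract (sentence-level predictions combined per passage, then across the top-ranked passages) can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the abstract does not specify the aggregation functions, the decision threshold, or any of the names used here (`passage_score`, `answerability`). The sentence probabilities are presumed to come from some sentence-level classifier that is not shown.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# An aggregator collapses a sequence of scores into a single score.
Aggregator = Callable[[Sequence[float]], float]


def passage_score(sentence_probs: Sequence[float], agg: Aggregator = max) -> float:
    """Combine sentence-level answerability probabilities into one passage score.

    Using `max` by default is an illustrative assumption, not the paper's stated choice.
    An empty passage is treated as containing no answer.
    """
    if not sentence_probs:
        return 0.0
    return agg(sentence_probs)


def answerability(
    ranking: Sequence[Sequence[float]],
    passage_agg: Aggregator = max,
    ranking_agg: Aggregator = mean,
    threshold: float = 0.5,  # hypothetical cut-off
) -> Tuple[float, bool]:
    """Aggregate passage scores over the top-ranked passages.

    Returns the ranking-level score and whether it reaches the threshold.
    """
    passage_scores: List[float] = [passage_score(p, passage_agg) for p in ranking]
    score = ranking_agg(passage_scores) if passage_scores else 0.0
    return score, score >= threshold


# Hypothetical sentence probabilities for three retrieved passages.
example_ranking = [[0.1, 0.9], [0.2, 0.3], []]
mean_score, mean_decision = answerability(example_ranking)
max_score, max_decision = answerability(example_ranking, ranking_agg=max)
```

With these made-up inputs, averaging over passages gives a more conservative estimate than taking the maximum, which shows how the choice of aggregator at each level can change the final answerability decision.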
Related papers
- Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z)
- PAQA: Toward ProActive Open-Retrieval Question Answering [34.883834970415734]
This work aims to tackle the challenge of generating relevant clarifying questions by taking into account the inherent ambiguities present in both user queries and documents.
We propose PAQA, an extension to the existing AmbiNQ dataset, incorporating clarifying questions.
We then evaluate various models and assess how passage retrieval impacts ambiguity detection and the generation of clarifying questions.
arXiv Detail & Related papers (2024-02-26T14:40:34Z)
- PICK: Polished & Informed Candidate Scoring for Knowledge-Grounded Dialogue Systems [59.1250765143521]
Current knowledge-grounded dialogue systems often fail to align the generated responses with human-preferred qualities.
We propose Polished & Informed Candidate Scoring (PICK), a generation re-scoring framework.
We demonstrate the effectiveness of PICK in generating responses that are more faithful while keeping them relevant to the dialogue history.
arXiv Detail & Related papers (2023-09-19T08:27:09Z)
- Reranking Overgenerated Responses for End-to-End Task-Oriented Dialogue Systems [71.33737787564966]
End-to-end (E2E) task-oriented dialogue (ToD) systems are prone to fall into the so-called 'likelihood trap'.
We propose a reranking method which aims to select high-quality items from the lists of responses initially overgenerated by the system.
Our methods improve a state-of-the-art E2E ToD system by 2.4 BLEU, 3.2 ROUGE, and 2.8 METEOR scores, achieving new peak results.
arXiv Detail & Related papers (2022-11-07T15:59:49Z)
- On the Impact of Speech Recognition Errors in Passage Retrieval for Spoken Question Answering [13.013751306590303]
We study the robustness of lexical and dense retrievers against questions with synthetic ASR noise.
We create a new dataset with questions voiced by human users and use their transcriptions to show that the retrieval performance can further degrade when dealing with natural ASR noise instead of synthetic ASR noise.
arXiv Detail & Related papers (2022-09-26T18:29:36Z)
- Double Retrieval and Ranking for Accurate Question Answering [120.69820139008138]
We show that an answer verification step introduced in Transformer-based answer selection models can significantly improve the state of the art in Question Answering.
The results on three well-known datasets for AS2 show consistent and significant improvement of the state of the art.
arXiv Detail & Related papers (2022-01-16T06:20:07Z)
- A Clarifying Question Selection System from NTES_ALONG in Convai3 Challenge [8.656503175492375]
This paper presents the participation of the NetEase Game AI Lab team in the ClariQ challenge at the Search-oriented Conversational AI (SCAI) EMNLP workshop in 2020.
The challenge asks for a complete conversational information retrieval system that can understand and generate clarification questions.
We propose a clarifying question selection system which consists of response understanding, candidate question recalling and clarifying question ranking.
arXiv Detail & Related papers (2020-10-27T11:22:53Z)
- Visual Question Answering with Prior Class Semantics [50.845003775809836]
We show how to exploit additional information pertaining to the semantics of candidate answers.
We extend the answer prediction process with a regression objective in a semantic space.
Our method brings improvements in consistency and accuracy over a range of question types.
arXiv Detail & Related papers (2020-05-04T02:46:31Z)
- A Revised Generative Evaluation of Visual Dialogue [80.17353102854405]
We propose a revised evaluation scheme for the VisDial dataset.
We measure consensus between answers generated by the model and a set of relevant answers.
We release these sets and code for the revised evaluation scheme as DenseVisDial.
arXiv Detail & Related papers (2020-04-20T13:26:45Z)