Using Interactive Feedback to Improve the Accuracy and Explainability of
Question Answering Systems Post-Deployment
- URL: http://arxiv.org/abs/2204.03025v1
- Date: Wed, 6 Apr 2022 18:17:09 GMT
- Authors: Zichao Li, Prakhar Sharma, Xing Han Lu, Jackie C.K. Cheung, Siva Reddy
- Abstract summary: We focus on two kinds of improvements: 1) improving the QA system's performance itself, and 2) providing the model with the ability to explain the correctness or incorrectness of an answer.
We collect a retrieval-based QA dataset, FeedbackQA, which contains interactive feedback from users.
We show that feedback data improves the accuracy not only of the deployed QA system but also of other, stronger non-deployed systems.
- Score: 20.601284299825895
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most research on question answering focuses on the pre-deployment stage;
i.e., building an accurate model for deployment. In this paper, we ask the
question: Can we improve QA systems further post-deployment based on
user interactions? We focus on two kinds of improvements: 1) improving the QA
system's performance itself, and 2) providing the model with the ability to
explain the correctness or incorrectness of an answer. We collect a
retrieval-based QA dataset, FeedbackQA, which contains interactive feedback
from users. We collect this dataset by deploying a base QA system to
crowdworkers who then engage with the system and provide feedback on the
quality of its answers. The feedback contains both structured ratings and
unstructured natural language explanations. We train a neural model with this
feedback data that can generate explanations and re-score answer candidates. We
show that feedback data improves the accuracy not only of the deployed QA
system but also of other, stronger non-deployed systems. The generated explanations
also help users make informed decisions about the correctness of answers.
Project page: https://mcgill-nlp.github.io/feedbackqa/
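To make the re-scoring idea concrete, here is a minimal sketch of feedback-based answer reranking. The `rate_answer` stand-in, the `Candidate` type, and the interpolation weight `alpha` are all hypothetical, not the paper's actual architecture; a real implementation would replace `rate_answer` with a model trained on FeedbackQA ratings.

```python
# Minimal sketch of feedback-based answer re-scoring (hypothetical names).
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    retriever_score: float  # score from the deployed base QA system

def rate_answer(question: str, answer: str) -> float:
    """Stand-in for a model trained on FeedbackQA ratings; a real model
    would predict how users rate this answer. Here: a word-overlap proxy."""
    overlap = len(set(question.lower().split()) & set(answer.lower().split()))
    return overlap / (len(question.split()) + 1)

def rescore(question: str, candidates: list[Candidate], alpha: float = 0.5) -> list[Candidate]:
    # Interpolate the retriever score with the feedback-model rating, then rerank.
    return sorted(
        candidates,
        key=lambda c: (1 - alpha) * c.retriever_score + alpha * rate_answer(question, c.text),
        reverse=True,
    )

if __name__ == "__main__":
    cands = [Candidate("Masks are optional outdoors.", 0.4),
             Candidate("Wear a mask indoors in public settings.", 0.3)]
    print(rescore("Do I need to wear a mask indoors?", cands)[0].text)
```

Interpolating with, rather than replacing, the retriever score keeps the feedback model in the role the abstract describes: a re-scorer over the deployed system's answer candidates.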
Related papers
- SQUARE: Automatic Question Answering Evaluation using Multiple Positive
and Negative References [73.67707138779245]
We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation).
We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems.
arXiv Detail & Related papers (2023-09-21T16:51:30Z)
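As a rough illustration of scoring an answer against multiple positive and negative references (the bag-of-words embedding and the max-minus-max combination below are stand-ins, not SQuArE's actual formulation):

```python
# Hypothetical sketch: score an answer against positive and negative references.
import math

def embed(text: str) -> dict[str, float]:
    """Stand-in embedding (bag of words); a real system would use a sentence encoder."""
    vec: dict[str, float] = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0.0) + 1.0
    return vec

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def reference_based_score(answer: str, positives: list[str], negatives: list[str]) -> float:
    # Reward similarity to positive references, penalize similarity to negatives.
    a = embed(answer)
    return (max(cosine(a, embed(p)) for p in positives)
            - max(cosine(a, embed(n)) for n in negatives))
```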
- Reinforced Question Rewriting for Conversational Question Answering [25.555372505026526]
We develop a model to rewrite conversational questions into self-contained ones.
It allows using existing single-turn QA systems to avoid training a CQA model from scratch.
We propose using QA feedback to supervise the rewriting model with reinforcement learning.
arXiv Detail & Related papers (2022-10-27T21:23:36Z)
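A minimal sketch of the reinforcement-learning signal described here, assuming a REINFORCE-style objective that rewards a sampled rewrite by the downstream QA system's token-level F1; the names and the baseline handling are illustrative, not the paper's exact method.

```python
# Hypothetical REINFORCE-style update: reward a rewrite by downstream QA quality.
import torch

def f1_reward(predicted_answer: str, gold_answer: str) -> float:
    """Token-level F1 between the QA system's answer (given the rewritten
    question) and the gold answer."""
    pred, gold = predicted_answer.split(), gold_answer.split()
    common = len(set(pred) & set(gold))
    if common == 0:
        return 0.0
    p, r = common / len(pred), common / len(gold)
    return 2 * p * r / (p + r)

def reinforce_loss(log_probs: torch.Tensor, reward: float, baseline: float) -> torch.Tensor:
    # log_probs: per-token log-probabilities of the sampled rewrite.
    # Scale the sequence log-likelihood by the baseline-adjusted QA reward.
    return -(reward - baseline) * log_probs.sum()
```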
- Knowledge Transfer from Answer Ranking to Answer Generation [97.38378660163414]
We propose to train a GenQA model by transferring knowledge from a trained AS2 model.
We also propose to use the AS2 model prediction scores for loss weighting and score-conditioned input/output shaping.
arXiv Detail & Related papers (2022-10-23T21:51:27Z)
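A hedged PyTorch sketch of the loss-weighting idea: the AS2 ranker's confidence scales each training example's generation loss. The weighting scheme here is illustrative; the paper's exact scheme and its score-conditioned input/output shaping may differ.

```python
# Hypothetical sketch: weight the GenQA token loss by the AS2 model's score.
import torch
import torch.nn.functional as F

def as2_weighted_loss(logits: torch.Tensor, targets: torch.Tensor,
                      as2_score: float) -> torch.Tensor:
    """logits: (seq_len, vocab_size); targets: (seq_len,) gold token ids;
    as2_score: the trained AS2 ranker's confidence in this training answer."""
    token_loss = F.cross_entropy(logits, targets, reduction="mean")
    # Examples the AS2 ranker trusts contribute more to the generator's loss.
    return as2_score * token_loss
```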
- Towards Teachable Reasoning Systems [29.59387051046722]
We develop a teachable reasoning system for question-answering (QA).
Our approach is three-fold: First, generated chains of reasoning show how answers are implied by the system's own internal beliefs.
Second, users can interact with the explanations to identify erroneous model beliefs and provide corrections.
Third, we augment the model with a dynamic memory of such corrections.
arXiv Detail & Related papers (2022-04-27T17:15:07Z)
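The dynamic memory can be pictured with a small sketch; the `CorrectionMemory` class and its exact-match lookup are simplifications of whatever retrieval the actual system uses.

```python
# Hypothetical sketch of a dynamic memory of user corrections.
class CorrectionMemory:
    def __init__(self) -> None:
        self._corrections: dict[str, str] = {}  # erroneous belief -> correction

    def add(self, wrong_belief: str, correction: str) -> None:
        self._corrections[wrong_belief] = correction

    def apply(self, beliefs: list[str]) -> list[str]:
        # Swap in remembered corrections before the system reasons over its beliefs.
        return [self._corrections.get(b, b) for b in beliefs]

memory = CorrectionMemory()
memory.add("penguins can fly", "penguins cannot fly")
print(memory.apply(["birds can fly", "penguins can fly"]))
# -> ['birds can fly', 'penguins cannot fly']
```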
- Improving the Question Answering Quality using Answer Candidate
Filtering based on Natural-Language Features [117.44028458220427]
We address the problem of how the Question Answering (QA) quality of a given system can be improved.
Our main contribution is an approach capable of identifying wrong answers provided by a QA system.
In particular, our approach has shown its potential by removing, in many cases, the majority of incorrect answers.
arXiv Detail & Related papers (2021-12-10T11:09:44Z)
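A minimal sketch of the filtering step, assuming a hypothetical `p_wrong` classifier over natural-language features and a fixed decision threshold:

```python
# Hypothetical sketch: drop candidates a wrong-answer classifier flags.
def p_wrong(question: str, answer: str) -> float:
    """Stand-in for a classifier over natural-language features that
    estimates the probability an answer is wrong."""
    return 0.9 if not answer.strip() else 0.2  # trivial placeholder

def filter_candidates(question: str, answers: list[str],
                      threshold: float = 0.5) -> list[str]:
    # Keep only answers the classifier considers likely correct.
    return [a for a in answers if p_wrong(question, a) < threshold]
```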
- Improving Unsupervised Question Answering via Summarization-Informed
Question Generation [47.96911338198302]
Question Generation (QG) is the task of generating a plausible question for a ⟨passage, answer⟩ pair.
We make use of freely available news summary data, transforming declarative sentences into appropriate questions using dependency parsing, named entity recognition and semantic role labeling.
The resulting questions are then combined with the original news articles to train an end-to-end neural QG model.
arXiv Detail & Related papers (2021-09-16T13:08:43Z)
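A toy illustration of the declarative-to-question transformation: the rule below substitutes a wh-word for an entity span supplied by the caller, whereas the paper derives such spans automatically via dependency parsing, named entity recognition, and semantic role labeling.

```python
# Hypothetical rule: turn a declarative sentence into a question by
# replacing a recognized entity with a wh-word.
WH_BY_TYPE = {"PERSON": "Who", "DATE": "When", "GPE": "Where", "ORG": "What organization"}

def to_question(sentence: str, entity: str, entity_type: str) -> str:
    wh = WH_BY_TYPE.get(entity_type, "What")
    body = sentence.rstrip(".").replace(entity, "").strip()
    return f"{wh} {body}?"

print(to_question("Marie Curie discovered polonium.", "Marie Curie", "PERSON"))
# -> "Who discovered polonium?"
```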
- Will this Question be Answered? Question Filtering via Answer Model
Distillation for Efficient Question Answering [99.66470885217623]
We propose a novel approach towards improving the efficiency of Question Answering (QA) systems by filtering out questions that will not be answered by them.
This is based on an interesting new finding: the answer confidence scores of state-of-the-art QA systems can be approximated well by models solely using the input question text.
arXiv Detail & Related papers (2021-09-14T23:07:49Z)
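A minimal sketch of the resulting filter, assuming a hypothetical question-only confidence model distilled from the full QA system:

```python
# Hypothetical sketch: a question-only model approximates the full QA
# system's answer confidence, so unanswerable questions are skipped early.
def question_only_confidence(question: str) -> float:
    """Stand-in for a distilled model that predicts the QA system's
    confidence from the question text alone."""
    return 0.8 if question.rstrip().endswith("?") else 0.3  # trivial placeholder

def should_answer(question: str, threshold: float = 0.5) -> bool:
    # Run the expensive QA pipeline only when the distilled model predicts
    # the system would answer with sufficient confidence.
    return question_only_confidence(question) >= threshold
```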
- Improving Conversational Question Answering Systems after Deployment
using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback.
Our work opens the prospect of exploiting interactions with real users and improving conversational systems after deployment.
arXiv Detail & Related papers (2020-11-01T19:50:34Z)
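A hedged PyTorch sketch of feedback-weighted learning: each example's loss is importance-weighted by the binary user feedback divided by the probability the deployed model assigned to the answer it showed. The estimator details here are illustrative; see the paper for the exact formulation.

```python
# Hypothetical importance-sampling sketch for learning from binary feedback.
import torch

def feedback_weighted_loss(log_prob: torch.Tensor, feedback: float) -> torch.Tensor:
    """log_prob: current model's log-probability of the answer that was shown;
    feedback: 1.0 if the user approved the answer, 0.0 otherwise.

    For simplicity this assumes the deployed (data-collecting) model and the
    current model coincide, so the sampling probability is taken from log_prob.
    """
    sampling_prob = log_prob.detach().exp()
    weight = feedback / sampling_prob.clamp(min=1e-6)  # importance weight
    return -(weight * log_prob)
```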
- Summary-Oriented Question Generation for Informational Queries [23.72999724312676]
We aim to produce self-explanatory questions (SQs) that focus on main document topics and are answerable with variable-length passages as appropriate.
Our model shows SOTA performance on SQ generation on the NQ dataset (20.1 BLEU-4).
We further apply our model to out-of-domain news articles, evaluating with a QA system due to the lack of gold questions, and demonstrate that it produces better SQs for news articles, with further confirmation via a human evaluation.
arXiv Detail & Related papers (2020-10-19T17:30:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.