Robust Training for Conversational Question Answering Models with
Reinforced Reformulation Generation
- URL: http://arxiv.org/abs/2310.13505v3
- Date: Fri, 16 Feb 2024 19:15:28 GMT
- Title: Robust Training for Conversational Question Answering Models with
Reinforced Reformulation Generation
- Authors: Magdalena Kaiser, Rishiraj Saha Roy, Gerhard Weikum
- Abstract summary: We show that ConvQA models with robust training via reformulations significantly outperform those with standard training from gold QA pairs only.
We demonstrate the viability of training major model components on one benchmark and applying them zero-shot to another.
- Score: 26.752549844734034
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Models for conversational question answering (ConvQA) over knowledge graphs
(KGs) are usually trained and tested on benchmarks of gold QA pairs. This
implies that training is limited to surface forms seen in the respective
datasets, and evaluation is on a small set of held-out questions. Through our
proposed framework REIGN, we take several steps to remedy this restricted
learning setup. First, we systematically generate reformulations of training
questions to increase the robustness of models to surface form variations. This is
a particularly challenging problem, given the incomplete nature of such
questions. Second, we guide ConvQA models towards higher performance by feeding
them only those reformulations that help improve their answering quality, using
deep reinforcement learning. Third, we demonstrate the viability of training
major model components on one benchmark and applying them zero-shot to another.
Finally, for a rigorous evaluation of robustness for trained models, we use and
release large numbers of diverse reformulations generated by prompting GPT for
benchmark test sets (resulting in a 20x increase in size). Our findings show
that ConvQA models with robust training via reformulations significantly
outperform those with standard training from gold QA pairs only.
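To make the reinforcement learning step concrete, here is a minimal, self-contained sketch of reward-guided reformulation selection. All names (`answer_f1`, `generate_reformulations`) are hypothetical placeholders, and the bandit-style score table stands in for the deep RL policy actually used in REIGN.

```python
import random

def answer_f1(model, question, gold_answer):
    """Placeholder: answering quality of `model` on `question` vs. `gold_answer`."""
    return random.random()  # stand-in for a real ConvQA evaluation

def generate_reformulations(question, n=5):
    """Placeholder for the reformulation generator (paraphrases and other
    surface-form variants of the original training question)."""
    return [f"{question} (variant {i})" for i in range(n)]

def train_with_reformulations(model, qa_pairs, policy_scores, lr=0.1, eps=0.1):
    """Select reformulations whose use improves answering quality, and
    reinforce the policy with that improvement as the reward."""
    selected = []
    for question, gold in qa_pairs:
        baseline = answer_f1(model, question, gold)
        candidates = generate_reformulations(question)
        # Epsilon-greedy choice under the current policy scores.
        if random.random() < eps:
            choice = random.choice(candidates)
        else:
            choice = max(candidates, key=lambda c: policy_scores.get(c, 0.0))
        reward = answer_f1(model, choice, gold) - baseline
        # Reinforce reformulations that raised answering quality.
        policy_scores[choice] = policy_scores.get(choice, 0.0) + lr * reward
        if reward > 0:
            selected.append((choice, gold))  # feed only helpful reformulations
    return selected

pairs = [("who directed Inception", "Christopher Nolan")]
print(train_with_reformulations(model=None, qa_pairs=pairs, policy_scores={}))
```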
Related papers
- Towards Robust Extractive Question Answering Models: Rethinking the Training Methodology [0.34530027457862006]
Previous research has shown that existing models, when trained on EQA datasets that include unanswerable questions, demonstrate a significant lack of robustness.
Our proposed training method includes a novel loss function for the EQA problem and challenges an implicit assumption present in numerous EQA datasets.
Our models exhibit significantly enhanced robustness against two types of adversarial attacks, with a performance decrease only about a third of that observed for the default models.
arXiv Detail & Related papers (2024-09-29T20:35:57Z)
- Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering [26.34649731975005]
Retriever-augmented instruction-following models are attractive alternatives to fine-tuned approaches for question answering (QA).
While the model responses tend to be natural and fluent, the additional verbosity makes traditional QA evaluation metrics unreliable for accurately quantifying model performance.
We use both automatic and human evaluation to assess these models along two dimensions: 1) how well they satisfy the user's information need (correctness), and 2) whether they produce a response based on the provided knowledge (faithfulness).
arXiv Detail & Related papers (2023-07-31T17:41:00Z)
- An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z)
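As a rough illustration of the end-to-end setup in the entry above, the sketch below serializes all question-answer pairs for a context into a single seq2seq target, which is what keeps training and inference light; the separator tokens and field names here are assumptions, not the paper's exact scheme.

```python
# Hypothetical end-to-end QAG data format: the model is fine-tuned to emit
# every question-answer pair for a context in one decoded sequence.
SEP_PAIR = " | "
SEP_QA = ", answer: "

def encode_target(qa_pairs):
    """Serialize (question, answer) pairs into one seq2seq target string."""
    return SEP_PAIR.join(f"question: {q}{SEP_QA}{a}" for q, a in qa_pairs)

def decode_output(text):
    """Parse a decoded string back into (question, answer) pairs."""
    pairs = []
    for chunk in text.split(SEP_PAIR):
        if SEP_QA in chunk:
            q, a = chunk.split(SEP_QA, 1)
            pairs.append((q.removeprefix("question: "), a))
    return pairs

target = encode_target([("Who won the Nobel Prize in Physics in 1903?", "Marie Curie")])
assert decode_output(target) == [("Who won the Nobel Prize in Physics in 1903?", "Marie Curie")]
print(target)
```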
- Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions [70.70725223310401]
This work proposes a new method that utilizes semantically related questions, referred to as basic questions, acting as noise to evaluate the robustness of VQA models.
The experimental results demonstrate that the proposed evaluation method effectively analyzes the robustness of VQA models.
arXiv Detail & Related papers (2023-04-06T15:32:35Z)
- Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering [59.20766562530209]
VQA models still tend to capture superficial linguistic correlations in the training set.
Recent VQA works introduce an auxiliary question-only model to regularize the training of targeted VQA models.
We propose a novel model-agnostic Counterfactual Samples Synthesizing and Training (CSST) strategy.
arXiv Detail & Related papers (2021-10-03T14:31:46Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, iteratively refining the data in RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
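The refinement step in the entry above lends itself to a short sketch: a QA model trained on the harvested corpus re-answers each question, and its prediction replaces the noisy harvested answer when the model is confident. The function names and the confidence threshold are assumptions, not the authors' implementation.

```python
def refine(corpus, train_fn, predict_fn, rounds=2, threshold=0.8):
    """Iteratively retrain a QA model on the corpus and let its confident
    predictions replace the originally harvested answers."""
    for _ in range(rounds):
        model = train_fn(corpus)
        refined = []
        for context, question, answer in corpus:
            pred, conf = predict_fn(model, context, question)
            refined.append((context, question, pred if conf >= threshold else answer))
        corpus = refined
    return corpus

# Toy usage with stub training/prediction functions.
stub_train = lambda corpus: None
stub_predict = lambda model, context, question: ("Paris", 0.9)
data = [("France's capital is Paris.", "What is France's capital?", "Paris, France")]
print(refine(data, stub_train, stub_predict))
```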
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template on a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)
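The template idea in the entry above is simple enough to sketch directly: given a retrieved sentence that contains the answer span, a fixed template rewrites it as a cloze-style question. The single wh-template here is an assumption; the paper's point is that even templates this simple help downstream QA.

```python
def template_question(sentence, answer):
    """Turn a sentence containing `answer` into a cloze-style question by
    replacing the answer span with a wh-placeholder."""
    if answer not in sentence:
        return None  # template only applies when the span is present
    return sentence.replace(answer, "what").rstrip(".") + "?"

print(template_question("The Eiffel Tower was completed in 1889.", "1889"))
# -> The Eiffel Tower was completed in what?
```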
- Counterfactual Samples Synthesizing for Robust Visual Question Answering [104.72828511083519]
We propose a model-agnostic Counterfactual Samples Synthesizing (CSS) training scheme.
CSS generates numerous counterfactual training samples by masking critical objects in images or words in questions.
We achieve a record-breaking performance of 58.95% on VQA-CP v2, with 6.5% gains.
arXiv Detail & Related papers (2020-03-14T08:34:31Z)
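For the language side of CSS, the masking step can be sketched as below: the question tokens most critical to the prediction are masked to synthesize a counterfactual sample. The importance scores here are a stand-in for the model-based attributions a real implementation would compute.

```python
def synthesize_counterfactual(tokens, importance, k=1, mask="[MASK]"):
    """Mask the k highest-importance tokens to build a counterfactual question."""
    ranked = sorted(range(len(tokens)), key=lambda i: -importance[i])
    critical = set(ranked[:k])
    return [mask if i in critical else tok for i, tok in enumerate(tokens)]

tokens = ["what", "color", "is", "the", "banana"]
scores = [0.1, 0.9, 0.05, 0.05, 0.6]  # stand-in attribution scores
print(synthesize_counterfactual(tokens, scores, k=2))
# -> ['what', '[MASK]', 'is', 'the', '[MASK]']
```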