Towards Robust Extractive Question Answering Models: Rethinking the Training Methodology
- URL: http://arxiv.org/abs/2409.19766v1
- Date: Sun, 29 Sep 2024 20:35:57 GMT
- Title: Towards Robust Extractive Question Answering Models: Rethinking the Training Methodology
- Authors: Son Quoc Tran, Matt Kretchmar
- Abstract summary: Previous research has shown that existing models, when trained on EQA datasets that include unanswerable questions, demonstrate a significant lack of robustness.
Our proposed training method includes a novel loss function for the EQA problem and challenges an implicit assumption present in numerous EQA datasets.
Our models exhibit significantly enhanced robustness against two types of adversarial attacks, with a performance decrease of only about a third compared to the default models.
- Score: 0.34530027457862006
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This paper proposes a novel training method to improve the robustness of Extractive Question Answering (EQA) models. Previous research has shown that existing models, when trained on EQA datasets that include unanswerable questions, demonstrate a significant lack of robustness against distribution shifts and adversarial attacks. Despite this, the inclusion of unanswerable questions in EQA training datasets is essential for ensuring real-world reliability. Our proposed training method includes a novel loss function for the EQA problem and challenges an implicit assumption present in numerous EQA datasets. Models trained with our method maintain in-domain performance while achieving a notable improvement on out-of-domain datasets. This results in an overall F1 score improvement of 5.7 across all testing sets. Furthermore, our models exhibit significantly enhanced robustness against two types of adversarial attacks, with a performance decrease of only about a third compared to the default models.
Related papers
- The Surprising Effectiveness of Test-Time Training for Abstract Reasoning [64.36534512742736]
We investigate the effectiveness of test-time training (TTT) as a mechanism for improving models' reasoning capabilities.
TTT significantly improves performance on ARC tasks, achieving up to 6x improvement in accuracy compared to base fine-tuned models.
Our findings suggest that explicit symbolic search is not the only path to improved abstract reasoning in neural language models.
arXiv Detail & Related papers (2024-11-11T18:59:45Z)
- Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation [26.752549844734034]
We show that ConvQA models with robust training via reformulations significantly outperform those with standard training from gold QA pairs only.
We demonstrate the viability of training major model components on one benchmark and applying them zero-shot to another.
arXiv Detail & Related papers (2023-10-20T13:51:08Z)
- Analysis of the Reasoning with Redundant Information Provided Ability of Large Language Models [0.0]
Large Language Models (LLMs) have demonstrated impressive capabilities across a range of natural language processing tasks.
To address this gap, a new form of Question-Answering (QA) task, termed Reasoning with Redundant Information Provided (RRIP), is introduced.
This study evaluates two popular LLMs, LLaMA2-13B-chat and generative pre-trained transformer 3.5 (GPT-3.5), contrasting their performance on traditional QA tasks against RRIP tasks.
arXiv Detail & Related papers (2023-10-06T06:20:06Z)
- Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions [70.70725223310401]
This work proposes a new method that utilizes semantically related questions, referred to as basic questions, acting as noise to evaluate the robustness of VQA models.
The experimental results demonstrate that the proposed evaluation method effectively analyzes the robustness of VQA models.
arXiv Detail & Related papers (2023-04-06T15:32:35Z)
- TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization [89.54947228958494]
This paper focuses on the fine-tuning of an adversarially pre-trained model in various classification tasks.
We propose a novel statistics-based approach, the Two-WIng NormaliSation (TWINS) fine-tuning framework.
TWINS is shown to be effective on a wide range of image classification datasets in terms of both generalization and robustness.
arXiv Detail & Related papers (2023-03-20T14:12:55Z)
- Tokenization Consistency Matters for Generative Models on Extractive NLP Tasks [54.306234256074255]
We identify the issue of tokenization inconsistency that is commonly neglected in training generative models.
This issue damages the extractive nature of these tasks after the input and output are tokenized inconsistently.
We show that, with consistent tokenization, the model performs better in both in-domain and out-of-domain datasets.
arXiv Detail & Related papers (2022-12-19T23:33:21Z)
- Retrieval-guided Counterfactual Generation for QA [5.434621727606356]
We focus on the task of creating counterfactuals for question answering.
We develop a Retrieve-Generate-Filter technique to create counterfactual evaluation and training data.
We find that RGF data leads to significant improvements in a model's robustness to local perturbations.
arXiv Detail & Related papers (2021-10-14T17:56:37Z)
- Attention-guided Generative Models for Extractive Question Answering [17.476450946279037]
Recently, pretrained generative sequence-to-sequence (seq2seq) models have achieved great success in question answering.
We propose a simple strategy to obtain an extractive answer span from the generative model by leveraging the decoder cross-attention patterns.
arXiv Detail & Related papers (2021-10-12T23:02:35Z)
- Learning to Perturb Word Embeddings for Out-of-distribution QA [55.103586220757464]
We propose a simple yet effective data augmentation (DA) method based on a noise generator, which learns to perturb the word embeddings of the input questions and context without changing their semantics.
We validate the performance of QA models trained with our word embedding perturbation on a single source dataset, across five different target domains.
Notably, the model trained with our method outperforms the model trained with more than 240K artificially generated QA pairs.
arXiv Detail & Related papers (2021-05-06T14:12:26Z)
- Contrast and Classify: Training Robust VQA Models [60.80627814762071]
We propose a novel training paradigm (ConClaT) that optimizes both cross-entropy and contrastive losses.
We find that optimizing both losses -- either alternately or jointly -- is key to effective training.
arXiv Detail & Related papers (2020-10-13T00:23:59Z)