The Impacts of Unanswerable Questions on the Robustness of Machine
Reading Comprehension Models
- URL: http://arxiv.org/abs/2302.00094v1
- Date: Tue, 31 Jan 2023 20:51:14 GMT
- Title: The Impacts of Unanswerable Questions on the Robustness of Machine
Reading Comprehension Models
- Authors: Son Quoc Tran, Phong Nguyen-Thuan Do, Uyen Le, Matt Kretchmar
- Abstract summary: We fine-tune three state-of-the-art language models on either SQuAD 1.1 or SQuAD 2.0 and then evaluate their robustness under adversarial attacks.
Our experiments reveal that models fine-tuned on SQuAD 2.0 do not initially appear any more robust than ones fine-tuned on SQuAD 1.1, yet they exhibit a hidden robustness that can be leveraged for real performance gains.
Furthermore, we find that this robustness of models fine-tuned on SQuAD 2.0 extends to additional out-of-domain datasets.
- Score: 0.20646127669654826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained language models have achieved super-human performances on many
Machine Reading Comprehension (MRC) benchmarks. Nevertheless, their relative
inability to defend against adversarial attacks has spurred skepticism about
their natural language understanding. In this paper, we ask whether training
with unanswerable questions in SQuAD 2.0 can help improve the robustness of MRC
models against adversarial attacks. To explore that question, we fine-tune
three state-of-the-art language models on either SQuAD 1.1 or SQuAD 2.0 and
then evaluate their robustness under adversarial attacks. Our experiments
reveal that current models fine-tuned on SQuAD 2.0 do not initially appear to
be any more robust than ones fine-tuned on SQuAD 1.1, yet they reveal a measure
of hidden robustness that can be leveraged to realize actual performance gains.
Furthermore, we find that the robustness of models fine-tuned on SQuAD 2.0
extends to additional out-of-domain datasets. Finally, we introduce a new
adversarial attack to reveal artifacts of SQuAD 2.0 that current MRC models are
learning.
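The core contrast the abstract describes, a model that must always extract a span (SQuAD 1.1) versus one that may abstain (SQuAD 2.0), can be illustrated with off-the-shelf components. The sketch below is only a toy example: the Hugging Face checkpoints are public stand-ins rather than the paper's models, and the passage and AddSent-style distractor sentence are made up for illustration.
```python
# Toy illustration: a SQuAD 1.1-style model must always extract a span, while a
# SQuAD 2.0-style model may abstain on an unanswerable, adversarially perturbed question.
# Checkpoints are illustrative public models, not necessarily those used in the paper.
from transformers import pipeline

original_context = "The Normans were in contact with England from an early date."
# AddSent-style distractor: shares surface cues with the question but does not answer it.
adversarial_context = original_context + " The French were in contact with Scotland in 1295."
question = "When were the Normans in contact with Scotland?"  # unanswerable from the passage

# SQuAD 2.0-trained model: with handle_impossible_answer=True it may return an empty answer.
qa_squad2 = pipeline("question-answering", model="deepset/roberta-base-squad2")
print(qa_squad2(question=question, context=adversarial_context, handle_impossible_answer=True))

# SQuAD 1.1-trained model: forced to pick some span, typically the distractor's "1295".
qa_squad1 = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
print(qa_squad1(question=question, context=adversarial_context))
```
Whether the SQuAD 2.0 checkpoint actually abstains on this input depends on the model; the point is only that abstention is available to it, which is the kind of hidden robustness the paper argues can be leveraged.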
Related papers
- Towards Robust Extractive Question Answering Models: Rethinking the Training Methodology [0.34530027457862006]
Previous research has shown that existing models, when trained on EQA datasets that include unanswerable questions, demonstrate a significant lack of robustness.
Our proposed training method includes a novel loss function for the EQA problem and challenges an implicit assumption present in numerous EQA datasets.
Our models exhibit significantly enhanced robustness against two types of adversarial attacks, with a performance decrease of only about a third compared to the default models.
arXiv Detail & Related papers (2024-09-29T20:35:57Z)
- Precisely the Point: Adversarial Augmentations for Faithful and Informative Text Generation [45.37475848753975]
In this paper, we conduct the first quantitative analysis on the robustness of pre-trained Seq2Seq models.
We find that even the current SOTA pre-trained Seq2Seq model (BART) is still vulnerable, leading to significant degradation in faithfulness and informativeness on text generation tasks.
We propose a novel adversarial augmentation framework, namely AdvSeq, for improving faithfulness and informativeness of Seq2Seq models.
arXiv Detail & Related papers (2022-10-22T06:38:28Z)
- Characterizing the adversarial vulnerability of speech self-supervised learning [95.03389072594243]
We make the first attempt to investigate the adversarial vulnerability of such a paradigm under attacks from both zero-knowledge and limited-knowledge adversaries.
The experimental results illustrate that the paradigm proposed by SUPERB is seriously vulnerable to limited-knowledge adversaries.
arXiv Detail & Related papers (2021-11-08T08:44:04Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
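Since AdvGLUE keeps the GLUE task format, an existing classifier can be scored on it directly; a minimal sketch, assuming the community "adv_glue" dataset on the Hugging Face Hub and an illustrative SST-2 checkpoint:
```python
# Minimal sketch: score a sentiment classifier on AdvGLUE's adversarial SST-2 split.
# Dataset id, config name, and checkpoint are assumptions based on public Hub artifacts.
from datasets import load_dataset
from transformers import pipeline

adv_sst2 = load_dataset("adv_glue", "adv_sst2", split="validation")
clf = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")

correct = 0
for example in adv_sst2:
    pred = clf(example["sentence"])[0]["label"]   # "POSITIVE" or "NEGATIVE"
    pred_label = 1 if pred == "POSITIVE" else 0   # GLUE SST-2 convention: 1 = positive
    correct += int(pred_label == example["label"])

print(f"accuracy on adversarial SST-2: {correct / len(adv_sst2):.3f}")
```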
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- RobustART: Benchmarking Robustness on Architecture Design and Training Techniques [170.3297213957074]
Deep neural networks (DNNs) are vulnerable to adversarial noise.
There are no comprehensive studies of how architecture design and training techniques affect robustness.
We propose the first comprehensive robustness benchmark on ImageNet covering architecture design and training techniques.
arXiv Detail & Related papers (2021-09-11T08:01:14Z)
- When to Fold'em: How to answer Unanswerable questions [5.586191108738563]
We present 3 different question-answering models trained on the SQuAD 2.0 dataset.
We developed a novel approach capable of achieving a 2 percentage point improvement in SQuAD 2.0 F1 with reduced training time.
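The abstention decision behind such SQuAD 2.0 systems typically reduces to comparing the null (no-answer) score against the best span score with a tunable threshold; a minimal sketch of that rule, with hypothetical names and scores:
```python
# Standard SQuAD 2.0-style no-answer rule: abstain when the null (no-answer) score beats the
# best extracted span by more than a threshold. Names and numbers are purely illustrative.
def predict_or_abstain(best_span_text: str, best_span_score: float,
                       null_score: float, na_threshold: float = 0.0) -> str:
    """Return the extracted span, or "" (the SQuAD 2.0 convention) to abstain."""
    if null_score - best_span_score > na_threshold:
        return ""  # the model "folds" on a question it judges unanswerable
    return best_span_text

# Lowering na_threshold makes the model abstain more aggressively; tuning it trades
# answerable-question F1 against no-answer accuracy.
print(predict_or_abstain("in 1066", best_span_score=4.2, null_score=5.1, na_threshold=0.5))  # -> ""
```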
arXiv Detail & Related papers (2021-05-01T19:08:40Z)
- RobustBench: a standardized adversarial robustness benchmark [84.50044645539305]
A key challenge in benchmarking robustness is that its evaluation is often error-prone, leading to robustness overestimation.
We evaluate adversarial robustness with AutoAttack, an ensemble of white- and black-box attacks.
We analyze the impact of robustness on the performance on distribution shifts, calibration, out-of-distribution detection, fairness, privacy leakage, smoothness, and transferability.
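For reference, the standardized evaluation RobustBench proposes comes down to a few library calls; a minimal sketch, assuming the public robustbench and autoattack packages (API details follow their READMEs and should be treated as assumptions):
```python
# Minimal sketch of a RobustBench-style evaluation with AutoAttack (CIFAR-10, L-inf, eps=8/255).
import torch
from robustbench.data import load_cifar10
from robustbench.utils import load_model
from autoattack import AutoAttack

x_test, y_test = load_cifar10(n_examples=64)  # small batch, illustration only
model = load_model(model_name="Standard", dataset="cifar10", threat_model="Linf").eval()

adversary = AutoAttack(model, norm="Linf", eps=8 / 255, version="standard")
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=32)

with torch.no_grad():
    robust_acc = (model(x_adv).argmax(dim=1) == y_test).float().mean().item()
print(f"robust accuracy under AutoAttack: {robust_acc:.3f}")
```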
arXiv Detail & Related papers (2020-10-19T17:06:18Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- Benchmarking Robustness of Machine Reading Comprehension Models [29.659586787812106]
We construct AdvRACE, a new model-agnostic benchmark for evaluating the robustness of MRC models under four different types of adversarial attacks.
We show that state-of-the-art (SOTA) models are vulnerable to all of these attacks.
We conclude that there is substantial room for building more robust MRC models and our benchmark can help motivate and measure progress in this area.
arXiv Detail & Related papers (2020-04-29T08:05:32Z)
- RAB: Provable Robustness Against Backdoor Attacks [20.702977915926787]
We focus on certifying machine learning model robustness against general threat models, especially backdoor attacks.
We propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks.
We conduct comprehensive experiments for different machine learning (ML) models and provide the first benchmark for certified robustness against backdoor attacks.
arXiv Detail & Related papers (2020-03-19T17:05:51Z)