The Impacts of Unanswerable Questions on the Robustness of Machine
Reading Comprehension Models
- URL: http://arxiv.org/abs/2302.00094v1
- Date: Tue, 31 Jan 2023 20:51:14 GMT
- Title: The Impacts of Unanswerable Questions on the Robustness of Machine
Reading Comprehension Models
- Authors: Son Quoc Tran, Phong Nguyen-Thuan Do, Uyen Le, Matt Kretchmar
- Abstract summary: We fine-tune three state-of-the-art language models on either SQuAD 1.1 or SQuAD 2.0 and then evaluate their robustness under adversarial attacks.
Our experiments reveal that models fine-tuned on SQuAD 2.0 do not initially appear any more robust than ones fine-tuned on SQuAD 1.1, yet they exhibit a hidden robustness that can be leveraged for real performance gains.
Furthermore, we find that this robustness of models fine-tuned on SQuAD 2.0 extends to additional out-of-domain datasets.
- Score: 0.20646127669654826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pretrained language models have achieved super-human performances on many
Machine Reading Comprehension (MRC) benchmarks. Nevertheless, their relative
inability to defend against adversarial attacks has spurred skepticism about
their natural language understanding. In this paper, we ask whether training
with unanswerable questions in SQuAD 2.0 can help improve the robustness of MRC
models against adversarial attacks. To explore that question, we fine-tune
three state-of-the-art language models on either SQuAD 1.1 or SQuAD 2.0 and
then evaluate their robustness under adversarial attacks. Our experiments
reveal that current models fine-tuned on SQuAD 2.0 do not initially appear to
be any more robust than ones fine-tuned on SQuAD 1.1, yet they reveal a measure
of hidden robustness that can be leveraged to realize actual performance gains.
Furthermore, we find that the robustness of models fine-tuned on SQuAD 2.0
extends to additional out-of-domain datasets. Finally, we introduce a new
adversarial attack to reveal artifacts of SQuAD 2.0 that current MRC models are
learning.
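The core contrast the abstract describes, a model that must always extract a span (SQuAD 1.1) versus one that may abstain (SQuAD 2.0), can be illustrated with off-the-shelf components. The sketch below is only a toy example: the Hugging Face checkpoints are public stand-ins rather than the paper's models, and the passage and AddSent-style distractor sentence are made up for illustration.
```python
# Toy illustration: a SQuAD 1.1-style model must always extract a span, while a
# SQuAD 2.0-style model may abstain on an unanswerable, adversarially perturbed question.
# Checkpoints are illustrative public models, not necessarily those used in the paper.
from transformers import pipeline

original_context = "The Normans were in contact with England from an early date."
# AddSent-style distractor: shares surface cues with the question but does not answer it.
adversarial_context = original_context + " The French were in contact with Scotland in 1295."
question = "When were the Normans in contact with Scotland?"  # unanswerable from the passage

# SQuAD 2.0-trained model: with handle_impossible_answer=True it may return an empty answer.
qa_squad2 = pipeline("question-answering", model="deepset/roberta-base-squad2")
print(qa_squad2(question=question, context=adversarial_context, handle_impossible_answer=True))

# SQuAD 1.1-trained model: forced to pick some span, typically the distractor's "1295".
qa_squad1 = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
print(qa_squad1(question=question, context=adversarial_context))
```
Whether the SQuAD 2.0 checkpoint actually abstains on this input depends on the model; the point is only that abstention is available to it, which is the kind of hidden robustness the paper argues can be leveraged.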
Related papers
- Towards Robust Extractive Question Answering Models: Rethinking the Training Methodology [0.34530027457862006]
Previous research has shown that existing models, when trained on EQA datasets that include unanswerable questions, demonstrate a significant lack of robustness.
Our proposed training method includes a novel loss function for the EQA problem and challenges an implicit assumption present in numerous EQA datasets.
Our models exhibit significantly enhanced robustness against two types of adversarial attacks, with a performance decrease of only about a third compared to the default models.
arXiv Detail & Related papers (2024-09-29T20:35:57Z)
- Precisely the Point: Adversarial Augmentations for Faithful and Informative Text Generation [45.37475848753975]
In this paper, we conduct the first quantitative analysis on the robustness of pre-trained Seq2Seq models.
We find that even the current SOTA pre-trained Seq2Seq model (BART) is still vulnerable, leading to significant degradation in faithfulness and informativeness on text generation tasks.
We propose a novel adversarial augmentation framework, namely AdvSeq, for improving faithfulness and informativeness of Seq2Seq models.
arXiv Detail & Related papers (2022-10-22T06:38:28Z)
- Characterizing the adversarial vulnerability of speech self-supervised learning [95.03389072594243]
We make the first attempt to investigate the adversarial vulnerability of such a paradigm under attacks from both zero-knowledge and limited-knowledge adversaries.
The experimental results illustrate that the paradigm proposed by SUPERB is seriously vulnerable to limited-knowledge adversaries.
arXiv Detail & Related papers (2021-11-08T08:44:04Z)
- Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models [86.02610674750345]
Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
We apply 14 adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.
All the language models and robust training methods we tested perform poorly on AdvGLUE, with scores lagging far behind the benign accuracy.
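Since AdvGLUE keeps the GLUE task format, an existing classifier can be scored on it directly; a minimal sketch, assuming the community "adv_glue" dataset on the Hugging Face Hub and an illustrative SST-2 checkpoint:
```python
# Minimal sketch: score a sentiment classifier on AdvGLUE's adversarial SST-2 split.
# Dataset id, config name, and checkpoint are assumptions based on public Hub artifacts.
from datasets import load_dataset
from transformers import pipeline

adv_sst2 = load_dataset("adv_glue", "adv_sst2", split="validation")
clf = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")

correct = 0
for example in adv_sst2:
    pred = clf(example["sentence"])[0]["label"]   # "POSITIVE" or "NEGATIVE"
    pred_label = 1 if pred == "POSITIVE" else 0   # GLUE SST-2 convention: 1 = positive
    correct += int(pred_label == example["label"])

print(f"accuracy on adversarial SST-2: {correct / len(adv_sst2):.3f}")
```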
arXiv Detail & Related papers (2021-11-04T12:59:55Z)
- RobustART: Benchmarking Robustness on Architecture Design and Training Techniques [170.3297213957074]
Deep neural networks (DNNs) are vulnerable to adversarial noise.
There are no comprehensive studies of how architecture design and training techniques affect robustness.
We propose the first comprehensive robustness benchmark on ImageNet covering architecture design and training techniques.
arXiv Detail & Related papers (2021-09-11T08:01:14Z)
- When to Fold'em: How to answer Unanswerable questions [5.586191108738563]
We present 3 different question-answering models trained on the SQuAD 2.0 dataset.
We developed a novel approach capable of achieving a 2 percentage point improvement in SQuAD 2.0 F1 with reduced training time.
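The abstention decision behind such SQuAD 2.0 systems typically reduces to comparing the null (no-answer) score against the best span score with a tunable threshold; a minimal sketch of that rule, with hypothetical names and scores:
```python
# Standard SQuAD 2.0-style no-answer rule: abstain when the null (no-answer) score beats the
# best extracted span by more than a threshold. Names and numbers are purely illustrative.
def predict_or_abstain(best_span_text: str, best_span_score: float,
                       null_score: float, na_threshold: float = 0.0) -> str:
    """Return the extracted span, or "" (the SQuAD 2.0 convention) to abstain."""
    if null_score - best_span_score > na_threshold:
        return ""  # the model "folds" on a question it judges unanswerable
    return best_span_text

# Lowering na_threshold makes the model abstain more aggressively; tuning it trades
# answerable-question F1 against no-answer accuracy.
print(predict_or_abstain("in 1066", best_span_score=4.2, null_score=5.1, na_threshold=0.5))  # -> ""
```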
arXiv Detail & Related papers (2021-05-01T19:08:40Z)
- RobustBench: a standardized adversarial robustness benchmark [84.50044645539305]
A key challenge in benchmarking robustness is that its evaluation is often error-prone, leading to robustness overestimation.
We evaluate adversarial robustness with AutoAttack, an ensemble of white- and black-box attacks.
We analyze the impact of robustness on the performance on distribution shifts, calibration, out-of-distribution detection, fairness, privacy leakage, smoothness, and transferability.
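For reference, the standardized evaluation RobustBench proposes comes down to a few library calls; a minimal sketch, assuming the public robustbench and autoattack packages (API details follow their READMEs and should be treated as assumptions):
```python
# Minimal sketch of a RobustBench-style evaluation with AutoAttack (CIFAR-10, L-inf, eps=8/255).
import torch
from robustbench.data import load_cifar10
from robustbench.utils import load_model
from autoattack import AutoAttack

x_test, y_test = load_cifar10(n_examples=64)  # small batch, illustration only
model = load_model(model_name="Standard", dataset="cifar10", threat_model="Linf").eval()

adversary = AutoAttack(model, norm="Linf", eps=8 / 255, version="standard")
x_adv = adversary.run_standard_evaluation(x_test, y_test, bs=32)

with torch.no_grad():
    robust_acc = (model(x_adv).argmax(dim=1) == y_test).float().mean().item()
print(f"robust accuracy under AutoAttack: {robust_acc:.3f}")
```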
arXiv Detail & Related papers (2020-10-19T17:06:18Z)
- InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective [84.78604733927887]
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Recent studies show that such BERT-based models are vulnerable to textual adversarial attacks.
We propose InfoBERT, a novel learning framework for robust fine-tuning of pre-trained language models.
arXiv Detail & Related papers (2020-10-05T20:49:26Z)
- Benchmarking Robustness of Machine Reading Comprehension Models [29.659586787812106]
We construct AdvRACE, a new model-agnostic benchmark for evaluating the robustness of MRC models under four different types of adversarial attacks.
We show that state-of-the-art (SOTA) models are vulnerable to all of these attacks.
We conclude that there is substantial room for building more robust MRC models and our benchmark can help motivate and measure progress in this area.
arXiv Detail & Related papers (2020-04-29T08:05:32Z)
- RAB: Provable Robustness Against Backdoor Attacks [20.702977915926787]
We focus on certifying machine learning model robustness against general threat models, especially backdoor attacks.
We propose the first robust training process, RAB, to smooth the trained model and certify its robustness against backdoor attacks.
We conduct comprehensive experiments for different machine learning (ML) models and provide the first benchmark for certified robustness against backdoor attacks.
arXiv Detail & Related papers (2020-03-19T17:05:51Z)