Undersensitivity in Neural Reading Comprehension
- URL: http://arxiv.org/abs/2003.04808v1
- Date: Sat, 15 Feb 2020 19:03:36 GMT
- Title: Undersensitivity in Neural Reading Comprehension
- Authors: Johannes Welbl, Pasquale Minervini, Max Bartolo, Pontus Stenetorp,
Sebastian Riedel
- Abstract summary: Current reading comprehension models generalise well to in-distribution test sets, yet perform poorly on adversarially selected inputs.
We focus on the complementary problem of excessive prediction undersensitivity, where the input text is meaningfully changed but the model's prediction does not change.
We formulate a noisy adversarial attack which searches among semantic variations of the question for which a model erroneously predicts the same answer.
- Score: 36.142792758501706
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current reading comprehension models generalise well to in-distribution test
sets, yet perform poorly on adversarially selected inputs. Most prior work on
adversarial inputs studies oversensitivity: semantically invariant text
perturbations that cause a model's prediction to change when it should not. In
this work we focus on the complementary problem: excessive prediction
undersensitivity, where input text is meaningfully changed but the model's
prediction does not, even though it should. We formulate a noisy adversarial
attack which searches among semantic variations of the question for which a
model erroneously predicts the same answer, and with even higher probability.
Despite being trained on data that includes unanswerable questions, both SQuAD2.0 and NewsQA models are
vulnerable to this attack. This indicates that although accurate, models tend
to rely on spurious patterns and do not fully consider the information
specified in a question. We experiment with data augmentation and adversarial
training as defences, and find that both substantially decrease vulnerability
to attacks on held-out data, as well as held-out attack spaces. Addressing
undersensitivity also improves results on AddSent and AddOneSent, and models
furthermore generalise better when facing train/evaluation distribution
mismatch: they are less prone to overly rely on predictive cues present only in
the training set, and outperform a conventional model by as much as 10.9% F1.
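The attack described in the abstract is essentially a search over meaning-altering question perturbations. The sketch below is only an illustration of that idea, not the authors' released implementation: `qa_model` (returning an answer with its probability) and `perturb_question` (e.g. entity or noun substitutions) are hypothetical stand-ins.

```python
# Minimal sketch of an undersensitivity attack: search among meaning-altering
# variations of a question for one where the model keeps its original answer
# with equal or higher probability. Both callables are hypothetical stand-ins.

from typing import Callable, Iterable, Optional, Tuple


def undersensitivity_attack(
    question: str,
    passage: str,
    perturb_question: Callable[[str], Iterable[str]],   # e.g. entity/noun swaps
    qa_model: Callable[[str, str], Tuple[str, float]],  # returns (answer, prob)
) -> Optional[Tuple[str, str, float]]:
    """Return a perturbed question the model answers identically and at least
    as confidently, or None if no such variation is found."""
    orig_answer, orig_prob = qa_model(question, passage)

    best = None
    for variant in perturb_question(question):
        answer, prob = qa_model(variant, passage)
        # Undersensitivity: the question's meaning changed, but the prediction
        # (and its probability) did not drop.
        if answer == orig_answer and prob >= orig_prob:
            if best is None or prob > best[2]:
                best = (variant, answer, prob)
    return best
```

Any variant returned by such a search is an undersensitivity example: the question's meaning has changed, yet the model keeps its original answer at least as confidently, when it should instead lower its confidence or, for SQuAD2.0-style models, predict that the question is unanswerable.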
Related papers
- PASA: Attack Agnostic Unsupervised Adversarial Detection using Prediction & Attribution Sensitivity Analysis [2.5347892611213614]
Deep neural networks for classification are vulnerable to adversarial attacks, where small perturbations to input samples lead to incorrect predictions.
We develop a practical method that uses the sensitivity of model predictions and feature attributions to detect adversarial samples.
Our approach demonstrates competitive performance even when an adversary is aware of the defense mechanism.
arXiv Detail & Related papers (2024-04-12T21:22:21Z)
- Adversarial Attacks Against Uncertainty Quantification [10.655660123083607]
This work focuses on a different adversarial scenario in which the attacker aims to manipulate the model's uncertainty estimate rather than its prediction.
In particular, the goal is to undermine the use of machine-learning models when their outputs are consumed by a downstream module or by a human operator.
arXiv Detail & Related papers (2023-09-19T12:54:09Z)
- How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple, generic, and generalisable framework for which key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z)
- Black-box Adversarial Attacks on Network-wide Multi-step Traffic State Prediction Models [4.353029347463806]
We propose an adversarial attack framework by treating the prediction model as a black-box.
The adversary can query the prediction model as an oracle with any input and obtain the corresponding output.
To test the attack's effectiveness, two state-of-the-art graph neural network-based models (GCGRNN and DCRNN) are examined.
arXiv Detail & Related papers (2021-10-17T03:45:35Z)
- Tribrid: Stance Classification with Neural Inconsistency Detection [9.150728831518459]
We study the problem of performing automatic stance classification on social media with neural architectures such as BERT.
We present a new neural architecture where the input also includes automatically generated negated perspectives over a given claim.
The model is jointly trained to make multiple predictions simultaneously, which can be used either to improve the classification of the original perspective or to filter out doubtful predictions.
arXiv Detail & Related papers (2021-09-14T08:13:03Z)
- Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation [109.06060143938052]
We propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset.
We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias in English.
arXiv Detail & Related papers (2021-04-12T06:57:36Z)
- Robustness May Be at Odds with Fairness: An Empirical Study on Class-wise Accuracy [85.20742045853738]
CNNs are widely known to be vulnerable to adversarial attacks.
We propose an empirical study on the class-wise accuracy and robustness of adversarially trained models.
We find that there exists inter-class discrepancy for accuracy and robustness even when the training dataset has an equal number of samples for each class.
arXiv Detail & Related papers (2020-10-26T06:32:32Z)
- Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures [62.562760228942054]
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective.
We propose to augment the input sentences in the training data with their corresponding predicate-argument structures.
We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
arXiv Detail & Related papers (2020-10-23T16:22:05Z)
- Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations [65.05561023880351]
Adversarial examples are malicious inputs crafted to induce misclassification.
This paper studies a complementary failure mode, invariance-based adversarial examples.
We show that defenses against sensitivity-based attacks actively harm a model's accuracy on invariance-based attacks.
arXiv Detail & Related papers (2020-02-11T18:50:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.