Double Perturbation: On the Robustness of Robustness and Counterfactual
Bias Evaluation
- URL: http://arxiv.org/abs/2104.05232v1
- Date: Mon, 12 Apr 2021 06:57:36 GMT
- Title: Double Perturbation: On the Robustness of Robustness and Counterfactual
Bias Evaluation
- Authors: Chong Zhang, Jieyu Zhao, Huan Zhang, Kai-Wei Chang, Cho-Jui Hsieh
- Abstract summary: We propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset.
We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias in English.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robustness and counterfactual bias are usually evaluated on a test dataset.
However, are these evaluations robust? If the test dataset is perturbed
slightly, will the evaluation results stay the same? In this paper, we propose
a "double perturbation" framework to uncover model weaknesses beyond the test
dataset. The framework first perturbs the test dataset to construct abundant
natural sentences similar to the test data, and then diagnoses the prediction
change regarding a single-word substitution. We apply this framework to study
two perturbation-based approaches that are used to analyze models' robustness
and counterfactual bias in English. (1) For robustness, we focus on synonym
substitutions and identify vulnerable examples where prediction can be altered.
Our proposed attack attains high success rates (96.0%-99.8%) in finding
vulnerable examples on both original and robustly trained CNNs and
Transformers. (2) For counterfactual bias, we focus on substituting demographic
tokens (e.g., gender, race) and measure the shift of the expected prediction
among constructed sentences. Our method can reveal hidden model biases not
directly shown in the test dataset. Our code is available at
https://github.com/chong-z/nlp-second-order-attack.
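To make the framework concrete, below is a minimal sketch of the double-perturbation idea, assuming a black-box `classify(sentence)` function. The synonym table, the deliberately brittle stand-in classifier, and all names are illustrative; the released code linked above implements the actual attack.

```python
# Sketch of the double-perturbation idea (illustrative, not the authors' code).
# First perturbation: build natural neighbors of a test sentence via single
# synonym swaps. Second perturbation: on each neighbor, test whether one more
# swap flips the model's prediction.

# Hypothetical synonym table; a real attack would use a curated resource.
SYNONYMS = {"good": ["fine"], "fine": ["okay"]}

def neighbors(tokens):
    """Yield token lists that differ from `tokens` by one synonym swap."""
    for i, tok in enumerate(tokens):
        for syn in SYNONYMS.get(tok, []):
            yield tokens[:i] + [syn] + tokens[i + 1:]

def find_vulnerable(sentence, classify):
    """Return (neighbor, attacked) pairs where one extra swap flips the label."""
    found = []
    for nb in neighbors(sentence.split()):    # first perturbation
        base = classify(" ".join(nb))
        for nb2 in neighbors(nb):             # second perturbation
            if classify(" ".join(nb2)) != base:
                found.append((" ".join(nb), " ".join(nb2)))
    return found

# Deliberately brittle toy classifier standing in for a trained model.
def classify(s):
    return "neg" if "okay" in s.split() else "pos"

print(find_vulnerable("a good movie", classify))
# -> [('a fine movie', 'a okay movie')]: a vulnerable neighbor of the test data
```

For the bias analysis, the same neighbor construction is used, but instead of searching for a single flip, the shift of the expected prediction is measured when demographic tokens are substituted across the constructed sentences.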
Related papers
- Projective Methods for Mitigating Gender Bias in Pre-trained Language Models [10.418595661963062]
Projective methods are fast to implement, use a small number of saved parameters, and make no updates to the existing model parameters.
We find that projective methods can be effective at both intrinsic bias and downstream bias mitigation, but that the two outcomes are not necessarily correlated.
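As a rough illustration of the projection operation such methods rely on, the sketch below removes an estimated bias direction from word embeddings without updating any model parameters. The random embeddings, seed pairs, and direction estimate are simplifying assumptions, not the paper's exact procedure.

```python
import numpy as np

def debias(emb, bias_dir):
    """Remove the component of `emb` along the (unit-normalized) bias direction."""
    u = bias_dir / np.linalg.norm(bias_dir)
    return emb - np.dot(emb, u) * u

# Illustrative: estimate a gender direction from seed word pairs.
rng = np.random.default_rng(0)
E = {w: rng.normal(size=50) for w in ["he", "she", "man", "woman", "doctor"]}
direction = np.mean([E["he"] - E["she"], E["man"] - E["woman"]], axis=0)

projected = debias(E["doctor"], direction)
# After projection the embedding is orthogonal to the bias direction (~0).
print(np.dot(projected, direction / np.linalg.norm(direction)))
```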
arXiv Detail & Related papers (2024-03-27T17:49:31Z)
- Debiasing Stance Detection Models with Counterfactual Reasoning and Adversarial Bias Learning [15.68462203989933]
Stance detection models tend to rely on dataset bias in the text part of the input as a shortcut.
We propose an adversarial bias learning module to model the bias more accurately.
arXiv Detail & Related papers (2022-12-20T16:20:56Z)
- Certifying Data-Bias Robustness in Linear Regression [12.00314910031517]
We present a technique for certifying whether linear regression models are pointwise-robust to label bias in a training dataset.
We show how to solve this problem exactly for individual test points, and provide an approximate but more scalable method.
We also unearth gaps in bias-robustness, such as high levels of non-robustness for certain bias assumptions on some datasets.
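Because an ordinary least-squares prediction is linear in the training labels, the single-flip special case of such a certificate can be checked exactly in closed form. The sketch below shows only that special case for binary labels; the paper's bias models are more general.

```python
import numpy as np

def robust_to_one_label_flip(X, y, x_test, threshold=0.5):
    """Exact pointwise check: can flipping any single binary training label
    move the OLS prediction at x_test across `threshold`?"""
    # The prediction is w @ y with influence weights w = X (X^T X)^{-1} x_test.
    w = X @ np.linalg.solve(X.T @ X, x_test)
    pred = w @ y
    for i in range(len(y)):
        flipped = pred + w[i] * (1 - 2 * y[i])  # effect of y_i -> 1 - y_i
        if (flipped >= threshold) != (pred >= threshold):
            return False  # one poisoned label changes the decision
    return True

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = (X[:, 0] > 0).astype(float)
print(robust_to_one_label_flip(X, y, x_test=np.array([2.0, 0.0, 0.0])))
```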
arXiv Detail & Related papers (2022-06-07T20:47:07Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
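For background, the sketch below is the standard split-conformal construction that such methods extend. It assumes exchangeable calibration and test data, which is precisely what the design setting breaks and what the paper's method corrects for.

```python
import numpy as np

def split_conformal_interval(cal_residuals, y_pred_test, alpha=0.1):
    """Standard split conformal: calibrate a residual quantile on held-out
    data, then wrap it around each new point prediction."""
    n = len(cal_residuals)
    # Finite-sample-corrected quantile of absolute residuals.
    q = np.quantile(cal_residuals, np.ceil((n + 1) * (1 - alpha)) / n)
    return y_pred_test - q, y_pred_test + q

# Toy usage with a hypothetical fitted model's calibration residuals.
cal_residuals = np.abs(np.random.default_rng(2).normal(size=200))
lo, hi = split_conformal_interval(cal_residuals, y_pred_test=3.1)
print(lo, hi)  # ~90% marginal coverage, but only under exchangeability
```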
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Robust Fairness-aware Learning Under Sample Selection Bias [17.09665420515772]
We propose a framework for robust and fair learning under sample selection bias.
We develop two algorithms to handle sample selection bias, covering the cases where test data is and is not available.
arXiv Detail & Related papers (2021-05-24T23:23:36Z)
- Robustness to Spurious Correlations in Text Classification via Automatically Generated Counterfactuals [8.827892752465958]
We propose to train a robust text classifier by augmenting the training data with automatically generated counterfactual data.
We show that the robust classifier makes meaningful and trustworthy predictions by emphasizing causal features and de-emphasizing non-causal features.
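A minimal sketch of the augmentation step, assuming a hand-written antonym table; the paper generates counterfactuals automatically by identifying and substituting causally important words.

```python
# Swap sentiment-bearing words for antonyms and flip the label, then train
# on original plus counterfactual examples. The antonym table is illustrative.
ANTONYMS = {"great": "terrible", "terrible": "great",
            "love": "hate", "hate": "love"}

def counterfactual(example):
    text, label = example
    tokens = [ANTONYMS.get(t, t) for t in text.split()]
    if tokens == text.split():
        return None  # no causal word found; nothing to augment
    return " ".join(tokens), 1 - label

train = [("i love this great movie", 1), ("the plot is thin", 0)]
augmented = train + [cf for ex in train if (cf := counterfactual(ex))]
print(augmented)  # the counterfactual carries the opposite label
```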
arXiv Detail & Related papers (2020-12-18T03:57:32Z)
- The Gap on GAP: Tackling the Problem of Differing Data Distributions in Bias-Measuring Datasets [58.53269361115974]
Diagnostic datasets that can detect biased models are an important prerequisite for bias reduction within natural language processing.
However, undesired patterns in the collected data can make such tests incorrect.
We introduce a theoretically grounded method for weighting test samples to cope with such patterns in the test data.
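A toy sketch of the basic idea of reweighting test samples so that every demographic group contributes equally to the measured metric; the paper's weighting scheme is theoretically grounded and more general, and the groups and accuracy indicators below are made up.

```python
from collections import Counter

def balanced_weights(groups):
    """Weight each test example so all groups contribute equally,
    regardless of how often they appear in the collected data."""
    counts = Counter(groups)
    n, k = len(groups), len(counts)
    return [n / (k * counts[g]) for g in groups]

# Toy bias-measuring set, skewed toward one demographic group.
groups = ["male"] * 8 + ["female"] * 2
correct = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]  # per-example accuracy indicator

w = balanced_weights(groups)
print(sum(wi * ci for wi, ci in zip(w, correct)) / sum(w))
# 0.75: group-balanced accuracy, versus 0.9 raw accuracy on the skewed set
```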
arXiv Detail & Related papers (2020-11-03T16:50:13Z)
- Improving Robustness by Augmenting Training Sentences with Predicate-Argument Structures [62.562760228942054]
Existing approaches to improve robustness against dataset biases mostly focus on changing the training objective.
We propose to augment the input sentences in the training data with their corresponding predicate-argument structures.
We show that without targeting a specific bias, our sentence augmentation improves the robustness of transformer models against multiple biases.
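A minimal sketch of the input format after augmentation, with a hypothetical `srl` function standing in for a trained semantic role labeler.

```python
# Append a predicate-argument annotation to each training sentence.
def srl(sentence):
    # Hypothetical SRL output for demonstration only; a real pipeline
    # would run a trained semantic role labeling model here.
    return "[PRED bought] [ARG0 the customer] [ARG1 a laptop]"

def augment(sentence):
    return f"{sentence} {srl(sentence)}"

print(augment("the customer bought a laptop"))
```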
arXiv Detail & Related papers (2020-10-23T16:22:05Z)
- Stable Prediction via Leveraging Seed Variable [73.9770220107874]
Previous machine learning methods might exploit subtle spurious correlations in training data induced by non-causal variables for prediction.
We propose a conditional independence test based algorithm to separate causal variables, using a seed variable as a prior, and adopt them for stable prediction.
Our algorithm outperforms state-of-the-art methods for stable prediction.
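For illustration, a generic conditional-independence check via partial correlation is sketched below; the authors' algorithm and its conditioning sets may differ.

```python
import numpy as np
from scipy import stats

def partial_corr_test(a, b, c):
    """Residualize `a` and `b` on `c`, then test the residual correlation."""
    C = np.column_stack([np.ones_like(c), c])
    ra = a - C @ np.linalg.lstsq(C, a, rcond=None)[0]
    rb = b - C @ np.linalg.lstsq(C, b, rcond=None)[0]
    return stats.pearsonr(ra, rb)  # (correlation, p-value)

rng = np.random.default_rng(3)
seed = rng.normal(size=500)            # known causal "seed" variable
causal = seed + rng.normal(size=500)   # genuinely tied to the seed
spurious = rng.normal(size=500)        # non-causal noise variable

print(partial_corr_test(causal, seed, spurious))   # small p: kept as causal
print(partial_corr_test(spurious, seed, causal))   # large p: separated out
```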
arXiv Detail & Related papers (2020-06-09T06:56:31Z)
- Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases.
First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method.
The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
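One common way to use such a bias expert is to down-weight training examples that the lexical sub-model already classifies confidently, so the sketch below shows that reweighting variant; the paper's exact training objective may differ.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# A bag-of-words sub-model captures lexical shortcuts; its confidence on the
# true label is then used to down-weight easy, bias-aligned examples.
texts = ["not good not fun", "good and fun", "not bad at all", "bad acting"]
labels = np.array([0, 1, 1, 0])

bow = CountVectorizer().fit(texts)
bias_model = LogisticRegression().fit(bow.transform(texts), labels)

# Probability the lexical model assigns to the gold label of each example.
p_gold = bias_model.predict_proba(bow.transform(texts))[np.arange(len(labels)), labels]
weights = 1.0 - p_gold  # pass as sample_weight when training the main model
print(weights)  # examples defeating the lexical shortcut keep high weight
```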
arXiv Detail & Related papers (2020-05-10T17:56:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.