Robustness to Spurious Correlations in Text Classification via
Automatically Generated Counterfactuals
- URL: http://arxiv.org/abs/2012.10040v1
- Date: Fri, 18 Dec 2020 03:57:32 GMT
- Title: Robustness to Spurious Correlations in Text Classification via
Automatically Generated Counterfactuals
- Authors: Zhao Wang and Aron Culotta
- Abstract summary: We propose to train a robust text classifier by augmenting the training data with automatically generated counterfactual data.
We show that the robust classifier makes meaningful and trustworthy predictions by emphasizing causal features and de-emphasizing non-causal features.
- Score: 8.827892752465958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spurious correlations threaten the validity of statistical classifiers. While
model accuracy may appear high when the test data is from the same distribution
as the training data, it can quickly degrade when the test distribution
changes. For example, it has been shown that classifiers perform poorly when
humans make minor modifications to change the label of an example. One solution
to increase model reliability and generalizability is to identify causal
associations between features and classes. In this paper, we propose to train a
robust text classifier by augmenting the training data with automatically
generated counterfactual data. We first identify likely causal features using a
statistical matching approach. Next, we generate counterfactual samples for the
original training data by substituting causal features with their antonyms and
then assigning opposite labels to the counterfactual samples. Finally, we
combine the original data and counterfactual data to train a robust classifier.
Experiments on two classification tasks show that a traditional classifier
trained on the original data does very poorly on human-generated counterfactual
samples (e.g., 10%-37% drop in accuracy). However, the classifier trained on
the combined data is more robust and performs well on both the original test
data and the counterfactual test data (e.g., 12%-25% increase in accuracy
compared with the traditional classifier). Detailed analysis shows that the
robust classifier makes meaningful and trustworthy predictions by emphasizing
causal features and de-emphasizing non-causal features.
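The pipeline described in the abstract (identify likely causal features, substitute them with antonyms while flipping the label, then train on the combined data) can be illustrated with a short sketch. The code below is illustrative only and is not the authors' implementation: a hand-written antonym dictionary stands in for the statistical-matching step that identifies causal features, the classifier is a bag-of-words logistic regression from scikit-learn, and the toy texts and labels are hypothetical.

```python
# Minimal sketch of counterfactual data augmentation (not the authors' code).
# Assumptions: ANTONYMS is a hand-written stand-in for the causal features
# identified by the paper's statistical matching step; the classifier is a
# bag-of-words logistic regression; labels are 1 = positive, 0 = negative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

ANTONYMS = {"good": "bad", "bad": "good", "love": "hate", "hate": "love"}

def make_counterfactual(text, label):
    """Substitute causal words with antonyms and assign the opposite label."""
    tokens = text.split()
    if not any(t.lower() in ANTONYMS for t in tokens):
        return None  # no identified causal feature to flip
    swapped = [ANTONYMS.get(t.lower(), t) for t in tokens]
    return " ".join(swapped), 1 - label

# Hypothetical toy training data.
texts = ["I love this movie", "such a bad plot", "a good story overall"]
labels = [1, 0, 1]

# Generate counterfactual samples and combine them with the original data.
aug_texts, aug_labels = list(texts), list(labels)
for text, label in zip(texts, labels):
    cf = make_counterfactual(text, label)
    if cf is not None:
        aug_texts.append(cf[0])
        aug_labels.append(cf[1])

# Train the classifier on original + counterfactual data.
vectorizer = CountVectorizer()
clf = LogisticRegression().fit(vectorizer.fit_transform(aug_texts), aug_labels)
print(clf.predict(vectorizer.transform(["I hate this movie"])))  # expected: [0]
```

The intuition for why the combined data helps: after augmentation, non-causal context words (e.g., "movie", "plot") occur with both labels, so the classifier can no longer exploit them as spurious shortcuts and must lean on the causal sentiment words.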
Related papers
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- Self-Evolution Learning for Mixup: Enhance Data Augmentation on Few-Shot Text Classification Tasks [75.42002070547267]
We propose a self-evolution learning (SE) based mixup approach for data augmentation in text classification.
We introduce a novel instance-specific label smoothing approach, which linearly interpolates the model's output and the one-hot labels of the original samples to generate new soft labels for mixup (a minimal sketch of this interpolation follows the list below).
arXiv Detail & Related papers (2023-05-22T23:43:23Z)
- Improving Classifier Robustness through Active Generation of Pairwise Counterfactuals [22.916599410472102]
We present a novel framework that utilizes counterfactual generative models to generate a large number of diverse counterfactuals.
We show that with a small amount of human-annotated counterfactual data (10%), we can generate a counterfactual augmentation dataset with learned labels.
arXiv Detail & Related papers (2023-05-22T23:19:01Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Robust Neural Network Classification via Double Regularization [2.41710192205034]
We propose a novel double regularization of the neural network training loss that combines a penalty on the complexity of the classification model and an optimal reweighting of training observations.
We demonstrate DRFit, for neural net classification of (i) MNIST and (ii) CIFAR-10, in both cases with simulated mislabeling.
arXiv Detail & Related papers (2021-12-15T13:19:20Z)
- Automatically detecting data drift in machine learning classifiers [2.202253618096515]
We term changes that affect machine learning performance 'data drift' or 'drift'.
We propose an approach based solely on classifier suggested labels and its confidence in them, for alerting on data distribution or feature space changes that are likely to cause data drift.
arXiv Detail & Related papers (2021-11-10T12:34:14Z)
- Double Perturbation: On the Robustness of Robustness and Counterfactual Bias Evaluation [109.06060143938052]
We propose a "double perturbation" framework to uncover model weaknesses beyond the test dataset.
We apply this framework to study two perturbation-based approaches that are used to analyze models' robustness and counterfactual bias in English.
arXiv Detail & Related papers (2021-04-12T06:57:36Z)
- Evaluating Fairness of Machine Learning Models Under Uncertain and Incomplete Information [25.739240011015923]
We show that the test accuracy of the attribute classifier is not always correlated with its effectiveness in bias estimation for a downstream model.
Our analysis has surprising and counter-intuitive implications where in certain regimes one might want to distribute the error of the attribute classifier as unevenly as possible.
arXiv Detail & Related papers (2021-02-16T19:02:55Z)
- Robust Fairness under Covariate Shift [11.151913007808927]
Making predictions that are fair with regard to protected group membership has become an important requirement for classification algorithms.
We propose an approach that obtains the predictor that is robust to the worst-case in terms of target performance.
arXiv Detail & Related papers (2020-10-11T04:42:01Z)
- Certified Robustness to Label-Flipping Attacks via Randomized Smoothing [105.91827623768724]
Machine learning algorithms are susceptible to data poisoning attacks.
We present a unifying view of randomized smoothing over arbitrary functions.
We propose a new strategy for building classifiers that are pointwise-certifiably robust to general data poisoning attacks.
arXiv Detail & Related papers (2020-02-07T21:28:30Z)
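As referenced in the Self-Evolution Learning for Mixup entry above, that method's label generation can be read as a simple interpolation between the model's predicted distribution and the one-hot gold label. The sketch below only illustrates that interpolation under assumed values; the weight alpha, the toy class probabilities, and the mixup coefficient are hypothetical and not taken from the paper.

```python
# Minimal sketch of instance-specific label smoothing as summarized above:
# the soft target linearly interpolates the model's own prediction and the
# one-hot gold label.  alpha and the toy numbers are assumptions.
import numpy as np

def soft_label(model_probs, one_hot, alpha=0.9):
    """Interpolate the one-hot label with the model's predicted distribution."""
    return alpha * one_hot + (1.0 - alpha) * model_probs

one_hot = np.array([0.0, 1.0, 0.0])      # gold label: class 1 (of 3 classes)
model_probs = np.array([0.2, 0.7, 0.1])  # model's current predicted distribution
smoothed = soft_label(model_probs, one_hot)
print(smoothed)  # [0.02 0.97 0.01]

# The resulting soft labels can then be mixed like the inputs in ordinary mixup:
lam = 0.6  # hypothetical mixup coefficient
other = np.array([0.9, 0.05, 0.05])
mixed_label = lam * smoothed + (1.0 - lam) * other
print(mixed_label)
```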
This list is automatically generated from the titles and abstracts of the papers on this site.