Effects of Human Adversarial and Affable Samples on BERT Generalization
- URL: http://arxiv.org/abs/2310.08008v4
- Date: Sun, 10 Dec 2023 22:40:14 GMT
- Title: Effects of Human Adversarial and Affable Samples on BERT Generalization
- Authors: Aparna Elangovan, Jiayuan He, Yuan Li, Karin Verspoor
- Abstract summary: We examine the impact of training data quality, not quantity, on a model's generalizability.
We find that for a fixed size of training samples, as a rule of thumb, having 10-30% h-adversarial instances improves the precision.
- Score: 12.000570944219515
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: BERT-based models have had strong performance on leaderboards, yet have been
demonstrably worse in real-world settings requiring generalization. Limited
quantities of training data is considered a key impediment to achieving
generalizability in machine learning. In this paper, we examine the impact of
training data quality, not quantity, on a model's generalizability. We consider
two characteristics of training data: the portion of human-adversarial
(h-adversarial), i.e., sample pairs with seemingly minor differences but
different ground-truth labels, and human-affable (h-affable) training samples,
i.e., sample pairs with minor differences but the same ground-truth label. We
find that for a fixed size of training samples, as a rule of thumb, having
10-30% h-adversarial instances improves the precision, and therefore F1, by up
to 20 points in the tasks of text classification and relation extraction.
Increasing h-adversarials beyond this range can result in performance plateaus
or even degradation. In contrast, h-affables may not contribute to a model's
generalizability and may even degrade generalization performance.
Related papers
- Human Uncertainty-Aware Data Selection and Automatic Labeling in Visual Question Answering [50.6117007117789]
HaDola operates in four stages -- discriminate, self-annotate, error trigger, and training -- to iteratively identify harmful samples, prioritize informative ones, and bootstrap from a small seed set.<n>Our approach substantially reduces reliance on costly HU annotations and makes VLMs more accurate and better calibrated.
arXiv Detail & Related papers (2025-10-13T11:35:30Z) - Distributionally Generative Augmentation for Fair Facial Attribute Classification [69.97710556164698]
Facial Attribute Classification (FAC) holds substantial promise in widespread applications.
FAC models trained by traditional methodologies can be unfair by exhibiting accuracy inconsistencies across varied data subpopulations.
This work proposes a novel, generation-based two-stage framework to train a fair FAC model on biased data without additional annotation.
arXiv Detail & Related papers (2024-03-11T10:50:53Z) - Ask Your Distribution Shift if Pre-Training is Right for You [74.18516460467019]
In practice, fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others.
We focus on two possible failure modes of models under distribution shift: poor extrapolation and biases in the training data.
Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases.
arXiv Detail & Related papers (2024-02-29T23:46:28Z) - Exploring Data Augmentations on Self-/Semi-/Fully- Supervised
Pre-trained Models [24.376036129920948]
We investigate how data augmentation affects performance of vision pre-trained models.
We apply 4 types of data augmentations termed with Random Erasing, CutOut, CutMix and MixUp.
We report their performance on vision tasks such as image classification, object detection, instance segmentation, and semantic segmentation.
arXiv Detail & Related papers (2023-10-28T23:46:31Z) - On the Connection between Pre-training Data Diversity and Fine-tuning
Robustness [66.30369048726145]
We find that the primary factor influencing downstream effective robustness is data quantity.
We demonstrate our findings on pre-training distributions drawn from various natural and synthetic data sources.
arXiv Detail & Related papers (2023-07-24T05:36:19Z) - Deep Learning on a Healthy Data Diet: Finding Important Examples for
Fairness [15.210232622716129]
Data-driven predictive solutions predominant in commercial applications tend to suffer from biases and stereotypes.
Data augmentation reduces gender bias by adding counterfactual examples to the training dataset.
We show that some of the examples in the augmented dataset can be not important or even harmful for fairness.
arXiv Detail & Related papers (2022-11-20T22:42:30Z) - Robustifying Sentiment Classification by Maximally Exploiting Few
Counterfactuals [16.731183915325584]
We propose a novel solution that only requires annotation of a small fraction of the original training data.
We achieve noticeable accuracy improvements by adding only 1% manual counterfactuals.
arXiv Detail & Related papers (2022-10-21T08:30:09Z) - DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples
Discrimination [28.599571524763785]
Given data with label noise (i.e., incorrect data), deep neural networks would gradually memorize the label noise and impair model performance.
To relieve this issue, curriculum learning is proposed to improve model performance and generalization by ordering training samples in a meaningful sequence.
arXiv Detail & Related papers (2022-08-21T13:38:55Z) - Equivariance and Invariance Inductive Bias for Learning from
Insufficient Data [65.42329520528223]
We show why insufficient data renders the model more easily biased to the limited training environments that are usually different from testing.
We propose a class-wise invariant risk minimization (IRM) that efficiently tackles the challenge of missing environmental annotation in conventional IRM.
arXiv Detail & Related papers (2022-07-25T15:26:19Z) - Assessing Dataset Bias in Computer Vision [0.0]
biases have the tendency to propagate to the models that train on them, often leading to a poor performance in the minority class.
We will apply several augmentation techniques on a sample of the UTKFace dataset, such as undersampling, geometric transformations, variational autoencoders (VAEs), and generative adversarial networks (GANs)
We were able to show that our model has a better overall performance and consistency on age and ethnicity classification on multiple datasets when compared with the FairFace model.
arXiv Detail & Related papers (2022-05-03T22:45:49Z) - Agree to Disagree: Diversity through Disagreement for Better
Transferability [54.308327969778155]
We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data.
We show how D-BAT naturally emerges from the notion of generalized discrepancy.
arXiv Detail & Related papers (2022-02-09T12:03:02Z) - FairIF: Boosting Fairness in Deep Learning via Influence Functions with
Validation Set Sensitive Attributes [51.02407217197623]
We propose a two-stage training algorithm named FAIRIF.
It minimizes the loss over the reweighted data set where the sample weights are computed.
We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv Detail & Related papers (2022-01-15T05:14:48Z) - On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training [70.82725772926949]
Adversarial training is a popular method to robustify models against adversarial attacks.
In this work, we investigate this phenomenon from the perspective of training instances.
We show that the decay in generalization performance of adversarial training is a result of fitting hard adversarial instances.
arXiv Detail & Related papers (2021-12-14T12:19:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.