Related papers: Effects of Human Adversarial and Affable Samples on BERT Generalization

URL: http://arxiv.org/abs/2310.08008v4
Date: Sun, 10 Dec 2023 22:40:14 GMT
Title: Effects of Human Adversarial and Affable Samples on BERT Generalization
Authors: Aparna Elangovan, Jiayuan He, Yuan Li, Karin Verspoor
Abstract summary: We examine the impact of training data quality, not quantity, on a model's generalizability. We find that for a fixed size of training samples, as a rule of thumb, having 10-30% h-adversarial instances improves the precision.
Score: 12.000570944219515
License: http://creativecommons.org/licenses/by/4.0/
Abstract: BERT-based models have had strong performance on leaderboards, yet have been demonstrably worse in real-world settings requiring generalization. Limited quantities of training data are considered a key impediment to achieving generalizability in machine learning. In this paper, we examine the impact of training data quality, not quantity, on a model's generalizability. We consider two characteristics of training data: the proportion of human-adversarial (h-adversarial) samples, i.e., sample pairs with seemingly minor differences but different ground-truth labels, and of human-affable (h-affable) samples, i.e., sample pairs with minor differences but the same ground-truth label. We find that for a fixed size of training samples, as a rule of thumb, having 10-30% h-adversarial instances improves the precision, and therefore F1, by up to 20 points in the tasks of text classification and relation extraction. Increasing h-adversarials beyond this range can result in performance plateaus or even degradation. In contrast, h-affables may not contribute to a model's generalizability and may even degrade generalization performance.
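
The two definitions in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the character-level similarity from Python's `difflib`, the 0.8 threshold, the function names `pair_kind` and `h_adversarial_rate`, and the toy samples are all assumptions chosen for illustration, since the abstract does not say how "minor differences" are measured.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pair_kind(a, b, threshold=0.8):
    """Classify a pair of (text, label) samples.

    Returns "h-adversarial" (similar text, different labels),
    "h-affable" (similar text, same label), or None when the texts
    are not similar enough. The similarity measure and threshold
    are illustrative assumptions, not taken from the paper.
    """
    (text_a, label_a), (text_b, label_b) = a, b
    similarity = SequenceMatcher(None, text_a, text_b).ratio()
    if similarity < threshold:
        return None
    return "h-affable" if label_a == label_b else "h-adversarial"


def h_adversarial_rate(samples, threshold=0.8):
    """Fraction of samples that occur in at least one h-adversarial pair."""
    flagged = set()
    for (i, a), (j, b) in combinations(enumerate(samples), 2):
        if pair_kind(a, b, threshold) == "h-adversarial":
            flagged.update((i, j))
    return len(flagged) / len(samples) if samples else 0.0


# Hypothetical toy data, not from the paper.
samples = [
    ("The drug reduces blood pressure.", "positive"),
    ("The drug reduces blood pressure!", "positive"),
    ("The drug raises blood pressure.", "negative"),
    ("Completely unrelated sentence about weather.", "neutral"),
]

if __name__ == "__main__":
    print(pair_kind(samples[0], samples[1]))
    print(pair_kind(samples[0], samples[2]))
    print(h_adversarial_rate(samples))
```

A rate computed this way could be compared against the 10-30% rule of thumb quoted above, with the caveat that the result depends heavily on the chosen similarity measure and threshold.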

Related papers

Human Uncertainty-Aware Data Selection and Automatic Labeling in Visual Question Answering [50.6117007117789]
HaDola operates in four stages -- discriminate, self-annotate, error trigger, and training -- to iteratively identify harmful samples, prioritize informative ones, and bootstrap from a small seed set. Our approach substantially reduces reliance on costly HU annotations and makes VLMs more accurate and better calibrated.
arXiv Detail & Related papers (2025-10-13T11:35:30Z)
Distributionally Generative Augmentation for Fair Facial Attribute Classification [69.97710556164698]
Facial Attribute Classification (FAC) holds substantial promise in widespread applications. FAC models trained by traditional methodologies can be unfair by exhibiting accuracy inconsistencies across varied data subpopulations. This work proposes a novel, generation-based two-stage framework to train a fair FAC model on biased data without additional annotation.
arXiv Detail & Related papers (2024-03-11T10:50:53Z)
Ask Your Distribution Shift if Pre-Training is Right for You [74.18516460467019]
In practice, fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others. We focus on two possible failure modes of models under distribution shift: poor extrapolation and biases in the training data. Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases.
arXiv Detail & Related papers (2024-02-29T23:46:28Z)
Exploring Data Augmentations on Self-/Semi-/Fully- Supervised Pre-trained Models [24.376036129920948]
We investigate how data augmentation affects the performance of vision pre-trained models. We apply 4 types of data augmentation, namely Random Erasing, CutOut, CutMix and MixUp. We report their performance on vision tasks such as image classification, object detection, instance segmentation, and semantic segmentation.
arXiv Detail & Related papers (2023-10-28T23:46:31Z)
On the Connection between Pre-training Data Diversity and Fine-tuning Robustness [66.30369048726145]
We find that the primary factor influencing downstream effective robustness is data quantity. We demonstrate our findings on pre-training distributions drawn from various natural and synthetic data sources.
arXiv Detail & Related papers (2023-07-24T05:36:19Z)
Deep Learning on a Healthy Data Diet: Finding Important Examples for Fairness [15.210232622716129]
Data-driven predictive solutions predominant in commercial applications tend to suffer from biases and stereotypes. Data augmentation reduces gender bias by adding counterfactual examples to the training dataset. We show that some of the examples in the augmented dataset can be unimportant or even harmful for fairness.
arXiv Detail & Related papers (2022-11-20T22:42:30Z)
Robustifying Sentiment Classification by Maximally Exploiting Few Counterfactuals [16.731183915325584]
We propose a novel solution that only requires annotation of a small fraction of the original training data. We achieve noticeable accuracy improvements by adding only 1% manual counterfactuals.
arXiv Detail & Related papers (2022-10-21T08:30:09Z)
DiscrimLoss: A Universal Loss for Hard Samples and Incorrect Samples Discrimination [28.599571524763785]
Given data with label noise (i.e., incorrect data), deep neural networks would gradually memorize the label noise and impair model performance. To relieve this issue, curriculum learning is proposed to improve model performance and generalization by ordering training samples in a meaningful sequence.
arXiv Detail & Related papers (2022-08-21T13:38:55Z)
Equivariance and Invariance Inductive Bias for Learning from Insufficient Data [65.42329520528223]
We show why insufficient data renders the model more easily biased to the limited training environments that are usually different from testing. We propose a class-wise invariant risk minimization (IRM) that efficiently tackles the challenge of missing environmental annotation in conventional IRM.
arXiv Detail & Related papers (2022-07-25T15:26:19Z)
Assessing Dataset Bias in Computer Vision [0.0]
Biases have the tendency to propagate to the models that train on them, often leading to poor performance in the minority class. We will apply several augmentation techniques on a sample of the UTKFace dataset, such as undersampling, geometric transformations, variational autoencoders (VAEs), and generative adversarial networks (GANs). We were able to show that our model has a better overall performance and consistency on age and ethnicity classification on multiple datasets when compared with the FairFace model.
arXiv Detail & Related papers (2022-05-03T22:45:49Z)
Agree to Disagree: Diversity through Disagreement for Better Transferability [54.308327969778155]
We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data, but disagreement on the OOD data. We show how D-BAT naturally emerges from the notion of generalized discrepancy.
arXiv Detail & Related papers (2022-02-09T12:03:02Z)
FairIF: Boosting Fairness in Deep Learning via Influence Functions with Validation Set Sensitive Attributes [51.02407217197623]
We propose a two-stage training algorithm named FAIRIF. It minimizes the loss over the reweighted data set where the sample weights are computed. We show that FAIRIF yields models with better fairness-utility trade-offs against various types of bias.
arXiv Detail & Related papers (2022-01-15T05:14:48Z)
On the Impact of Hard Adversarial Instances on Overfitting in Adversarial Training [70.82725772926949]
Adversarial training is a popular method to robustify models against adversarial attacks. In this work, we investigate overfitting in adversarial training from the perspective of training instances. We show that the decay in generalization performance of adversarial training is a result of fitting hard adversarial instances.
arXiv Detail & Related papers (2021-12-14T12:19:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.