Targeted Data Generation: Finding and Fixing Model Weaknesses
- URL: http://arxiv.org/abs/2305.17804v1
- Date: Sun, 28 May 2023 19:36:50 GMT
- Title: Targeted Data Generation: Finding and Fixing Model Weaknesses
- Authors: Zexue He, Marco Tulio Ribeiro, Fereshte Khani
- Abstract summary: Even when aggregate accuracy is high, state-of-the-art NLP models often fail systematically on specific subgroups of data.
We propose Targeted Data Generation (TDG), a framework that automatically identifies challenging subgroups.
In experiments, TDG significantly improves the accuracy on challenging subgroups for state-of-the-art sentiment analysis and natural language inference models.
- Score: 6.9649605149785465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Even when aggregate accuracy is high, state-of-the-art NLP models often fail
systematically on specific subgroups of data, resulting in unfair outcomes and
eroding user trust. Additional data collection may not help in addressing these
weaknesses, as such challenging subgroups may be unknown to users, and
underrepresented in the existing and new data. We propose Targeted Data
Generation (TDG), a framework that automatically identifies challenging
subgroups, and generates new data for those subgroups using large language
models (LLMs) with a human in the loop. TDG estimates the expected benefit and
potential harm of data augmentation for each subgroup, and selects the ones
most likely to improve within-group performance without hurting overall
performance. In our experiments, TDG significantly improves the accuracy on
challenging subgroups for state-of-the-art sentiment analysis and natural
language inference models, while also improving overall test accuracy.
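The abstract outlines TDG's core selection step: for each candidate subgroup, estimate the expected benefit and potential harm of augmenting it, then keep only subgroups likely to improve within-group accuracy without hurting overall accuracy. Below is a minimal, hypothetical Python sketch of that criterion. The helper callables (`finetune_on`, `eval_subgroup_accuracy`, `eval_overall_accuracy`) and the proxy fine-tune are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a TDG-style subgroup selection step (not the paper's code).
# Assumes caller-supplied helpers for proxy fine-tuning and evaluation.

from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class SubgroupCandidate:
    name: str                      # human-readable description of the subgroup
    generated_examples: Sequence   # LLM-generated, human-vetted examples for it


def select_subgroups(
    model,
    candidates: List[SubgroupCandidate],
    finetune_on: Callable,             # (model, examples) -> augmented proxy model
    eval_subgroup_accuracy: Callable,  # (model, subgroup_name) -> accuracy on that subgroup
    eval_overall_accuracy: Callable,   # (model,) -> accuracy on the full dev/test set
    harm_tolerance: float = 0.0,
) -> List[SubgroupCandidate]:
    """Keep subgroups whose augmentation is estimated (via a proxy fine-tune)
    to raise within-group accuracy without lowering overall accuracy."""
    baseline_overall = eval_overall_accuracy(model)
    selected = []
    for cand in candidates:
        proxy = finetune_on(model, cand.generated_examples)
        benefit = (eval_subgroup_accuracy(proxy, cand.name)
                   - eval_subgroup_accuracy(model, cand.name))
        harm = baseline_overall - eval_overall_accuracy(proxy)
        if benefit > 0 and harm <= harm_tolerance:
            selected.append(cand)
    return selected
```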
Related papers
- Data Debiasing with Datamodels (D3M): Improving Subgroup Robustness via Data Selection [80.85902083005237]
We introduce Data Debiasing with Datamodels (D3M), a debiasing approach which isolates and removes specific training examples that drive the model's failures on minority groups.
arXiv Detail & Related papers (2024-06-24T17:51:01Z)
- Towards More Practical Group Activity Detection: A New Benchmark and Model [61.39427407758131]
Group activity detection (GAD) is the task of identifying members of each group and classifying the activity of the group at the same time in a video.
We present a new dataset, dubbed Café, which offers more practical scenarios and metrics.
We also propose a new GAD model that deals with an unknown number of groups and latent group members efficiently and effectively.
arXiv Detail & Related papers (2023-12-05T16:48:17Z)
- Ranking & Reweighting Improves Group Distributional Robustness [14.021069321266516]
We propose a ranking-based training method called Discounted Rank Upweighting (DRU) to learn models that exhibit strong OOD performance on the test data.
Results on several synthetic and real-world datasets highlight the superior ability of our group-ranking-based (akin to soft-minimax) approach in selecting and learning models that are robust to group distributional shifts.
arXiv Detail & Related papers (2023-05-09T20:37:16Z)
- Pushing the Accuracy-Group Robustness Frontier with Introspective Self-play [16.262574174989698]
Introspective Self-play (ISP) is a simple approach to improve the uncertainty estimation of a deep neural network under dataset bias.
We show that ISP provably improves the bias-awareness of the model representation and the resulting uncertainty estimates.
arXiv Detail & Related papers (2023-02-11T22:59:08Z)
- AGRO: Adversarial Discovery of Error-prone groups for Robust Optimization [109.91265884632239]
Group distributionally robust optimization (G-DRO) can minimize the worst-case loss over a set of pre-defined groups in the training data.
We propose AGRO -- Adversarial Group discovery for Distributionally Robust Optimization.
AGRO results in 8% higher model performance on average on known worst-groups, compared to prior group discovery approaches.
arXiv Detail & Related papers (2022-12-02T00:57:03Z)
- Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data [81.43750358586072]
We propose Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes.
We experimentally demonstrate the benefits of Data-IQ on four real-world medical datasets.
arXiv Detail & Related papers (2022-10-24T08:57:55Z)
- Take One Gram of Neural Features, Get Enhanced Group Robustness [23.541213868620837]
Predictive performance of machine learning models trained with empirical risk minimization can degrade considerably under distribution shifts.
We propose to partition the training dataset into groups based on Gram matrices of features extracted by an "identification" model.
Our approach not only improves group robustness over ERM but also outperforms all recent baselines.
arXiv Detail & Related papers (2022-08-26T12:34:55Z)
- Towards Group Robustness in the presence of Partial Group Labels [61.33713547766866]
Spurious correlations between input samples and the target labels wrongly direct neural network predictions.
We propose an algorithm that optimizes for the worst-off group assignments from a constraint set.
We show improvements in the minority group's performance while preserving overall aggregate accuracy across groups.
arXiv Detail & Related papers (2022-01-10T22:04:48Z)
- DoGR: Disaggregated Gaussian Regression for Reproducible Analysis of Heterogeneous Data [4.720638420461489]
We introduce DoGR, a method that discovers latent confounders by simultaneously partitioning the data into overlapping clusters (disaggregation) and modeling the behavior within them (regression).
When applied to real-world data, our method discovers meaningful clusters and their characteristic behaviors, thus giving insight into group differences and their impact on the outcome of interest.
By accounting for latent confounders, our framework facilitates exploratory analysis of noisy, heterogeneous data and can be used to learn predictive models that better generalize to new data.
arXiv Detail & Related papers (2021-08-31T01:58:23Z)
- Adversarial Filters of Dataset Biases [96.090959788952]
Large neural models have demonstrated human-level performance on language and vision benchmarks.
However, their performance degrades considerably on adversarial or out-of-distribution samples.
We propose AFLite, which adversarially filters such dataset biases.
arXiv Detail & Related papers (2020-02-10T21:59:21Z)