Measuring and Reducing Gendered Correlations in Pre-trained Models
- URL: http://arxiv.org/abs/2010.06032v2
- Date: Tue, 2 Mar 2021 21:04:26 GMT
- Title: Measuring and Reducing Gendered Correlations in Pre-trained Models
- Authors: Kellie Webster, Xuezhi Wang, Ian Tenney, Alex Beutel, Emily Pitler, Ellie Pavlick, Jilin Chen, Ed Chi, Slav Petrov
- Abstract summary: We show how pre-trained models can encode artifacts undesired in many applications, such as professions correlating with one gender more than another.
We show how measured correlations can be reduced with general-purpose techniques, and highlight the trade-offs between different strategies.
- Score: 24.35758086428503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained models have revolutionized natural language understanding.
However, researchers have found they can encode artifacts undesired in many
applications, such as professions correlating with one gender more than
another. We explore such gendered correlations as a case study for how to
address unintended correlations in pre-trained models. We define metrics and
reveal that it is possible for models with similar accuracy to encode
correlations at very different rates. We show how measured correlations can be
reduced with general-purpose techniques, and highlight the trade-offs different
strategies have. With these results, we make recommendations for training
robust models: (1) carefully evaluate unintended correlations, (2) be mindful
of seemingly innocuous configuration differences, and (3) focus on general
mitigations.
Related papers
- The Multiple Dimensions of Spuriousness in Machine Learning [3.475875199871536]
Learning correlations from data forms the foundation of today's machine learning (ML) and artificial intelligence (AI) research.
While such an approach enables the automatic discovery of patterned relationships within big data corpora, it is susceptible to failure modes when unintended correlations are captured.
This vulnerability has expanded interest in interrogating spuriousness, often critiqued as an impediment to model performance, fairness, and robustness.
arXiv Detail & Related papers (2024-11-07T13:29:32Z)
- Learning Robust Classifiers with Self-Guided Spurious Correlation Mitigation [26.544938760265136]
Deep neural classifiers rely on spurious correlations between spurious attributes of inputs and targets to make predictions.
We propose a self-guided spurious correlation mitigation framework.
We show that training the classifier to distinguish different prediction behaviors reduces its reliance on spurious correlations without knowing them a priori.
arXiv Detail & Related papers (2024-05-06T17:12:21Z)
- Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals [91.59906995214209]
We propose a new evaluation method, the Counterfactual Attentiveness Test (CAT).
CAT uses counterfactuals by replacing part of the input with its counterpart from a different example, expecting an attentive model to change its prediction.
We show that GPT3 becomes less attentive with an increased number of demonstrations, while its accuracy on the test data improves.
arXiv Detail & Related papers (2023-11-16T06:27:35Z)
- Linked shrinkage to improve estimation of interaction effects in regression models [0.0]
We develop an estimator that adapts well to two-way interaction terms in a regression model.
We evaluate the potential of the model for inference, which is notoriously hard for selection strategies.
Our models can be very competitive with a more advanced machine learner, such as random forest, even for fairly large sample sizes.
arXiv Detail & Related papers (2023-09-25T10:03:39Z)
- Stubborn Lexical Bias in Data and Models [50.79738900885665]
We use a new statistical method to examine whether spurious patterns in data appear in models trained on the data.
We apply an optimization approach to *reweight* the training data, reducing thousands of spurious correlations.
Surprisingly, though this method can successfully reduce lexical biases in the training data, we still find strong evidence of corresponding bias in the trained models.
arXiv Detail & Related papers (2023-06-03T20:12:27Z)
- Gender Biases in Automatic Evaluation Metrics for Image Captioning [87.15170977240643]
We conduct a systematic study of gender biases in model-based evaluation metrics for image captioning tasks.
We demonstrate the negative consequences of using these biased metrics, including the inability to differentiate between biased and unbiased generations.
We present a simple and effective way to mitigate the metric bias without hurting the correlations with human judgments.
arXiv Detail & Related papers (2023-05-24T04:27:40Z)
- A Correlation-Ratio Transfer Learning and Variational Stein's Paradox [7.652701739127332]
This paper introduces a new strategy, linear correlation-ratio, to build an accurate relationship between the models.
On the practical side, the new framework is applied to some application scenarios, especially the areas of data streams and medical studies.
arXiv Detail & Related papers (2022-06-10T01:59:16Z)
- Correlation inference attacks against machine learning models [6.805105137455252]
We explore correlation inference attacks, that is, whether and when a model leaks information about the correlations between its input variables.
Our results raise fundamental questions on what a model does and should remember from its training set.
arXiv Detail & Related papers (2021-12-16T11:42:45Z)
- Few-shot Visual Reasoning with Meta-analogical Contrastive Learning [141.2562447971]
We propose to solve a few-shot (or low-shot) visual reasoning problem by resorting to analogical reasoning.
We extract structural relationships between elements in both domains and force them to be as similar as possible through analogical learning.
We validate our method on the RAVEN dataset, where it outperforms the state-of-the-art method, with larger gains when training data is scarce.
arXiv Detail & Related papers (2020-07-23T14:00:34Z)
- Detecting Human-Object Interactions with Action Co-occurrence Priors [108.31956827512376]
A common problem in the human-object interaction (HOI) detection task is that numerous HOI classes have only a small number of labeled examples.
We observe that there exist natural correlations and anti-correlations among human-object interactions.
We present techniques to learn these priors and leverage them for more effective training, especially in rare classes.
arXiv Detail & Related papers (2020-07-17T02:47:45Z)
- An Investigation of Why Overparameterization Exacerbates Spurious Correlations [98.3066727301239]
We identify two key properties of the training data that drive this behavior.
We show how the inductive bias of models towards "memorizing" fewer examples can cause overparameterization to hurt.
arXiv Detail & Related papers (2020-05-09T01:59:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.