Using Natural Sentences for Understanding Biases in Language Models
- URL: http://arxiv.org/abs/2205.06303v1
- Date: Thu, 12 May 2022 18:36:33 GMT
- Title: Using Natural Sentences for Understanding Biases in Language Models
- Authors: Sarah Alnegheimish, Alicia Guo, Yi Sun
- Abstract summary: We create a prompt dataset with respect to occupations collected from real-world natural sentences in Wikipedia.
We find bias evaluations are very sensitive to the design choices of template prompts.
We propose using natural sentence prompts for systematic evaluations to step away from design choices that could introduce bias in the observations.
- Score: 10.604991889372092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Evaluation of biases in language models is often limited to synthetically
generated datasets. This dependence traces back to the need for a prompt-style
dataset to trigger specific behaviors of language models. In this paper, we
address this gap by creating a prompt dataset with respect to occupations
collected from real-world natural sentences present in Wikipedia. We aim to
understand the differences between using template-based prompts and natural
sentence prompts when studying gender-occupation biases in language models. We
find bias evaluations are very sensitive to the design choices of template
prompts, and we propose using natural sentence prompts for systematic
evaluations to step away from design choices that could introduce bias in the
observations.
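To make the contrast concrete, here is a minimal sketch of the kind of comparison the abstract describes: scoring gendered pronouns with a masked language model under a hand-written template prompt versus a natural-sentence-style prompt. The model checkpoint, prompts, and pronoun pair below are illustrative assumptions, not the paper's actual dataset or setup.

```python
# Sketch: compare P(he) vs. P(she) at a masked position under a template
# prompt and a natural-sentence prompt. Model, prompts, and pronouns are
# illustrative assumptions, not the paper's exact setup.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-uncased"  # assumed masked LM for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

def pronoun_probs(prompt: str, pronouns=("he", "she")) -> dict:
    """Return the probability of each pronoun at the [MASK] position."""
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    with torch.no_grad():
        probs = model(**inputs).logits[0, mask_pos].softmax(dim=-1)
    return {p: probs[tokenizer.convert_tokens_to_ids(p)].item() for p in pronouns}

# Hand-written template prompt (the kind of design choice the paper warns about).
template = "The nurse said that [MASK] would arrive soon."
# Natural-sentence-style prompt, e.g. adapted from a Wikipedia sentence (hypothetical).
natural = "After ten years as a nurse at the county hospital, [MASK] moved into administration."

for name, prompt in [("template", template), ("natural", natural)]:
    print(name, pronoun_probs(prompt))
```

Aggregating such probability gaps over many occupations and prompts is where the paper reports that template design choices noticeably change the measured bias.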
Related papers
- Learning Phonotactics from Linguistic Informants [54.086544221761486]
Our model iteratively selects or synthesizes a data point according to one of a range of information-theoretic policies.
We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, or greater than, fully supervised approaches.
arXiv Detail & Related papers (2024-05-08T00:18:56Z)
- Detecting Natural Language Biases with Prompt-based Learning [0.3749861135832073]
We explore how to design prompts that can indicate four different types of bias: (1) gender, (2) race, (3) sexual orientation, and (4) religion.
We apply these prompts to multiple variants of popular, well-recognized models (BERT, RoBERTa, and T5) to evaluate their biases.
We provide a comparative analysis of these models and assess them with a two-fold method: human judgment to decide whether model predictions are biased, and model-level judgment (through further prompts) to test whether a model can self-diagnose the biases of its own predictions.
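A minimal sketch of this style of prompt-based probing, assuming a plain fill-mask interface and a hypothetical religion-related prompt; the actual prompt designs are not reproduced here, and the T5 text-to-text setup is omitted.

```python
# Sketch: probe several masked LMs with a bias-indicating prompt and collect
# their top completions for later human or model-level judgment.
# The prompt template and group list are hypothetical probes.
from transformers import pipeline

prompt_template = "People who practice the {group} religion are {mask}."
groups = ["Christian", "Muslim", "Jewish"]

for model_name in ["bert-base-uncased", "roberta-base"]:
    fill = pipeline("fill-mask", model=model_name)
    mask = fill.tokenizer.mask_token  # "[MASK]" for BERT, "<mask>" for RoBERTa
    for group in groups:
        prompt = prompt_template.format(group=group, mask=mask)
        top = fill(prompt, top_k=3)
        print(model_name, group, [c["token_str"].strip() for c in top])
```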
arXiv Detail & Related papers (2023-09-11T04:20:36Z)
- Exploiting Biased Models to De-bias Text: A Gender-Fair Rewriting Model [32.21372089380992]
We train a rewriting model for German without the need for elaborate handcrafted rules.
The model's outputs increased gender-fairness, as shown in a human evaluation study.
arXiv Detail & Related papers (2023-05-18T17:35:28Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
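A minimal sketch of the projection idea, assuming the biased directions are already available and substituting a plain orthogonal projection for the paper's calibrated projection matrix; the embeddings and bias direction below are random placeholders.

```python
# Sketch: remove an assumed biased subspace from text embeddings via the
# orthogonal projection P = I - V V^T, where V's columns span the bias
# directions. A simplified stand-in for the paper's calibrated projection.
import numpy as np

def debias_projection(bias_directions: np.ndarray) -> np.ndarray:
    """bias_directions: (d, k) matrix whose columns span the biased subspace."""
    v, _ = np.linalg.qr(bias_directions)  # orthonormal basis of the subspace
    return np.eye(bias_directions.shape[0]) - v @ v.T

# Toy example with random placeholder embeddings (dimension d = 8).
rng = np.random.default_rng(0)
text_embeddings = rng.normal(size=(5, 8))   # e.g., prompt embeddings
bias_direction = rng.normal(size=(8, 1))    # e.g., a single gender direction
P = debias_projection(bias_direction)
debiased = text_embeddings @ P              # P is symmetric, so P == P.T

# Components along the bias direction are (numerically) zero after projection.
print(np.abs(debiased @ bias_direction).max())
```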
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Transparency Helps Reveal When Language Models Learn Meaning [71.96920839263457]
Our systematic experiments with synthetic data reveal that, with languages where all expressions have context-independent denotations, both autoregressive and masked language models learn to emulate semantic relations between expressions.
Turning to natural language, our experiments with a specific phenomenon -- referential opacity -- add to the growing body of evidence that current language models do not represent natural language semantics well.
arXiv Detail & Related papers (2022-10-14T02:35:19Z)
- Challenges in Measuring Bias via Open-Ended Language Generation [1.5552869983952944]
We analyze how specific choices of prompt sets, metrics, automatic tools and sampling strategies affect bias results.
We provide recommendations for reporting biases in open-ended language generation.
arXiv Detail & Related papers (2022-05-23T19:57:15Z)
- Automatically Identifying Semantic Bias in Crowdsourced Natural Language Inference Datasets [78.6856732729301]
We introduce a model-driven, unsupervised technique to find "bias clusters" in a learned embedding space of hypotheses in NLI datasets.
Interventions and additional rounds of labeling can then be performed to ameliorate the semantic bias of a dataset's hypothesis distribution.
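A minimal sketch of one way such clusters could be surfaced, assuming precomputed hypothesis embeddings; the clustering algorithm, skew threshold, and toy data are illustrative choices rather than the paper's exact technique.

```python
# Sketch: cluster hypothesis embeddings and flag clusters whose NLI label
# distribution is heavily skewed, as candidate "bias clusters". The embeddings,
# clustering choices, and threshold are placeholders, not the paper's method.
import numpy as np
from sklearn.cluster import KMeans

def find_bias_clusters(embeddings: np.ndarray, labels: list, k: int = 10, skew: float = 0.8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
    flagged = []
    for c in range(k):
        cluster_labels = [labels[i] for i in np.where(km.labels_ == c)[0]]
        counts = {l: cluster_labels.count(l) for l in set(cluster_labels)}
        # A cluster dominated by one label suggests the hypothesis alone predicts it.
        if max(counts.values()) / len(cluster_labels) > skew:
            flagged.append((c, counts))
    return flagged

# Toy usage with random placeholder embeddings and labels.
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 32))
lab = rng.choice(["entailment", "neutral", "contradiction"], size=200).tolist()
print(find_bias_clusters(emb, lab, k=5))
```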
arXiv Detail & Related papers (2021-12-16T22:49:01Z)
- Language Model Evaluation Beyond Perplexity [47.268323020210175]
We analyze whether text generated from language models exhibits the statistical tendencies present in the human-generated text on which they were trained.
We find that neural language models appear to learn only a subset of the tendencies considered, but align much more closely with empirical trends than proposed theoretical distributions.
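As one concrete example of such a tendency, here is a hedged sketch that fits a Zipf-like rank-frequency slope to human and model-generated token streams; the choice of statistic and the toy token lists are assumptions for illustration only.

```python
# Sketch: compare one statistical tendency -- the unigram rank-frequency
# (Zipf-like) slope -- between human and model-generated text.
from collections import Counter
import numpy as np

def zipf_slope(tokens: list) -> float:
    """Slope of log-frequency vs. log-rank; roughly -1 for Zipf-like text."""
    freqs = np.array(sorted(Counter(tokens).values(), reverse=True), dtype=float)
    ranks = np.arange(1, len(freqs) + 1)
    slope, _ = np.polyfit(np.log(ranks), np.log(freqs), 1)
    return slope

# Toy placeholder corpora; in practice these would be full token streams.
human_tokens = "the cat sat on the mat and the dog sat on the rug".split()
model_tokens = "the the the cat cat sat sat on on mat".split()

print("human slope:", zipf_slope(human_tokens))
print("model slope:", zipf_slope(model_tokens))
```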
arXiv Detail & Related papers (2021-05-31T20:13:44Z)
- The Authors Matter: Understanding and Mitigating Implicit Bias in Deep Text Classification [36.361778457307636]
Deep text classification models can produce biased outcomes for texts written by authors of certain demographic groups.
In this paper, we first demonstrate that implicit bias exists in different text classification tasks for different demographic groups.
We then build a learning-based interpretation method to deepen our knowledge of implicit bias.
arXiv Detail & Related papers (2021-05-06T16:17:38Z)
- Unnatural Language Inference [48.45003475966808]
We find that state-of-the-art NLI models, such as RoBERTa and BART, are invariant to, and sometimes even perform better on, examples with randomly reordered words.
Our findings call into question the idea that our natural language understanding models, and the tasks used for measuring their progress, genuinely require a human-like understanding of syntax.
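A minimal sketch of the permutation test this finding implies: shuffle the words of a hypothesis and check whether an NLI classifier's prediction changes. The checkpoint name is an assumed example; any NLI classifier could be substituted.

```python
# Sketch: test whether an NLI model's prediction is invariant to random
# word reordering of the hypothesis. The checkpoint is an assumed example.
import random
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "roberta-large-mnli"  # assumed NLI checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

def predict(premise: str, hypothesis: str) -> str:
    inputs = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[logits.argmax(dim=-1).item()]

def shuffle_words(sentence: str, seed: int = 0) -> str:
    words = sentence.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)

premise = "A man is playing a guitar on stage."
hypothesis = "A musician is performing."
print("original :", predict(premise, hypothesis))
print("shuffled :", predict(premise, shuffle_words(hypothesis)))
```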
arXiv Detail & Related papers (2020-12-30T20:40:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.