Nichelle and Nancy: The Influence of Demographic Attributes and
Tokenization Length on First Name Biases
- URL: http://arxiv.org/abs/2305.16577v1
- Date: Fri, 26 May 2023 01:57:42 GMT
- Title: Nichelle and Nancy: The Influence of Demographic Attributes and
Tokenization Length on First Name Biases
- Authors: Haozhe An, Rachel Rudinger
- Abstract summary: We find that demographic attributes of a name (race, ethnicity, and gender) and name tokenization length are both factors that systematically affect the behavior of social commonsense reasoning models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Through the use of first name substitution experiments, prior research has
demonstrated the tendency of social commonsense reasoning models to
systematically exhibit social biases along the dimensions of race, ethnicity,
and gender (An et al., 2023). Demographic attributes of first names, however,
are strongly correlated with corpus frequency and tokenization length, which
may influence model behavior independent of or in addition to demographic
factors. In this paper, we conduct a new series of first name substitution
experiments that measures the influence of these factors while controlling for
the others. We find that demographic attributes of a name (race, ethnicity, and
gender) and name tokenization length are both factors that systematically
affect the behavior of social commonsense reasoning models.
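The abstract's core method, substituting first names into fixed templates while tracking each name's tokenization length, can be sketched roughly as follows. This is a minimal illustration: the toy subword vocabulary, greedy tokenizer, and template are assumptions for demonstration only, not the paper's actual setup (which uses a real model's subword tokenizer and curated name lists).

```python
# Toy first-name substitution probe: swap names into one template and
# record each name's tokenization length. The vocabulary below is a
# hypothetical stand-in for a real subword vocabulary.
TOY_VOCAB = {"nancy", "nich", "elle", "an", "na"}  # illustrative only

def toy_tokenize(name: str) -> list[str]:
    """Greedy longest-match segmentation over the toy subword vocabulary,
    with single-character fallback. This mimics how rarer names tend to
    split into more subword pieces than common ones."""
    name = name.lower()
    tokens, i = [], 0
    while i < len(name):
        for j in range(len(name), i, -1):  # try longest match first
            if name[i:j] in TOY_VOCAB:
                tokens.append(name[i:j])
                i = j
                break
        else:
            tokens.append(name[i])  # character fallback
            i += 1
    return tokens

def substitute(template: str, name: str) -> str:
    """Build one stimulus for a name-substitution experiment."""
    return template.format(name=name)

template = "{name} went to the store because {name} needed groceries."
for name in ["Nancy", "Nichelle"]:
    print(name, len(toy_tokenize(name)), substitute(template, name))
```

Under this toy vocabulary, "Nancy" segments into one token while "Nichelle" segments into two, which is the kind of length difference the paper controls for when separating tokenization effects from demographic effects.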
Related papers
- On the Influence of Gender and Race in Romantic Relationship Prediction from Large Language Models [21.178861746240507]
We study the presence of heteronormative biases and prejudice against interracial romantic relationships in large language models.
We show that models are less likely to predict romantic relationships for (a) same-gender character pairs than different-gender pairs; and (b) intra/inter-racial character pairs involving Asian names as compared to Black, Hispanic, or White names.
arXiv Detail & Related papers (2024-10-05T01:41:55Z)
- The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention [61.80236015147771]
We quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models.
Experiments on DoFaiR reveal that diversity-oriented instructions increase the representation of different gender and racial groups in generated images at the cost of factual accuracy.
We propose Fact-Augmented Intervention (FAI) to reflect on verbalized or retrieved factual information about gender and racial compositions of generation subjects in history.
arXiv Detail & Related papers (2024-06-29T09:09:42Z)
- Stop! In the Name of Flaws: Disentangling Personal Names and Sociodemographic Attributes in NLP [17.738887765065396]
We present an interdisciplinary background on names and naming.
We then survey the issues inherent to associating names with sociodemographic attributes.
We provide guiding questions along with normative recommendations to avoid validity and ethical pitfalls.
arXiv Detail & Related papers (2024-05-27T13:33:29Z)
- Uncovering Name-Based Biases in Large Language Models Through Simulated Trust Game [0.0]
Gender and race inferred from an individual's name are a notable source of stereotypes and biases that subtly influence social interactions.
We show that our approach can detect name-based biases in both base and instruction-tuned models.
arXiv Detail & Related papers (2024-04-23T02:21:17Z)
- What's in a Name? Auditing Large Language Models for Race and Gender Bias [49.28899492966893]
We employ an audit design to investigate biases in state-of-the-art large language models, including GPT-4.
We find that the advice systematically disadvantages names that are commonly associated with racial minorities and women.
arXiv Detail & Related papers (2024-02-21T18:25:25Z)
- Decoding Susceptibility: Modeling Misbelief to Misinformation Through a Computational Approach [61.04606493712002]
Susceptibility to misinformation is the degree of belief in unverifiable claims, which is not directly observable.
Existing susceptibility studies heavily rely on self-reported beliefs.
We propose a computational approach to model users' latent susceptibility levels.
arXiv Detail & Related papers (2023-11-16T07:22:56Z)
- Sensitivity, Performance, Robustness: Deconstructing the Effect of Sociodemographic Prompting [64.80538055623842]
Sociodemographic prompting is a technique that steers the output of prompt-based models towards answers that humans with specific sociodemographic profiles would give.
We show that sociodemographic information affects model predictions and can be beneficial for improving zero-shot learning in subjective NLP tasks.
arXiv Detail & Related papers (2023-09-13T15:42:06Z)
- CIParsing: Unifying Causality Properties into Multiple Human Parsing [82.32620538918812]
Existing methods of multiple human parsing (MHP) apply statistical models to acquire underlying associations between images and labeled body parts.
We present a causality inspired parsing paradigm termed CIParsing, which follows fundamental causal principles involving two causal properties for human parsing.
The CIParsing is designed in a plug-and-play fashion and can be integrated into any existing MHP models.
arXiv Detail & Related papers (2023-08-23T15:56:26Z)
- Examining the Causal Effect of First Names on Language Models: The Case of Social Commonsense Reasoning [2.013330800976407]
First names may serve as proxies for socio-demographic representations.
We study whether a model's reasoning given a specific input differs based on the first names provided.
arXiv Detail & Related papers (2023-06-01T20:05:05Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Assessing Demographic Bias in Named Entity Recognition [0.21485350418225244]
We assess the bias in Named Entity Recognition systems for English across different demographic groups with synthetically generated corpora.
Character-based contextualized word representation models such as ELMo result in the least bias across demographics.
arXiv Detail & Related papers (2020-08-08T02:01:25Z)
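The instance-reweighting idea from the "Balancing out Bias" entry above can be sketched as follows. This is a minimal sketch, not the paper's actual method: it assumes the simple scheme of weighting each training instance inversely to the joint frequency of its (demographic group, label) pair, so that group and label are decorrelated in the reweighted distribution; the grouping variables and data are illustrative.

```python
from collections import Counter

def balancing_weights(groups, labels):
    """Return one weight per instance, proportional to 1 / P(group, label),
    normalized so the weights average to 1. Rare (group, label) combinations
    are upweighted, breaking the correlation between demographics and labels."""
    joint = Counter(zip(groups, labels))          # joint counts per pair
    n = len(groups)
    raw = [n / joint[(g, y)] for g, y in zip(groups, labels)]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

# Illustrative data: group "A" is correlated with label 1.
groups = ["A", "A", "A", "B"]   # e.g. inferred author demographic
labels = [1, 1, 0, 0]           # task label
weights = balancing_weights(groups, labels)
```

These weights would then scale each instance's loss during training, upweighting the underrepresented (group, label) combinations.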
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.