Marked Personas: Using Natural Language Prompts to Measure Stereotypes
in Language Models
- URL: http://arxiv.org/abs/2305.18189v1
- Date: Mon, 29 May 2023 16:29:22 GMT
- Title: Marked Personas: Using Natural Language Prompts to Measure Stereotypes
in Language Models
- Authors: Myra Cheng, Esin Durmus, Dan Jurafsky
- Abstract summary: We present Marked Personas, a prompt-based method to measure stereotypes in large language models (LLMs).
We find that portrayals generated by GPT-3.5 and GPT-4 contain higher rates of racial stereotypes than human-written portrayals using the same prompts.
An intersectional lens reveals tropes that dominate portrayals of marginalized groups, such as tropicalism and the hypersexualization of minoritized women.
- Score: 33.157279170602784
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To recognize and mitigate harms from large language models (LLMs), we need to
understand the prevalence and nuances of stereotypes in LLM outputs. Toward
this end, we present Marked Personas, a prompt-based method to measure
stereotypes in LLMs for intersectional demographic groups without any lexicon
or data labeling. Grounded in the sociolinguistic concept of markedness (which
characterizes explicitly linguistically marked categories versus unmarked
defaults), our proposed method is twofold: 1) prompting an LLM to generate
personas, i.e., natural language descriptions, of the target demographic group
alongside personas of unmarked, default groups; 2) identifying the words that
significantly distinguish personas of the target group from corresponding
unmarked ones. We find that the portrayals generated by GPT-3.5 and GPT-4
contain higher rates of racial stereotypes than human-written portrayals using
the same prompts. The words distinguishing personas of marked (non-white,
non-male) groups reflect patterns of othering and exoticizing these
demographics. An intersectional lens further reveals tropes that dominate
portrayals of marginalized groups, such as tropicalism and the
hypersexualization of minoritized women. These representational harms have
concerning implications for downstream applications like story generation.
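The second step of the method, finding the words that significantly distinguish target-group personas from unmarked ones, can be illustrated with a short sketch. The abstract does not name the statistic used, so this example assumes a weighted log-odds ratio with an informative Dirichlet prior (in the style of Monroe et al.'s "Fightin' Words"), with the prior taken from the pooled word counts. The function names `tokenize` and `marked_words`, the crude tokenizer, and the `z_threshold` default are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude lowercase word tokenizer (an assumption; any tokenizer would do).
    return re.findall(r"[a-z']+", text.lower())


def marked_words(target_texts, unmarked_texts, z_threshold=1.96):
    """Return (word, z) pairs that are over-represented in target_texts.

    Assumed statistic: log-odds ratio with a Dirichlet prior built from
    the pooled counts of both corpora, converted to a z-score.
    """
    target = Counter(tok for t in target_texts for tok in tokenize(t))
    unmarked = Counter(tok for t in unmarked_texts for tok in tokenize(t))
    prior = target + unmarked  # pooled counts act as the prior

    n_t = sum(target.values())
    n_u = sum(unmarked.values())
    n_p = sum(prior.values())

    results = []
    for word, a_w in prior.items():
        y_t, y_u = target[word], unmarked[word]
        # Smoothed log-odds of the word in each corpus.
        log_odds_t = math.log((y_t + a_w) / (n_t + n_p - y_t - a_w))
        log_odds_u = math.log((y_u + a_w) / (n_u + n_p - y_u - a_w))
        # Approximate variance of the log-odds difference.
        variance = 1.0 / (y_t + a_w) + 1.0 / (y_u + a_w)
        z = (log_odds_t - log_odds_u) / math.sqrt(variance)
        if z > z_threshold:
            results.append((word, z))
    # Most strongly marked words first.
    return sorted(results, key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Toy personas, invented for illustration only.
    target_personas = ["she is strong and resilient"] * 20
    unmarked_personas = ["he is tall and friendly"] * 20
    for word, z in marked_words(target_personas, unmarked_personas):
        print(f"{word}\t{z:.2f}")
```

In practice the two inputs would be the personas generated for a marked group and for the corresponding unmarked default group under the same prompt; words shared equally by both corpora score near zero and are filtered out by the threshold.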
Related papers
- Who is better at math, Jenny or Jingzhen? Uncovering Stereotypes in Large Language Models [9.734705470760511]
We use GlobalBias to study a broad set of stereotypes from around the world.
We generate character profiles based on given names and evaluate the prevalence of stereotypes in model outputs.
arXiv Detail & Related papers (2024-07-09T14:52:52Z)
- White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs [58.27353205269664]
Language agency is an important aspect of evaluating social biases in texts.
Previous research often relies on string-matching techniques to identify agentic and communal words.
We introduce the novel Language Agency Bias Evaluation benchmark.
arXiv Detail & Related papers (2024-04-16T12:27:54Z)
- Laissez-Faire Harms: Algorithmic Biases in Generative Language Models [0.0]
We show that synthetically generated texts from five of the most pervasive LMs perpetuate harms of omission, subordination, and stereotyping for minoritized individuals.
We find widespread evidence of bias to an extent that such individuals are hundreds to thousands of times more likely to encounter LM-generated outputs that portray their identities in a subordinated manner.
Our findings highlight the urgent need to protect consumers from discriminatory harms caused by language models.
arXiv Detail & Related papers (2024-04-11T05:09:03Z)
- Large language models cannot replace human participants because they cannot portray identity groups [40.865099955752825]
We argue that large language models (LLMs) are doomed to both misportray and flatten the representations of demographic groups.
We also discuss how identity prompts can essentialize identities.
Overall, we urge caution in use cases where LLMs are intended to replace human participants whose identities are relevant to the task at hand.
arXiv Detail & Related papers (2024-02-02T21:21:06Z)
- Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks [15.015148115215315]
We conduct experiments on four popular large language models (LLMs) to investigate their capability to understand group differences and potential biases in their predictions for politeness and offensiveness.
We find that for both tasks, model predictions are closer to the labels from White and female participants.
More specifically, when prompted to respond from the perspective of "Black" and "Asian" individuals, models perform worse at predicting both the overall scores and the scores from the corresponding groups.
arXiv Detail & Related papers (2023-11-16T10:02:24Z)
- On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z)
- Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation [64.79319733514266]
Large Language Models (LLMs) can generate biased and toxic responses.
We propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes.
arXiv Detail & Related papers (2023-11-01T05:31:46Z)
- "Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters [97.11173801187816]
Large Language Models (LLMs) have recently emerged as an effective tool to assist individuals in writing various types of content.
This paper critically examines gender biases in LLM-generated reference letters.
arXiv Detail & Related papers (2023-10-13T16:12:57Z)
- Queer People are People First: Deconstructing Sexual Identity Stereotypes in Large Language Models [3.974379576408554]
Large Language Models (LLMs) are trained primarily on minimally processed web text.
LLMs can inadvertently perpetuate stereotypes towards marginalized groups, like the LGBTQIA+ community.
arXiv Detail & Related papers (2023-06-30T19:39:01Z)
- Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale [61.555788332182395]
We investigate the potential for machine learning models to amplify dangerous and complex stereotypes.
We find a broad range of ordinary prompts produce stereotypes, including prompts simply mentioning traits, descriptors, occupations, or objects.
arXiv Detail & Related papers (2022-11-07T18:31:07Z)