A Taxonomy of Stereotype Content in Large Language Models
- URL: http://arxiv.org/abs/2408.00162v1
- Date: Wed, 31 Jul 2024 21:14:41 GMT
- Title: A Taxonomy of Stereotype Content in Large Language Models
- Authors: Gandalf Nicolas, Aylin Caliskan
- Abstract summary: This study introduces a taxonomy of stereotype content in contemporary large language models (LLMs).
We identify 14 stereotype dimensions (e.g., Morality, Ability, Health, Beliefs, Emotions) accounting for 90% of LLM stereotype associations.
Our findings suggest that high-dimensional human stereotypes are reflected in LLMs and must be considered in AI auditing and debiasing to minimize unidentified harms.
- Score: 4.4212441764241
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study introduces a taxonomy of stereotype content in contemporary large language models (LLMs). We prompt ChatGPT 3.5, Llama 3, and Mixtral 8x7B, three powerful and widely used LLMs, for the characteristics associated with 87 social categories (e.g., gender, race, occupations). We identify 14 stereotype dimensions (e.g., Morality, Ability, Health, Beliefs, Emotions), accounting for ~90% of LLM stereotype associations. Warmth and Competence facets were the most frequent content, but all other dimensions were significantly prevalent. Stereotypes were more positive in LLMs (vs. humans), but there was significant variability across categories and dimensions. Finally, the taxonomy predicted the LLMs' internal evaluations of social categories (e.g., how positively/negatively the categories were represented), supporting the relevance of a multidimensional taxonomy for characterizing LLM stereotypes. Our findings suggest that high-dimensional human stereotypes are reflected in LLMs and must be considered in AI auditing and debiasing to minimize unidentified harms from reliance on low-dimensional views of bias in LLMs.
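The pipeline the abstract describes (prompt several LLMs for the characteristics they associate with social categories, then code those characteristics into stereotype dimensions and tally their frequency) can be illustrated with a short Python sketch. Everything below is an assumption for illustration: the prompt wording, the query_model adapter, and the toy dimension lexicon are hypothetical and are not the authors' released materials or taxonomy dictionaries.

```python
# Illustrative sketch of an elicitation-and-coding pipeline like the one the
# abstract describes. Prompt text, query_model(), and the tiny lexicon are
# assumptions for illustration only, not the paper's actual materials.
from collections import Counter
from typing import Callable

# Hypothetical adapter: wraps whichever LLM API is being audited
# (e.g., ChatGPT 3.5, Llama 3, Mixtral 8x7B) and returns a list of traits.
QueryFn = Callable[[str], list[str]]

# Toy lexicon mapping traits to stereotype dimensions (the paper identifies 14,
# e.g., Morality, Ability, Health, Beliefs, Emotions); only a fragment is shown.
DIMENSION_LEXICON = {
    "honest": "Morality", "kind": "Morality",
    "intelligent": "Ability", "skilled": "Ability",
    "healthy": "Health", "frail": "Health",
    "religious": "Beliefs", "traditional": "Beliefs",
    "cheerful": "Emotions", "anxious": "Emotions",
}

def elicit_stereotype_content(categories: list[str], query_model: QueryFn) -> Counter:
    """Prompt the model for traits associated with each social category and
    tally how often each stereotype dimension appears in its responses."""
    dimension_counts: Counter = Counter()
    for category in categories:
        prompt = f"List five characteristics that people associate with {category}."
        traits = query_model(prompt)
        for trait in traits:
            dimension = DIMENSION_LEXICON.get(trait.lower().strip())
            if dimension is not None:
                dimension_counts[dimension] += 1
    return dimension_counts

if __name__ == "__main__":
    # Stub model for demonstration; replace with a real API call.
    def fake_model(prompt: str) -> list[str]:
        return ["honest", "intelligent", "cheerful", "unmapped-trait", "religious"]

    counts = elicit_stereotype_content(["nurses", "engineers"], fake_model)
    print(counts)  # e.g., Counter({'Morality': 2, 'Ability': 2, ...})
```

In the paper's setting, the per-dimension counts could then be compared against human stereotype norms or against the model's own valence ratings of each category; that analysis step is not reproduced here.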
Related papers
- Large Language Models Reflect the Ideology of their Creators [73.25935570218375]
Large language models (LLMs) are trained on vast amounts of data to generate natural language.
We uncover notable diversity in the ideological stance exhibited across different LLMs and languages.
arXiv Detail & Related papers (2024-10-24T04:02:30Z) - Hate Personified: Investigating the role of LLMs in content moderation [64.26243779985393]
For subjective tasks such as hate detection, where people perceive hate differently, it is unclear how well large language models (LLMs) can represent diverse groups.
By including additional context in prompts, we analyze LLMs' sensitivity to geographical priming, persona attributes, and numerical information to assess how well the needs of various groups are reflected (a generic probing sketch in this spirit appears after this list).
arXiv Detail & Related papers (2024-10-03T16:43:17Z) - Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective [66.34066553400108]
We conduct a rigorous evaluation of Large Language Models' implicit bias towards certain groups by attacking them with carefully crafted instructions to elicit biased responses.
We propose three attack approaches, i.e., Disguise, Deception, and Teaching, based on which we build evaluation datasets for four common bias types.
arXiv Detail & Related papers (2024-06-20T06:42:08Z) - "They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations [15.535416139394009]
Large language models (LLMs) have emerged as an integral part of modern societies.
Despite their utility, research indicates that LLMs perpetuate systemic biases.
We introduce Covert Harms and Social Threats (CHAST), a set of seven metrics grounded in social science literature.
arXiv Detail & Related papers (2024-05-08T19:08:45Z) - White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency.
We introduce the novel Language Agency Bias Evaluation benchmark.
We unveil language agency social biases in content generated by three recent large language models (LLMs).
arXiv Detail & Related papers (2024-04-16T12:27:54Z) - Exploring Value Biases: How LLMs Deviate Towards the Ideal [57.99044181599786]
Large language models (LLMs) are deployed in a wide range of applications, and their responses have an increasing social impact.
We show that value bias is strong in LLMs across different categories, similar to the results found in human studies.
arXiv Detail & Related papers (2024-02-16T18:28:43Z) - Do LLMs exhibit human-like response biases? A case study in survey
design [66.1850490474361]
We investigate the extent to which large language models (LLMs) reflect human response biases, if at all.
We design a dataset and framework to evaluate whether LLMs exhibit human-like response biases in survey questionnaires.
Our comprehensive evaluation of nine models shows that popular open and commercial LLMs generally fail to reflect human-like behavior.
arXiv Detail & Related papers (2023-11-07T15:40:43Z) - Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications [23.963586791210414]
We show that large language models (LLMs) tend to inherit social biases from their training data which significantly impact their fairness in classification tasks.
This observation emphasizes that the social biases are inherent within the LLMs themselves and inherited from their pretraining corpus.
arXiv Detail & Related papers (2023-10-23T06:31:28Z) - StereoMap: Quantifying the Awareness of Human-like Stereotypes in Large
Language Models [11.218531873222398]
Large Language Models (LLMs) have been observed to encode and perpetuate harmful associations present in the training data.
We propose a theoretically grounded framework called StereoMap to gain insights into their perceptions of how demographic groups have been viewed by society.
arXiv Detail & Related papers (2023-10-20T17:22:30Z) - Investigating Subtler Biases in LLMs: Ageism, Beauty, Institutional, and Nationality Bias in Generative Models [0.0]
This paper investigates bias along less-studied but still consequential dimensions, such as age and beauty.
We ask whether LLMs hold wide-reaching biases of positive or negative sentiment toward specific social groups, similar to the "what is beautiful is good" bias that experimental psychology has documented in people.
arXiv Detail & Related papers (2023-09-16T07:07:04Z) - Marked Personas: Using Natural Language Prompts to Measure Stereotypes
in Language Models [33.157279170602784]
We present Marked Personas, a prompt-based method to measure stereotypes in large language models (LLMs).
We find that portrayals generated by GPT-3.5 and GPT-4 contain higher rates of racial stereotypes than human-written portrayals using the same prompts.
An intersectional lens reveals tropes that dominate portrayals of marginalized groups, such as tropicalism and the hypersexualization of minoritized women.
arXiv Detail & Related papers (2023-05-29T16:29:22Z)