I Am Not Them: Fluid Identities and Persistent Out-group Bias in Large
Language Models
- URL: http://arxiv.org/abs/2402.10436v1
- Date: Fri, 16 Feb 2024 03:54:48 GMT
- Title: I Am Not Them: Fluid Identities and Persistent Out-group Bias in Large
Language Models
- Authors: Wenchao Dong, Assem Zhunis, Hyojin Chin, Jiyoung Han, Meeyoung Cha
- Abstract summary: We explored cultural biases (individualism vs. collectivism) in ChatGPT across three Western languages (i.e., English, German, and French) and three Eastern languages (i.e., Chinese, Japanese, and Korean).
When ChatGPT adopted an individualistic persona in Western languages, its collectivism scores (i.e., out-group values) exhibited a more negative trend.
Conversely, when a collectivistic persona was assigned to ChatGPT in Eastern languages, a similar pattern emerged, with more negative responses toward individualism (i.e., out-group values) as compared to collectivism (i.e., in-group values).
- Score: 11.77633171656753
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explored cultural biases (individualism vs. collectivism) in ChatGPT across
three Western languages (i.e., English, German, and French) and three Eastern
languages (i.e., Chinese, Japanese, and Korean). When ChatGPT adopted an
individualistic persona in Western languages, its collectivism scores (i.e.,
out-group values) exhibited a more negative trend, surpassing their positive
orientation towards individualism (i.e., in-group values). Conversely, when a
collectivistic persona was assigned to ChatGPT in Eastern languages, a similar
pattern emerged with more negative responses toward individualism (i.e.,
out-group values) as compared to collectivism (i.e., in-group values). The
results indicate that when imbued with a particular social identity, ChatGPT
discerns in-group and out-group, embracing in-group values while eschewing
out-group values. Notably, the negativity towards the out-group, from which
prejudices and discrimination arise, exceeded the positivity towards the
in-group. The experiment was replicated in the political domain, and the
results remained consistent. Furthermore, this replication unveiled an
intrinsic Democratic bias in Large Language Models (LLMs), aligning with
earlier findings and providing integral insights into mitigating such bias
through prompt engineering. Extensive robustness checks were performed using
varying hyperparameters and persona setup methods, with or without social
identity labels, across other popular language models.
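To make the persona setup concrete, below is a minimal, hypothetical sketch of how a social-identity persona could be assigned to ChatGPT via a system prompt and how a single value statement could be scored. It is not the authors' code: the prompt wording, example statement, Likert range, model name, and temperature are all assumptions for illustration only.

```python
# Hypothetical sketch of persona assignment and value scoring (not the paper's code).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PERSONA = "You are a person who strongly holds individualistic values."   # assumed wording
STATEMENT = "People should sacrifice self-interest for their group."      # example collectivism item

prompt = (
    "On a scale from 1 (strongly disagree) to 7 (strongly agree), how much do you "
    f"agree with the following statement? Reply with a number only.\n{STATEMENT}"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",   # model choice is an assumption
    temperature=0.7,         # one hyperparameter a robustness check might vary
    messages=[
        {"role": "system", "content": PERSONA},   # persona set via system prompt
        {"role": "user", "content": prompt},
    ],
)

# A low rating here would indicate a negative stance toward the out-group value.
print(response.choices[0].message.content)
```

Repeating such queries across languages, personas, and hyperparameter settings, and comparing in-group versus out-group ratings, reflects the kind of setup the abstract describes.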
Related papers
- Persona Setting Pitfall: Persistent Outgroup Biases in Large Language Models Arising from Social Identity Adoption [10.35915254696156]
We show that outgroup bias manifests as strongly as ingroup favoritism.
Our findings highlight the potential to develop more equitable and balanced language models.
arXiv Detail & Related papers (2024-09-05T18:08:47Z) - Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs)
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z) - The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models [78.69526166193236]
Pre-trained Language models (PLMs) have been acknowledged to contain harmful information, such as social biases.
We propose sc Social Bias Neurons to accurately pinpoint units (i.e., neurons) in a language model that can be attributed to undesirable behavior, such as social bias.
As measured by prior metrics from StereoSet, our model achieves a higher degree of fairness while maintaining language modeling ability with low cost.
arXiv Detail & Related papers (2024-06-14T15:41:06Z) - Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans [0.30723404270319693]
We investigate a new form of bias in large language models (LLMs)
We find that ChatGPT portrayed African, Asian, and Hispanic Americans as more homogeneous than White Americans.
We argue that the tendency to describe groups as less diverse risks perpetuating stereotypes and discriminatory behavior.
arXiv Detail & Related papers (2024-01-16T16:52:00Z) - Social Bias Probing: Fairness Benchmarking for Language Models [38.180696489079985]
This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment.
We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections.
We show that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized.
arXiv Detail & Related papers (2023-11-15T16:35:59Z) - On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z) - Generative Language Models Exhibit Social Identity Biases [17.307292780517653]
We investigate whether ingroup solidarity and outgroup hostility, fundamental social identity biases, are present in 56 large language models.
We find that almost all foundational language models and some instruction fine-tuned models exhibit clear ingroup-positive and outgroup-negative associations when prompted to complete sentences.
Our findings suggest that modern language models exhibit fundamental social identity biases to a similar degree as humans.
arXiv Detail & Related papers (2023-10-24T13:17:40Z) - Evaluating Biased Attitude Associations of Language Models in an
Intersectional Context [2.891314299138311]
Language models are trained on large-scale corpora that embed implicit biases documented in psychology.
We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight.
We find that language models exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language.
arXiv Detail & Related papers (2023-07-07T03:01:56Z) - Comparing Biases and the Impact of Multilingual Training across Multiple
Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z) - On Negative Interference in Multilingual Models: Findings and A
Meta-Learning Treatment [59.995385574274785]
We show that, contrary to previous belief, negative interference also impacts low-resource languages.
We present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference.
arXiv Detail & Related papers (2020-10-06T20:48:58Z) - Towards Controllable Biases in Language Generation [87.89632038677912]
We develop a method to induce societal biases in generated text when input prompts contain mentions of specific demographic groups.
We analyze two scenarios: 1) inducing negative biases for one demographic and positive biases for another demographic, and 2) equalizing biases between demographics.
arXiv Detail & Related papers (2020-05-01T08:25:11Z)