Word Embeddings Track Social Group Changes Across 70 Years in China
- URL: http://arxiv.org/abs/2504.12327v1
- Date: Sat, 12 Apr 2025 03:46:13 GMT
- Title: Word Embeddings Track Social Group Changes Across 70 Years in China
- Authors: Yuxi Ma, Yongqian Peng, Yixin Zhu,
- Abstract summary: We study how revolutionary social transformations are reflected in official linguistic representations of social groups in China.<n>Using diachronic word embeddings at multiple temporal resolutions, we find that Chinese representations differ significantly from Western counterparts.
- Score: 24.439801610491646
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Language encodes societal beliefs about social groups through word patterns. While computational methods like word embeddings enable quantitative analysis of these patterns, studies have primarily examined gradual shifts in Western contexts. We present the first large-scale computational analysis of Chinese state-controlled media (1950-2019) to examine how revolutionary social transformations are reflected in official linguistic representations of social groups. Using diachronic word embeddings at multiple temporal resolutions, we find that Chinese representations differ significantly from Western counterparts, particularly regarding economic status, ethnicity, and gender. These representations show distinct evolutionary dynamics: while stereotypes of ethnicity, age, and body type remain remarkably stable across political upheavals, representations of gender and economic classes undergo dramatic shifts tracking historical transformations. This work advances our understanding of how officially sanctioned discourse encodes social structure through language while highlighting the importance of non-Western perspectives in computational social science.
Related papers
- Human-like conceptual representations emerge from language prediction [72.5875173689788]
Large language models (LLMs) trained exclusively through next-token prediction over language data exhibit remarkably human-like behaviors.
Are these models developing concepts akin to humans, and if so, how are such concepts represented and organized?
Our results demonstrate that LLMs can flexibly derive concepts from linguistic descriptions in relation to contextual cues about other concepts.
These findings establish that structured, human-like conceptual representations can naturally emerge from language prediction without real-world grounding.
arXiv Detail & Related papers (2025-01-21T23:54:17Z) - Large Language Models Reflect the Ideology of their Creators [71.65505524599888]
Large language models (LLMs) are trained on vast amounts of data to generate natural language.<n>This paper shows that the ideological stance of an LLM appears to reflect the worldview of its creators.
arXiv Detail & Related papers (2024-10-24T04:02:30Z) - Towards "Differential AI Psychology" and in-context Value-driven Statement Alignment with Moral Foundations Theory [0.0]
This work investigates the alignment between personalized language models and survey participants on a Moral Foundation questionnaire.
We adapt text-to-text models to different political personas and survey the questionnaire repetitively to generate a synthetic population of persona and model combinations.
Our findings indicate that adapted models struggle to represent the survey-leading assessment of political ideologies.
arXiv Detail & Related papers (2024-08-21T08:20:41Z) - Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models [11.132360309354782]
Social bias is shaped by the accumulation of social perceptions towards targets across various demographic identities.
We propose a novel strategy to intuitively quantify social perceptions and suggest metrics that can evaluate the social biases within large language models.
arXiv Detail & Related papers (2024-06-06T13:32:09Z) - Survey in Characterization of Semantic Change [0.1474723404975345]
Understanding the meaning of words is vital for interpreting texts from different cultures.
Semantic changes can potentially impact the quality of the outcomes of computational linguistics algorithms.
arXiv Detail & Related papers (2024-02-29T12:13:50Z) - Sociocultural Norm Similarities and Differences via Situational
Alignment and Explainable Textual Entailment [31.929550141633218]
We propose a novel approach to discover and compare social norms across Chinese and American cultures.
We build a high-quality dataset of 3,069 social norms aligned with social situations across Chinese and American cultures.
To test the ability of models to reason about social norms across cultures, we introduce the task of explainable social norm entailment.
arXiv Detail & Related papers (2023-05-23T19:43:47Z) - Logic Against Bias: Textual Entailment Mitigates Stereotypical Sentence
Reasoning [8.990338162517086]
We describe several kinds of stereotypes concerning different communities that are present in popular sentence representation models.
By comparing strong pretrained models based on text similarity with textual entailment learning, we conclude that the explicit logic learning with textual entailment can significantly reduce bias.
arXiv Detail & Related papers (2023-03-10T02:52:13Z) - Analyzing Gender Representation in Multilingual Models [59.21915055702203]
We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
arXiv Detail & Related papers (2022-04-20T00:13:01Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z) - Towards Debiasing Sentence Representations [109.70181221796469]
We show that Sent-Debias is effective in removing biases, and at the same time, preserves performance on sentence-level downstream tasks.
We hope that our work will inspire future research on characterizing and removing social biases from widely adopted sentence representations for fairer NLP.
arXiv Detail & Related papers (2020-07-16T04:22:30Z) - A Framework for the Computational Linguistic Analysis of Dehumanization [52.735780962665814]
We analyze discussions of LGBTQ people in the New York Times from 1986 to 2015.
We find increasingly humanizing descriptions of LGBTQ people over time.
The ability to analyze dehumanizing language at a large scale has implications for automatically detecting and understanding media bias as well as abusive language online.
arXiv Detail & Related papers (2020-03-06T03:02:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.