ValNorm Quantifies Semantics to Reveal Consistent Valence Biases Across
Languages and Over Centuries
- URL: http://arxiv.org/abs/2006.03950v5
- Date: Mon, 8 Nov 2021 03:47:44 GMT
- Title: ValNorm Quantifies Semantics to Reveal Consistent Valence Biases Across
Languages and Over Centuries
- Authors: Autumn Toney-Wails and Aylin Caliskan
- Abstract summary: Word embeddings learn implicit biases from linguistic regularities captured by word co-occurrence statistics.
By extending methods that quantify human-like biases in word embeddings, we introduce ValNorm, a novel intrinsic evaluation task.
We apply ValNorm on static word embeddings from seven languages and historical English text spanning 200 years.
- Score: 3.0349733976070015
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Word embeddings learn implicit biases from linguistic regularities captured
by word co-occurrence statistics. By extending methods that quantify human-like
biases in word embeddings, we introduce ValNorm, a novel intrinsic evaluation
task and method to quantify the valence dimension of affect in human-rated word
sets from social psychology. We apply ValNorm on static word embeddings from
seven languages (Chinese, English, German, Polish, Portuguese, Spanish, and
Turkish) and from historical English text spanning 200 years. ValNorm achieves
consistently high accuracy in quantifying the valence of non-discriminatory,
non-social group word sets. Specifically, ValNorm achieves a Pearson
correlation of r=0.88 for human judgment scores of valence for 399 words
collected to establish pleasantness norms in English. In contrast, we measure
gender stereotypes using the same set of word embeddings and find that social
biases vary across languages. Our results indicate that valence associations of
non-discriminatory, non-social group words represent widely-shared
associations, in seven languages and over 200 years.
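The measurement the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: it computes an SC-WEAT-style valence effect size (cosine association with pleasant versus unpleasant attribute words) and then the Pearson correlation with human valence norms. All vectors, word lists, and ratings below are random or hypothetical placeholders standing in for real embeddings and the 399-word pleasantness norms.

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def valence_association(w, pleasant, unpleasant):
    # SC-WEAT-style effect size: mean cosine similarity with pleasant
    # attribute words minus mean similarity with unpleasant ones,
    # normalized by the std. dev. over all attribute similarities.
    sims_p = [cosine(w, a) for a in pleasant]
    sims_u = [cosine(w, a) for a in unpleasant]
    return (np.mean(sims_p) - np.mean(sims_u)) / np.std(sims_p + sims_u, ddof=1)

# Toy random vectors standing in for real word embeddings.
rng = np.random.default_rng(0)
pleasant = [rng.normal(size=50) for _ in range(8)]
unpleasant = [rng.normal(size=50) for _ in range(8)]
words = {"flower": rng.normal(size=50),
         "rust": rng.normal(size=50),
         "peace": rng.normal(size=50)}

scores = {w: valence_association(v, pleasant, unpleasant) for w, v in words.items()}

# ValNorm then reports the Pearson r between such scores and
# human valence ratings (the ratings here are made up).
human_norms = {"flower": 8.1, "rust": 3.6, "peace": 7.9}
xs = [scores[w] for w in words]
ys = [human_norms[w] for w in words]
r = np.corrcoef(xs, ys)[0, 1]
```

With real embeddings and the actual human-rated word sets, this is the kind of pipeline behind the reported r=0.88.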
Related papers
- Tomato, Tomahto, Tomate: Measuring the Role of Shared Semantics among Subwords in Multilingual Language Models [88.07940818022468]
We take an initial step on measuring the role of shared semantics among subwords in the encoder-only multilingual language models (mLMs)
We form "semantic tokens" by merging the semantically similar subwords and their embeddings.
Inspections of the grouped subwords show that they exhibit a wide range of semantic similarities.
arXiv Detail & Related papers (2024-11-07T08:38:32Z)
- What an Elegant Bridge: Multilingual LLMs are Biased Similarly in Different Languages [51.0349882045866]
This paper investigates biases of Large Language Models (LLMs) through the lens of grammatical gender.
We prompt a model to describe nouns with adjectives in various languages, focusing specifically on languages with grammatical gender.
We find that a simple classifier can not only predict noun gender above chance but also exhibit cross-language transferability.
arXiv Detail & Related papers (2024-07-12T22:10:16Z)
- Evaluating Biased Attitude Associations of Language Models in an Intersectional Context [2.891314299138311]
Language models are trained on large-scale corpora that embed implicit biases documented in psychology.
We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight.
We find that language models exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language.
arXiv Detail & Related papers (2023-07-07T03:01:56Z)
- Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z)
- An Analysis of Social Biases Present in BERT Variants Across Multiple Languages [0.0]
We investigate the bias present in monolingual BERT models across a diverse set of languages.
We propose a template-based method to measure any kind of bias, based on sentence pseudo-likelihood.
We conclude that current methods of probing for bias are highly language-dependent.
arXiv Detail & Related papers (2022-11-25T23:38:08Z)
- Crosslinguistic word order variation reflects evolutionary pressures of dependency and information locality [4.869029215261254]
About 40% of the world's languages have subject-verb-object order, and about 40% have subject-object-verb order.
We show that variation in word order reflects different ways of balancing competing pressures of dependency locality and information locality.
Our findings suggest that syntactic structure and usage across languages co-adapt to support efficient communication under limited cognitive resources.
arXiv Detail & Related papers (2022-06-09T02:56:53Z)
- Gender Bias in Word Embeddings: A Comprehensive Analysis of Frequency, Syntax, and Semantics [3.4048739113355215]
We provide a comprehensive analysis of group-based biases in widely-used static English word embeddings trained on internet corpora.
Using the Single-Category Word Embedding Association Test, we demonstrate the widespread prevalence of gender biases.
We find that, of the 1,000 most frequent words in the vocabulary, 77% are more associated with men than women.
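The prevalence analysis this entry summarizes can be sketched roughly as follows: score each vocabulary word's SC-WEAT-style association with male versus female attribute words, then count the share with a positive (male-leaning) score. The vectors and attribute lists below are random placeholders, not the paper's data, so the resulting share is not meaningful; the paper's 77% figure comes from real English embeddings.

```python
import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def male_association(w, male, female):
    # Positive score: w is closer (in cosine terms) to the male
    # attribute words than to the female ones.
    return (np.mean([cosine(w, a) for a in male])
            - np.mean([cosine(w, a) for a in female]))

# Random stand-ins for attribute-word and vocabulary embeddings.
rng = np.random.default_rng(1)
male = [rng.normal(size=25) for _ in range(5)]
female = [rng.normal(size=25) for _ in range(5)]
vocab = {f"word{i}": rng.normal(size=25) for i in range(1000)}

male_leaning = sum(male_association(v, male, female) > 0 for v in vocab.values())
share = male_leaning / len(vocab)  # the paper reports 77% on real embeddings
```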
arXiv Detail & Related papers (2022-06-07T15:35:10Z)
- Regional Negative Bias in Word Embeddings Predicts Racial Animus--but only via Name Frequency [2.247786323899963]
We show that anti-black WEAT estimates from geo-tagged social media data strongly correlate with several measures of racial animus.
We also show that every one of these correlations is explained by the frequency of Black names in the underlying corpora relative to White names.
arXiv Detail & Related papers (2022-01-20T20:52:12Z)
- Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models [104.41668491794974]
We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender.
We find that while some words such as dead, and designated are associated with both male and female politicians, a few specific words such as beautiful and divorced are predominantly associated with female politicians.
arXiv Detail & Related papers (2021-04-15T15:03:26Z)
- Speakers Fill Lexical Semantic Gaps with Context [65.08205006886591]
We operationalise the lexical ambiguity of a word as the entropy of meanings it can take.
We find significant correlations between our estimate of ambiguity and the number of synonyms a word has in WordNet.
This suggests that, in the presence of ambiguity, speakers compensate by making contexts more informative.
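Operationalising ambiguity as the entropy of a word's meanings can be illustrated in a few lines. The sense inventories and frequencies below are hypothetical, not the paper's estimates, which are derived from corpus data.

```python
import math

def meaning_entropy(sense_counts):
    # Shannon entropy (in bits) of a word's sense distribution;
    # higher entropy means the word is more ambiguous.
    total = sum(sense_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in sense_counts.values() if c > 0)

# Hypothetical sense frequencies: "bank" is genuinely ambiguous,
# while "oxygen" is nearly monosemous.
bank = {"financial_institution": 70, "river_edge": 30}
oxygen = {"chemical_element": 99, "other": 1}
```

Under these counts, `meaning_entropy(bank)` is about 0.88 bits while `meaning_entropy(oxygen)` is near zero, matching the intuition that `bank` leaves more for context to resolve.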
arXiv Detail & Related papers (2020-10-05T17:19:10Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.