Analyzing Gender Representation in Multilingual Models
- URL: http://arxiv.org/abs/2204.09168v1
- Date: Wed, 20 Apr 2022 00:13:01 GMT
- Title: Analyzing Gender Representation in Multilingual Models
- Authors: Hila Gonen, Shauli Ravfogel and Yoav Goldberg
- Abstract summary: We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
- Score: 59.21915055702203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multilingual language models were shown to allow for nontrivial transfer
across scripts and languages. In this work, we study the structure of the
internal representations that enable this transfer. We focus on the
representation of gender distinctions as a practical case study, and examine
the extent to which the gender concept is encoded in shared subspaces across
different languages. Our analysis shows that gender representations consist of
several prominent components that are shared across languages, alongside
language-specific components. The existence of language-independent and
language-specific components provides an explanation for an intriguing
empirical observation we make: while gender classification transfers well
across languages, interventions for gender removal, trained on a single
language, do not transfer easily to others.
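The contrast in the abstract's last sentence can be illustrated with a toy sketch on synthetic data. This is not the paper's experimental setup: the 8-dimensional vectors, the single shared axis, the one language-specific axis per language, the mean-difference probe, and the rank-1 projection are all assumptions made here for illustration. A probe fitted on language A still works on language B because of the shared axis, while removing the direction learned on A leaves gender recoverable in B.

```python
import numpy as np

def make_language(rng, specific_dir, n=2000, dim=8, noise=0.5):
    """Synthetic vectors whose gender signal is a shared axis (0) plus a language-specific axis."""
    y = rng.choice([-1, 1], size=n)
    shared = np.zeros(dim)
    shared[0] = 1.0
    specific = np.zeros(dim)
    specific[specific_dir] = 1.0
    X = y[:, None] * (shared + specific) + noise * rng.standard_normal((n, dim))
    return X, y

def fit_probe(X, y):
    """Linear probe: difference of class means, thresholded at the midpoint."""
    mu_pos, mu_neg = X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)
    w = mu_pos - mu_neg
    b = -0.5 * (mu_pos + mu_neg) @ w
    return w, b

def accuracy(X, y, probe):
    w, b = probe
    return float(np.mean(np.where(X @ w + b > 0, 1, -1) == y))

rng = np.random.default_rng(0)
# Hypothetical languages A and B: same shared axis, different specific axes (1 vs. 2).
A_train, yA_train = make_language(rng, specific_dir=1)
A_dev, yA_dev = make_language(rng, specific_dir=1)
A_test, yA_test = make_language(rng, specific_dir=1)
B_dev, yB_dev = make_language(rng, specific_dir=2)
B_test, yB_test = make_language(rng, specific_dir=2)

# 1) Classification: a probe fitted on A is evaluated on A and on B.
probe_A = fit_probe(A_train, yA_train)
acc_in_lang = accuracy(A_test, yA_test, probe_A)
acc_transfer = accuracy(B_test, yB_test, probe_A)

# 2) Removal: project out the single direction learned on A.
u = probe_A[0] / np.linalg.norm(probe_A[0])
P = np.eye(A_train.shape[1]) - np.outer(u, u)

# Refit a fresh probe on the projected data of each language.
acc_A_after = accuracy(A_test @ P, yA_test, fit_probe(A_dev @ P, yA_dev))
acc_B_after = accuracy(B_test @ P, yB_test, fit_probe(B_dev @ P, yB_dev))

print(f"probe A on A: {acc_in_lang:.3f}, probe A on B: {acc_transfer:.3f}")
print(f"after removal, refit on A: {acc_A_after:.3f}, refit on B: {acc_B_after:.3f}")
```

In this sketch the removal trained on A drives a refitted probe on A to roughly chance, while a refitted probe on B stays accurate, because B's own axis and part of the shared axis survive the projection.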
Related papers
- What an Elegant Bridge: Multilingual LLMs are Biased Similarly in Different Languages [51.0349882045866]
This paper investigates biases of Large Language Models (LLMs) through the lens of grammatical gender.
We prompt a model to describe nouns with adjectives in various languages, focusing specifically on languages with grammatical gender.
We find that a simple classifier can not only predict noun gender above chance but also exhibit cross-language transferability.
arXiv Detail & Related papers (2024-07-12T22:10:16Z)
- Leveraging Large Language Models to Measure Gender Bias in Gendered Languages [9.959039325564744]
This paper introduces a novel methodology that leverages the contextual understanding capabilities of large language models (LLMs) to quantitatively analyze gender representation in Spanish corpora.
We empirically validate our method on four widely-used benchmark datasets, uncovering significant gender disparities with a male-to-female ratio ranging from 4:1.
arXiv Detail & Related papers (2024-06-19T16:30:58Z)
- Gender Lost In Translation: How Bridging The Gap Between Languages Affects Gender Bias in Zero-Shot Multilingual Translation [12.376309678270275]
Bridging the gap between languages for which parallel data is not available affects gender bias in multilingual NMT.
We study the effect of encouraging language-agnostic hidden representations on models' ability to preserve gender.
We find that language-agnostic representations mitigate zero-shot models' masculine bias, and with increased levels of gender inflection in the bridge language, pivoting surpasses zero-shot translation regarding fairer gender preservation for speaker-related gender agreement.
arXiv Detail & Related papers (2023-05-26T13:51:50Z)
- Cross-Lingual Ability of Multilingual Masked Language Models: A Study of Language Structure [54.01613740115601]
We study three language properties: constituent order, composition and word co-occurrence.
Our main conclusion is that the contribution of constituent order and word co-occurrence is limited, while the composition is more crucial to the success of cross-linguistic transfer.
arXiv Detail & Related papers (2022-03-16T07:09:35Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- Pick a Fight or Bite your Tongue: Investigation of Gender Differences in Idiomatic Language Usage [9.892162266128306]
We compile a novel, large and diverse corpus of spontaneous linguistic productions annotated with speakers' gender.
We perform a first large-scale empirical study of distinctions in the usage of figurative language between male and female authors.
arXiv Detail & Related papers (2020-10-31T18:44:07Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.