Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer
- URL: http://arxiv.org/abs/2005.00699v1
- Date: Sat, 2 May 2020 04:34:37 GMT
- Title: Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer
- Authors: Jieyu Zhao, Subhabrata Mukherjee, Saghar Hosseini, Kai-Wei Chang and
Ahmed Hassan Awadallah
- Abstract summary: We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways to quantify bias in multilingual representations.
- Score: 101.58431011820755
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual representations embed words from many languages into a single
semantic space such that words with similar meanings are close to each other
regardless of the language. These embeddings have been widely used in various
settings, such as cross-lingual transfer, where a natural language processing
(NLP) model trained on one language is deployed to another language. While
cross-lingual transfer techniques are powerful, they also carry gender bias from the
source language to the target languages. In this paper, we study gender bias in
multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways to
quantify bias in multilingual representations from both the intrinsic and
extrinsic perspectives. Experimental results show that the magnitude of bias in
the multilingual representations changes differently when we align the
embeddings to different target spaces and that the alignment direction can also
have an influence on the bias in transfer learning. We further provide
recommendations for using the multilingual word representations for downstream
tasks.
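The abstract refers to intrinsic bias measures and to aligning embeddings to different target spaces. Below is a minimal, hedged sketch of how such quantities are commonly computed: a projection-based intrinsic bias score (in the spirit of Bolukbasi et al.'s direct bias, not necessarily the metric used in this paper) evaluated before and after orthogonal Procrustes alignment of one embedding space to another. The embedding dictionaries, word lists, and seed pairs are hypothetical placeholders.
```python
# Sketch only: intrinsic bias score + Procrustes alignment, so the score can be
# compared before and after aligning a source space to a target space.
import numpy as np

def gender_direction(emb, pairs=(("he", "she"), ("man", "woman"))):
    """Average difference vector over gendered seed pairs (a common intrinsic proxy)."""
    diffs = [emb[m] - emb[f] for m, f in pairs if m in emb and f in emb]
    d = np.mean(diffs, axis=0)
    return d / np.linalg.norm(d)

def direct_bias(emb, neutral_words, pairs=(("he", "she"), ("man", "woman"))):
    """Mean |cosine| between gender-neutral words (e.g. professions) and the gender direction."""
    d = gender_direction(emb, pairs)
    scores = []
    for w in neutral_words:
        if w in emb:
            v = emb[w] / np.linalg.norm(emb[w])
            scores.append(abs(float(v @ d)))
    return float(np.mean(scores))

def procrustes_align(src_emb, tgt_emb, seed_pairs):
    """Orthogonal map W (via SVD) that rotates the source space onto the target space."""
    X = np.stack([src_emb[s] for s, t in seed_pairs])
    Y = np.stack([tgt_emb[t] for s, t in seed_pairs])
    U, _, Vt = np.linalg.svd(Y.T @ X)
    W = U @ Vt
    return {w: W @ v for w, v in src_emb.items()}

# Hypothetical usage: measure the score before and after aligning Spanish
# embeddings (es_emb) to an English target space (en_emb).
# bias_before = direct_bias(es_emb, professions_es, pairs=(("él", "ella"),))
# aligned = procrustes_align(es_emb, en_emb, seed_dictionary)
# bias_after = direct_bias(aligned, professions_es, pairs=(("él", "ella"),))
```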
Related papers
- Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in
Multilingual Machine Translation [28.471506840241602]
Gender bias is a significant issue in machine translation, leading to ongoing research efforts in developing bias mitigation techniques.
We propose a bias mitigation method based on a novel approach, Gender-Aware Contrastive Learning (GACL), which encodes contextual gender information into the representations of non-explicit gender words (a minimal contrastive-loss sketch appears after this list).
arXiv Detail & Related papers (2023-05-23T12:53:39Z)
- Cross-lingual Transfer Can Worsen Bias in Sentiment Analysis [12.767209085664247]
We study whether gender or racial biases are imported when using cross-lingual transfer.
We find that systems using cross-lingual transfer usually become more biased than their monolingual counterparts.
We also find racial biases to be much more prevalent than gender biases.
arXiv Detail & Related papers (2023-05-22T04:37:49Z)
- Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z)
- Analyzing Gender Representation in Multilingual Models [59.21915055702203]
We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
arXiv Detail & Related papers (2022-04-20T00:13:01Z)
- When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer [15.578267998149743]
We show that the absence of sub-word overlap significantly affects zero-shot transfer when languages differ in their word order.
There is a strong correlation between transfer performance and word embedding alignment between languages.
Our results call for a focus in multilingual models on explicitly improving word embedding alignment between languages.
arXiv Detail & Related papers (2021-10-27T21:25:39Z)
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
- Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations [83.27475281544868]
We use singular vector canonical correlation analysis to study what kind of information is induced from each source.
We observe that our representations embed typology and strengthen correlations with language relationships.
We then take advantage of our multi-view language vector space for multilingual machine translation, where we achieve competitive overall translation accuracy.
arXiv Detail & Related papers (2020-04-30T16:25:39Z)
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
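The first related paper above (GACL) describes a gender-aware contrastive objective. The sketch below illustrates a generic supervised-contrastive loss of that flavor, where sentence representations sharing the same gender context are treated as positives; it is an illustration only, not the GACL implementation from that paper, and `reps` and `gender_labels` are hypothetical inputs (e.g. encoder outputs and 0/1 gender-context labels).
```python
# Illustrative sketch only: an InfoNCE-style contrastive loss over a batch,
# with positives defined as examples that share the same gender context.
import numpy as np

def gender_contrastive_loss(reps, gender_labels, temperature=0.1):
    """Pull together representations with the same gender context, push apart the rest."""
    z = reps / np.linalg.norm(reps, axis=1, keepdims=True)   # L2-normalize rows
    sim = (z @ z.T) / temperature                             # pairwise similarities
    labels = np.asarray(gender_labels)
    n = len(labels)
    losses = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        others = [j for j in range(n) if j != i]
        log_denom = np.log(np.sum(np.exp(sim[i, others])))
        # negative mean log-probability of retrieving a positive among all other examples
        losses.append(-np.mean([sim[i, j] - log_denom for j in positives]))
    return float(np.mean(losses))
```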
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and accepts no responsibility for any consequences of its use.