Investigating Cross-Linguistic Gender Bias in Hindi-English Across
Domains
- URL: http://arxiv.org/abs/2111.11159v1
- Date: Mon, 22 Nov 2021 12:55:36 GMT
- Title: Investigating Cross-Linguistic Gender Bias in Hindi-English Across
Domains
- Authors: Somya Khosla
- Abstract summary: We aim to measure and study this bias in the Hindi language, which is grammatically gendered (higher-order), with reference to English, a lower-order (non-gendered) language.
To achieve this, we study variations across domains to quantify whether domain-specific embeddings offer insight into gender bias for this Hindi-English pair.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Measuring, evaluating, and reducing gender bias has come to the forefront
as newer and improved language embeddings are released every few months. But
could this bias vary from domain to domain? Much work studies these biases in
various embedding models, but limited work has been done to debias Indic
languages. We aim to measure and study this bias in Hindi, a grammatically
gendered (higher-order) language, with reference to English, a lower-order
language. To achieve this, we study variations across domains to quantify
whether domain-specific embeddings offer insight into gender bias for this
Hindi-English pair. We generate embeddings from four different corpora and
compare results across several metrics, including against a pre-trained
state-of-the-art Indic-English translation model, which has outperformed
existing models on many NLP tasks.
Related papers
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words), a benchmark for this setting.
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- What is Your Favorite Gender, MLM? Gender Bias Evaluation in Multilingual Masked Language Models [8.618945530676614]
This paper proposes an approach to estimate gender bias in multilingual lexicons from 5 languages: Chinese, English, German, Portuguese, and Spanish.
A novel model-based method is presented to generate sentence pairs for a more robust analysis of gender bias.
Our results suggest that gender bias should be studied on a large dataset using multiple evaluation metrics for best practice.
arXiv Detail & Related papers (2024-04-09T21:12:08Z)
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- Comparing Biases and the Impact of Multilingual Training across Multiple Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z)
- An Analysis of Social Biases Present in BERT Variants Across Multiple Languages [0.0]
We investigate the bias present in monolingual BERT models across a diverse set of languages.
We propose a template-based method to measure any kind of bias, based on sentence pseudo-likelihood.
We conclude that current methods of probing for bias are highly language-dependent.
arXiv Detail & Related papers (2022-11-25T23:38:08Z)
- Efficient Gender Debiasing of Pre-trained Indic Language Models [0.0]
The gender bias present in the data on which language models are pre-trained gets reflected in the systems that use these models.
In our paper, we measure gender bias associated with occupations in Hindi language models.
Our results reflect that the bias is reduced post-introduction of our proposed mitigation techniques.
arXiv Detail & Related papers (2022-09-08T09:15:58Z)
- Evaluating Gender Bias in Hindi-English Machine Translation [0.1503974529275767]
We implement a modified version of the TGBI metric based on the grammatical considerations for Hindi.
We compare and contrast the resulting bias measurements across multiple metrics for pre-trained embeddings and the ones learned by our machine translation model.
arXiv Detail & Related papers (2021-06-16T10:35:51Z)
- Quantifying Gender Bias Towards Politicians in Cross-Lingual Language Models [104.41668491794974]
We quantify the usage of adjectives and verbs generated by language models surrounding the names of politicians as a function of their gender.
We find that while some words, such as "dead" and "designated", are associated with both male and female politicians, a few specific words, such as "beautiful" and "divorced", are predominantly associated with female politicians.
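The word-association analysis described above could be sketched as a smoothed log-odds score over gendered co-occurrence counts. The mini "corpus" and name lists below are invented for illustration only and do not reflect the paper's actual data or method details.

```python
import math
from collections import Counter

# Invented example data: names and sentences are illustrative only.
male_names = {"rahul", "boris"}
female_names = {"angela", "jacinda"}

corpus = [
    "angela is beautiful and divorced",
    "jacinda is beautiful and dead",
    "rahul is designated and dead",
    "boris is designated",
    "angela is designated",
]

male_counts, female_counts = Counter(), Counter()
for sentence in corpus:
    words = sentence.split()
    # Attribute the sentence's content words to whichever gender's
    # name appears in it.
    target = male_counts if male_names & set(words) else female_counts
    for w in words:
        if w not in male_names | female_names and w not in {"is", "and"}:
            target[w] += 1

def log_odds(word, smoothing=1.0):
    """Positive = more male-associated; negative = more female-associated;
    near zero = associated with both."""
    m = male_counts[word] + smoothing
    f = female_counts[word] + smoothing
    return math.log(m / f)

for w in ["beautiful", "divorced", "designated", "dead"]:
    print(f"{w:10s} {log_odds(w):+.2f}")
```

On this toy data, "dead" scores near zero (both genders) while "beautiful" and "divorced" score negative (female-leaning), mirroring the pattern the summary describes.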
arXiv Detail & Related papers (2021-04-15T15:03:26Z)
- Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias [12.4543414590979]
Contextualized word embeddings have been replacing standard embeddings in NLP systems.
We measure gender bias by studying associations between gender-denoting target words and names of professions in English and German.
We show that our method of measuring bias is appropriate for languages with rich gender-marking morphology, such as German.
arXiv Detail & Related papers (2020-10-27T18:06:09Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.