Socially Aware Bias Measurements for Hindi Language Representations
- URL: http://arxiv.org/abs/2110.07871v1
- Date: Fri, 15 Oct 2021 05:49:15 GMT
- Title: Socially Aware Bias Measurements for Hindi Language Representations
- Authors: Vijit Malik, Sunipa Dev, Akihiro Nishi, Nanyun Peng and Kai-Wei Chang
- Abstract summary: We show how biases are unique to specific language representations based on the history and culture of the region they are widely spoken in.
We emphasize on the necessity of social-awareness along with linguistic and grammatical artefacts when modeling language representations.
- Score: 38.40818373580979
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Language representations are an efficient tool used across NLP, but they are
strife with encoded societal biases. These biases are studied extensively, but
with a primary focus on English language representations and biases common in
the context of Western society. In this work, we investigate the biases present
in Hindi language representations such as caste and religion associated biases.
We demonstrate how biases are unique to specific language representations based
on the history and culture of the region they are widely spoken in, and also
how the same societal bias (such as binary gender associated biases) when
investigated across languages is encoded by different words and text spans.
With this work, we emphasize on the necessity of social-awareness along with
linguistic and grammatical artefacts when modeling language representations, in
order to understand the biases encoded.
Related papers
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs)
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z) - Global Voices, Local Biases: Socio-Cultural Prejudices across Languages [22.92083941222383]
Human biases are ubiquitous but not uniform; disparities exist across linguistic, cultural, and societal borders.
In this work, we scale the Word Embedding Association Test (WEAT) to 24 languages, enabling broader studies.
To encompass more widely prevalent societal biases, we examine new bias dimensions across toxicity, ableism, and more.
arXiv Detail & Related papers (2023-10-26T17:07:50Z) - Evaluating Biased Attitude Associations of Language Models in an
Intersectional Context [2.891314299138311]
Language models are trained on large-scale corpora that embed implicit biases documented in psychology.
We study biases related to age, education, gender, height, intelligence, literacy, race, religion, sex, sexual orientation, social class, and weight.
We find that language models exhibit the most biased attitudes against gender identity, social class, and sexual orientation signals in language.
arXiv Detail & Related papers (2023-07-07T03:01:56Z) - Comparing Biases and the Impact of Multilingual Training across Multiple
Languages [70.84047257764405]
We present a bias analysis across Italian, Chinese, English, Hebrew, and Spanish on the downstream sentiment analysis task.
We adapt existing sentiment bias templates in English to Italian, Chinese, Hebrew, and Spanish for four attributes: race, religion, nationality, and gender.
Our results reveal similarities in bias expression such as favoritism of groups that are dominant in each language's culture.
arXiv Detail & Related papers (2023-05-18T18:15:07Z) - An Analysis of Social Biases Present in BERT Variants Across Multiple
Languages [0.0]
We investigate the bias present in monolingual BERT models across a diverse set of languages.
We propose a template-based method to measure any kind of bias, based on sentence pseudo-likelihood.
We conclude that current methods of probing for bias are highly language-dependent.
arXiv Detail & Related papers (2022-11-25T23:38:08Z) - Efficient Gender Debiasing of Pre-trained Indic Language Models [0.0]
The gender bias present in the data on which language models are pre-trained gets reflected in the systems that use these models.
In our paper, we measure gender bias associated with occupations in Hindi language models.
Our results reflect that the bias is reduced post-introduction of our proposed mitigation techniques.
arXiv Detail & Related papers (2022-09-08T09:15:58Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z) - Evaluating Gender Bias in Hindi-English Machine Translation [0.1503974529275767]
We implement a modified version of the TGBI metric based on the grammatical considerations for Hindi.
We compare and contrast the resulting bias measurements across multiple metrics for pre-trained embeddings and the ones learned by our machine translation model.
arXiv Detail & Related papers (2021-06-16T10:35:51Z) - Towards Debiasing Sentence Representations [109.70181221796469]
We show that Sent-Debias is effective in removing biases, and at the same time, preserves performance on sentence-level downstream tasks.
We hope that our work will inspire future research on characterizing and removing social biases from widely adopted sentence representations for fairer NLP.
arXiv Detail & Related papers (2020-07-16T04:22:30Z) - Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.