Gender Bias in Text: Labeled Datasets and Lexicons
- URL: http://arxiv.org/abs/2201.08675v1
- Date: Fri, 21 Jan 2022 12:44:51 GMT
- Title: Gender Bias in Text: Labeled Datasets and Lexicons
- Authors: Jad Doughman, Wael Khreich
- Abstract summary: There is a lack of gender bias datasets and lexicons for automating the detection of gender bias.
We provide labeled datasets and exhaustive lexicons by collecting, annotating, and augmenting relevant sentences.
The released datasets and lexicons span multiple bias subtypes including: Generic He, Generic She, Explicit Marking of Sex, and Gendered Neologisms.
- Score: 0.30458514384586394
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language has a profound impact on our thoughts, perceptions, and conceptions
of gender roles. Gender-inclusive language is, therefore, a key tool to promote
social inclusion and contribute to achieving gender equality. Consequently,
detecting and mitigating gender bias in texts is instrumental in halting its
propagation and societal implications. However, there is a lack of gender bias
datasets and lexicons for automating the detection of gender bias using
supervised and unsupervised machine learning (ML) and natural language
processing (NLP) techniques. Therefore, the main contribution of this work is
to publicly provide labeled datasets and exhaustive lexicons by collecting,
annotating, and augmenting relevant sentences to facilitate the detection of
gender bias in English text. Towards this end, we present an updated version of
our previously proposed taxonomy by re-formalizing its structure, adding a new
bias type, and mapping each bias subtype to an appropriate detection
methodology. The released datasets and lexicons span multiple bias subtypes
including: Generic He, Generic She, Explicit Marking of Sex, and Gendered
Neologisms. We leveraged word embedding models to further augment
the collected lexicons. The underlying motivation of our work is to enable the
technical community to combat gender bias in text and halt its propagation
using ML and NLP techniques.
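To make the subtype-to-methodology mapping concrete, the sketch below shows how a handful of lexicon entries and a simple pronoun heuristic could flag sentences by bias subtype. The phrases, patterns, and subtype coverage here are illustrative assumptions, not the released datasets or lexicons.

```python
import re

# Hypothetical, abbreviated lexicons; the released resources are far more exhaustive.
LEXICONS = {
    "Explicit Marking of Sex": {"female doctor", "male nurse", "lady lawyer", "woman driver"},
    "Gendered Neologisms": {"mansplain", "mompreneur", "girlboss"},
}

# Generic He / Generic She are harder to capture with a fixed phrase list, so a crude
# heuristic pattern (indefinite noun phrase followed later by a lone pronoun) is used
# here purely for illustration.
GENERIC_PRONOUN_PATTERNS = {
    "Generic He": re.compile(r"\b(a|an|any|every|each)\s+\w+\b.*\bhe\b", re.IGNORECASE),
    "Generic She": re.compile(r"\b(a|an|any|every|each)\s+\w+\b.*\bshe\b", re.IGNORECASE),
}

def detect_bias_subtypes(sentence: str) -> list[str]:
    """Return the bias subtypes whose lexicon entries or patterns match the sentence."""
    text = sentence.lower()
    hits = [subtype for subtype, phrases in LEXICONS.items()
            if any(phrase in text for phrase in phrases)]
    hits += [subtype for subtype, pattern in GENERIC_PRONOUN_PATTERNS.items()
             if pattern.search(sentence)]
    return hits

print(detect_bias_subtypes("Every programmer should document his code before he ships it."))
# -> ['Generic He']
print(detect_bias_subtypes("The hospital hired a female doctor last week."))
# -> ['Explicit Marking of Sex']
```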
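The abstract also mentions augmenting the collected lexicons with word embedding models. A minimal sketch of that augmentation step, assuming a pretrained gensim KeyedVectors model and illustrative seed terms (neither taken from the released resources), might look like this:

```python
from gensim.models import KeyedVectors

def augment_lexicon(seed_terms, kv: KeyedVectors, topn: int = 20, threshold: float = 0.6):
    """Expand a seed lexicon with nearest-neighbour terms above a cosine-similarity threshold."""
    expanded = set(seed_terms)
    for term in seed_terms:
        if term not in kv:  # skip out-of-vocabulary seeds
            continue
        for neighbour, score in kv.most_similar(term, topn=topn):
            if score >= threshold:
                expanded.add(neighbour.lower())
    return sorted(expanded)

# Usage (the model file and seed terms are illustrative only):
# kv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
# print(augment_lexicon({"mansplain", "mompreneur"}, kv))
```

The similarity threshold and neighbourhood size would in practice be tuned, and candidate terms manually reviewed before being added to a lexicon.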
Related papers
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents a benchmark, AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words).
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora [9.959039325564744]
Gender bias in text corpora can lead to perpetuation and amplification of societal inequalities.
Existing methods to measure gender representation bias in text corpora have mainly been proposed for English.
This paper introduces a novel methodology to quantitatively measure gender representation bias in Spanish corpora.
arXiv Detail & Related papers (2024-06-19T16:30:58Z)
- Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation [64.79319733514266]
Large Language Models (LLMs) can generate biased and toxic responses.
We propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes.
arXiv Detail & Related papers (2023-11-01T05:31:46Z)
- "Fifty Shades of Bias": Normative Ratings of Gender Bias in GPT Generated English Text [11.085070600065801]
Language serves as a powerful tool for the manifestation of societal belief systems.
Gender bias is one of the most pervasive biases in our society.
We create the first dataset of GPT-generated English text with normative ratings of gender bias.
arXiv Detail & Related papers (2023-10-26T14:34:06Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- "I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation [69.25368160338043]
Transgender and non-binary (TGNB) individuals disproportionately experience discrimination and exclusion from daily life.
We assess how the social reality surrounding experienced marginalization of TGNB persons contributes to and persists within Open Language Generation.
We introduce TANGO, a dataset of template-based real-world text curated from a TGNB-oriented community.
arXiv Detail & Related papers (2023-05-17T04:21:45Z)
- Unmasking Contextual Stereotypes: Measuring and Mitigating BERT's Gender Bias [12.4543414590979]
Contextualized word embeddings have been replacing standard embeddings in NLP systems.
We measure gender bias by studying associations between gender-denoting target words and names of professions in English and German.
We show that our method of measuring bias is appropriate for languages with rich, gender-marking morphology, such as German.
arXiv Detail & Related papers (2020-10-27T18:06:09Z)
- Investigating Gender Bias in BERT [22.066477991442003]
We analyse the gender bias BERT induces in five downstream tasks related to emotion and sentiment intensity prediction.
We propose an algorithm that finds fine-grained gender directions, i.e., one primary direction for each BERT layer.
Experiments show that removing embedding components along such directions substantially reduces BERT-induced bias in the downstream tasks (see the sketch after this list).
arXiv Detail & Related papers (2020-09-10T17:38:32Z)
- Gender Stereotype Reinforcement: Measuring the Gender Bias Conveyed by Ranking Algorithms [68.85295025020942]
We propose the Gender Stereotype Reinforcement (GSR) measure, which quantifies the tendency of a search engine to support gender stereotypes.
GSR is the first specifically tailored measure for Information Retrieval, capable of quantifying representational harms.
arXiv Detail & Related papers (2020-09-02T20:45:04Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
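As referenced in the "Investigating Gender Bias in BERT" entry above, the sketch below illustrates removing embedding components along a gender direction. The paper computes one fine-grained direction per BERT layer; the single stand-in direction and random vectors used here are assumptions made purely for illustration.

```python
import numpy as np

def remove_gender_direction(embeddings: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Project each embedding onto the subspace orthogonal to a (unit-normalised) gender direction."""
    d = direction / np.linalg.norm(direction)
    return embeddings - np.outer(embeddings @ d, d)

# Illustrative usage with random vectors standing in for one layer's BERT embeddings:
rng = np.random.default_rng(0)
layer_embeddings = rng.normal(size=(8, 768))   # 8 tokens, hidden size 768
gender_direction = rng.normal(size=768)        # stand-in for e.g. emb("he") - emb("she")
debiased = remove_gender_direction(layer_embeddings, gender_direction)
unit = gender_direction / np.linalg.norm(gender_direction)
print(np.allclose(debiased @ unit, 0))         # True: no component left along the direction
```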
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.