INCLUSIFY: A benchmark and a model for gender-inclusive German
- URL: http://arxiv.org/abs/2212.02564v1
- Date: Mon, 5 Dec 2022 19:37:48 GMT
- Title: INCLUSIFY: A benchmark and a model for gender-inclusive German
- Authors: David Pomerenke
- Abstract summary: Gender-inclusive language is important for achieving gender equality in languages with gender inflections.
A handful of tools have been developed to help people use gender-inclusive language.
We present a dataset and measures for benchmarking such tools, as well as a model that implements the underlying tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gender-inclusive language is important for achieving gender equality in
languages with gender inflections, such as German. While stirring some
controversy, it is increasingly adopted by companies and political
institutions. A handful of tools have been developed to help people use
gender-inclusive language by identifying instances of the generic masculine and
providing suggestions for more inclusive reformulations. In this report, we
define the underlying tasks in terms of natural language processing, and
present a dataset and measures for benchmarking them. We also present a model
that implements these tasks, by combining an inclusive language database with
an elaborate sequence of processing steps via standard pre-trained models. Our
model achieves a recall of 0.89 and a precision of 0.82 in our benchmark for
identifying exclusive language; and one of its top five suggestions is chosen
in real-world texts in 44% of cases. We sketch how the area could be further
advanced by training end-to-end models and using large language models; and we
urge the community to include more gender-inclusive texts in their training
data in order to not present an obstacle to the adoption of gender-inclusive
language. Through these efforts, we hope to contribute to restoring justice in
language and, to a small extent, in reality.
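To make the benchmark tasks concrete, the following minimal Python sketch (not the paper's actual implementation) detects masculine role nouns via a toy lexicon standing in for the inclusive language database, using spaCy's pre-trained German model, and computes the span-level precision/recall and top-5 acceptance measures reported above. The lexicon entries, function names, and example text are illustrative assumptions; the real system uses a much larger database and a longer sequence of processing steps.

# Minimal sketch, not the INCLUSIFY implementation. Assumes spaCy and the
# pre-trained German model "de_core_news_sm" are installed; the lexicon,
# names, and example below are illustrative only.
import spacy

# Toy stand-in for an inclusive language database:
# masculine lemma -> candidate gender-inclusive reformulations.
INCLUSIVE_DB = {
    "Lehrer": ["Lehrer*innen", "Lehrkräfte", "Lehrende"],
    "Mitarbeiter": ["Mitarbeiter*innen", "Mitarbeitende", "Beschäftigte"],
}

nlp = spacy.load("de_core_news_sm")

def find_exclusive_spans(text):
    """Return (start_char, end_char, lemma) for nouns matching the database."""
    doc = nlp(text)
    return [
        (tok.idx, tok.idx + len(tok.text), tok.lemma_)
        for tok in doc
        if tok.pos_ == "NOUN" and tok.lemma_ in INCLUSIVE_DB
    ]

def precision_recall(predicted, gold):
    """Span-level precision and recall for identifying exclusive language."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

def top_k_acceptance(suggestion_lists, chosen, k=5):
    """Fraction of cases where the writer's chosen reformulation appears
    among the top-k suggestions (cf. the 44% top-five figure)."""
    hits = sum(1 for suggs, c in zip(suggestion_lists, chosen) if c in suggs[:k])
    return hits / len(chosen) if chosen else 0.0

# Illustrative usage: detect spans and look up candidate reformulations.
text = "Die Lehrer und Mitarbeiter treffen sich morgen."
for start, end, lemma in find_exclusive_spans(text):
    print(text[start:end], "->", INCLUSIVE_DB[lemma][:5])

A real system would additionally handle morphological variants, compounds, and context-dependent cases (for example, references to specific male persons), which is where the elaborate sequence of processing steps and the pre-trained models described in the paper come in.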
Related papers
- GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases.
GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z) - From 'Showgirls' to 'Performers': Fine-tuning with Gender-inclusive Language for Bias Reduction in LLMs [1.1049608786515839]
We adapt linguistic structures within Large Language Models to promote gender-inclusivity.
The focus of our work is gender-exclusive affixes in English, such as in 'show-girl' or 'man-cave'.
arXiv Detail & Related papers (2024-07-05T11:31:30Z)
- Leveraging Large Language Models to Measure Gender Representation Bias in Gendered Language Corpora [9.959039325564744]
Gender bias in text corpora can lead to perpetuation and amplification of societal inequalities.
Existing methods to measure gender representation bias in text corpora have mainly been proposed for English.
This paper introduces a novel methodology to quantitatively measure gender representation bias in Spanish corpora.
arXiv Detail & Related papers (2024-06-19T16:30:58Z) - Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z) - Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in
Multilingual Machine Translation [28.471506840241602]
Gender bias is a significant issue in machine translation, leading to ongoing research efforts in developing bias mitigation techniques.
We propose a bias mitigation method based on a novel approach, Gender-Aware Contrastive Learning (GACL), which encodes contextual gender information into the representations of non-explicit gender words.
arXiv Detail & Related papers (2023-05-23T12:53:39Z) - Measuring Normative and Descriptive Biases in Language Models Using
Census Data [6.445605125467574]
We investigate how occupations are reflected in pre-trained language models with respect to gender.
We introduce an approach for measuring to what degree pre-trained language models are aligned to normative and descriptive occupational distributions.
arXiv Detail & Related papers (2023-04-12T11:06:14Z) - Analyzing Gender Representation in Multilingual Models [59.21915055702203]
We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
arXiv Detail & Related papers (2022-04-20T00:13:01Z) - Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases.
We propose steps towards mitigating social biases during text generation.
Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z) - They, Them, Theirs: Rewriting with Gender-Neutral English [56.14842450974887]
We perform a case study on the singular they, a common way to promote gender inclusion in English.
We show how a model can be trained to produce gender-neutral English with a 1% word error rate, using no human-labeled data.
arXiv Detail & Related papers (2021-02-12T21:47:48Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.