The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification
- URL: http://arxiv.org/abs/2409.17929v1
- Date: Thu, 26 Sep 2024 15:08:17 GMT
- Title: The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification
- Authors: Andreas Waldis, Joel Birrer, Anne Lauscher, Iryna Gurevych
- Abstract summary: Gender-fair language fosters inclusion by addressing all genders or using neutral forms.
Gender-fair language substantially impacts predictions by flipping labels, reducing certainty, and altering attention patterns.
While we offer initial insights on the effect on German text classification, the findings likely apply to other languages.
- Score: 57.06913662622832
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Gender-fair language, an evolving German linguistic variation, fosters inclusion by addressing all genders or using neutral forms. Nevertheless, there is a significant lack of resources to assess the impact of this linguistic shift on classification using language models (LMs), which are probably not trained on such variations. To address this gap, we present Lou, the first dataset featuring high-quality reformulations for German text classification covering seven tasks, like stance detection and toxicity classification. Evaluating 16 mono- and multi-lingual LMs on Lou shows that gender-fair language substantially impacts predictions by flipping labels, reducing certainty, and altering attention patterns. However, existing evaluations remain valid, as LM rankings of original and reformulated instances do not significantly differ. While we offer initial insights on the effect on German text classification, the findings likely apply to other languages, as consistent patterns were observed in multi-lingual and English LMs.
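To make the evaluation concrete, here is a minimal sketch of how one might measure two of the reported effects, label flips and certainty reduction, by comparing a classifier's output on an original sentence and its gender-fair reformulation. The model name, example sentences, and helper function are illustrative assumptions, not the Lou evaluation code.

```python
# Minimal sketch (not the paper's code): compare a German classifier's
# prediction and softmax confidence on an original sentence vs. a
# gender-fair reformulation. The model choice is an assumption.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL = "oliverguhr/german-sentiment-bert"  # illustrative German classifier
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
model.eval()

def predict(text: str) -> tuple[int, float]:
    """Return (predicted label id, softmax confidence) for one input."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1).squeeze(0)
    confidence, label = probs.max(dim=-1)
    return label.item(), confidence.item()

original = "Die Lehrer streiken für bessere Löhne."            # generic masculine
gender_fair = "Die Lehrer*innen streiken für bessere Löhne."   # gender-star form

label_o, conf_o = predict(original)
label_g, conf_g = predict(gender_fair)
print(f"label flip: {label_o != label_g}")
print(f"certainty change: {conf_g - conf_o:+.3f}")
```

Aggregating the flip rate and mean certainty change over a whole dataset would give the kind of instance-level comparison the abstract describes; analyzing attention patterns would additionally require inspecting the model's attention weights.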
Related papers
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents the AmbGIMT benchmark (Gender-Inclusive Machine Translation with Ambiguous attitude words).
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which quantifies ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- Leveraging Large Language Models to Measure Gender Bias in Gendered Languages [9.959039325564744]
This paper introduces a novel methodology that leverages the contextual understanding capabilities of large language models (LLMs) to quantitatively analyze gender representation in Spanish corpora.
We empirically validate our method on four widely-used benchmark datasets, uncovering significant gender disparities with male-to-female ratios of 4:1 or higher.
arXiv Detail & Related papers (2024-06-19T16:30:58Z)
- Investigating Markers and Drivers of Gender Bias in Machine Translations [0.0]
Implicit gender bias in large language models (LLMs) is a well-documented problem.
We use the DeepL translation API to investigate the bias evinced when repeatedly translating a set of 56 Software Engineering tasks.
We find that some languages display similar patterns of pronoun use, falling into three loose groups.
We identify the main verb appearing in a sentence as a likely significant driver of implied gender in the translations.
arXiv Detail & Related papers (2024-03-18T15:54:46Z)
- Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps [25.95711246919163]
Current automatic speech recognition (ASR) models are designed to be used across many languages and tasks without substantial changes.
Our study systematically evaluates the performance of two widely used multilingual ASR models on three datasets.
Our findings reveal clear gender disparities, with the advantaged group varying across languages and models.
arXiv Detail & Related papers (2024-02-28T00:24:29Z)
- Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work lays the foundation for furthering the field of dialectal NLP by documenting evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
- Easy Adaptation to Mitigate Gender Bias in Multilingual Text Classification [8.137681060429527]
We treat genders as domains and present a standard domain adaptation model to reduce gender bias.
We evaluate our approach on two text classification tasks, hate speech detection and rating prediction, and demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-04-12T01:15:36Z)
- Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias in Speech Translation [20.39599469927542]
Gender bias is largely recognized as a problematic phenomenon affecting language technologies.
Most current evaluation practices adopt a word-level focus on a narrow set of occupational nouns under synthetic conditions.
Such protocols overlook key features of grammatical gender languages, which are characterized by morphosyntactic chains of gender agreement.
arXiv Detail & Related papers (2022-03-18T11:14:16Z)
- AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages with Adversarial Examples [51.048234591165155]
We present AM2iCo (Adversarial and Multilingual Meaning in Context).
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z)
- Inducing Language-Agnostic Multilingual Representations [61.97381112847459]
Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world.
We examine three approaches: (i) re-aligning the vector spaces of target languages to a pivot source language; (ii) removing language-specific means and variances, which yields better discriminativeness of embeddings as a by-product; and (iii) increasing input similarity across languages by removing morphological contractions and sentence reordering (see the sketch of approach (ii) after this list).
arXiv Detail & Related papers (2020-08-20T17:58:56Z)
- Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
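As referenced in the "Inducing Language-Agnostic Multilingual Representations" entry above, here is a hedged sketch of approach (ii): removing language-specific means and variances from embedding matrices so that embeddings from different languages share comparable statistics. The random data and function name are illustrative, not the paper's implementation.

```python
# Hedged sketch of per-language mean/variance removal for multilingual
# embeddings. Random data stands in for encoder outputs.
import numpy as np

rng = np.random.default_rng(0)
embeddings = {  # language -> (n_sentences, dim) matrix of sentence embeddings
    "de": rng.normal(loc=0.5, scale=2.0, size=(1000, 768)),
    "en": rng.normal(loc=-0.3, scale=0.7, size=(1000, 768)),
}

def remove_language_statistics(matrix: np.ndarray) -> np.ndarray:
    """Center each dimension and scale it to unit variance within one language."""
    mean = matrix.mean(axis=0, keepdims=True)
    std = matrix.std(axis=0, keepdims=True) + 1e-8  # guard against zero variance
    return (matrix - mean) / std

normalized = {lang: remove_language_statistics(m) for lang, m in embeddings.items()}
for lang, matrix in normalized.items():
    print(f"{lang}: mean={matrix.mean():+.4f}, std={matrix.std():.4f}")
```

After this normalization, both languages' embeddings have approximately zero mean and unit variance per dimension, which removes one obvious source of language-specific signal before cross-lingual comparison.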