How to Split: the Effect of Word Segmentation on Gender Bias in Speech Translation
- URL: http://arxiv.org/abs/2105.13782v1
- Date: Fri, 28 May 2021 12:38:21 GMT
- Title: How to Split: the Effect of Word Segmentation on Gender Bias in Speech Translation
- Authors: Marco Gaido, Beatrice Savoldi, Luisa Bentivogli, Matteo Negri, Marco Turchi
- Abstract summary: We bring the analysis on gender bias in automatic translation onto a seemingly neutral yet critical component: word segmentation.
Our results on two language pairs (English-Italian/French) show that state-of-the-art sub-word splitting (BPE) comes at the cost of higher gender bias.
In light of this finding, we propose a combined approach that preserves the overall translation quality of BPE while leveraging the superior ability of character-based segmentation to translate gender correctly.
- Score: 14.955696163410254
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Having recognized gender bias as a major issue affecting current translation
technologies, researchers have primarily attempted to mitigate it by working on
the data front. However, whether algorithmic aspects contribute to exacerbating
unwanted outputs has so far remained under-investigated. In this work, we bring the
analysis on gender bias in automatic translation onto a seemingly neutral yet
critical component: word segmentation. Can segmenting methods influence the
ability to translate gender? Do certain segmentation approaches penalize the
representation of feminine linguistic markings? We address these questions by
comparing 5 existing segmentation strategies on the target side of speech
translation systems. Our results on two language pairs (English-Italian/French)
show that state-of-the-art sub-word splitting (BPE) comes at the cost of higher
gender bias. In light of this finding, we propose a combined approach that
preserves the overall translation quality of BPE while leveraging the superior
ability of character-based segmentation to translate gender correctly.
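To make the compared strategies concrete, here is a minimal sketch (not the paper's actual code or vocabulary) of greedy BPE-style segmentation versus character-level segmentation. The Italian profession pair and the hand-written merge table are illustrative assumptions: they mimic a corpus skewed towards the masculine form, so the masculine word merges into one token while the feminine form is left partially split.

```python
# Toy illustration of the two target-side segmentation strategies.
# The merge table below is a hypothetical example, NOT a learned vocabulary.

def bpe_segment(word, merges):
    """Greedy BPE-style segmentation: start from characters and repeatedly
    apply the adjacent-pair merge with the best (lowest) rank."""
    tokens = list(word)
    while True:
        best, best_rank = None, None
        for i in range(len(tokens) - 1):
            rank = merges.get((tokens[i], tokens[i + 1]))
            if rank is not None and (best_rank is None or rank < best_rank):
                best, best_rank = i, rank
        if best is None:
            return tokens
        tokens = tokens[:best] + [tokens[best] + tokens[best + 1]] + tokens[best + 2:]

def char_segment(word):
    """Character-level segmentation: every word splits the same way."""
    return list(word)

# Hypothetical merges mimicking masculine-skewed training data: the masculine
# 'professore' becomes a single token, the feminine 'professoressa' does not.
merges = {("p", "r"): 0, ("pr", "o"): 1, ("pro", "f"): 2, ("prof", "e"): 3,
          ("profe", "s"): 4, ("profes", "s"): 5, ("profess", "o"): 6,
          ("professo", "r"): 7, ("professor", "e"): 8,
          ("s", "s"): 9, ("ss", "a"): 10}

print(bpe_segment("professore", merges))     # ['professore']
print(bpe_segment("professoressa", merges))  # ['professore', 'ssa']
print(char_segment("professoressa"))         # one symbol per character
```

Under this toy setup, the feminine marking survives only as a low-frequency suffix token, while character-level segmentation treats both forms uniformly, which is the intuition behind the paper's combined approach.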
Related papers
- The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification [57.06913662622832]
Gender-fair language fosters inclusion by addressing all genders or using neutral forms.
Gender-fair language substantially impacts predictions by flipping labels, reducing certainty, and altering attention patterns.
While we offer initial insights on the effect on German text classification, the findings likely apply to other languages.
arXiv Detail & Related papers (2024-09-26T15:08:17Z)
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents AmbGIMT, a benchmark for Gender-Inclusive Machine Translation with Ambiguous attitude words.
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- Investigating Markers and Drivers of Gender Bias in Machine Translations [0.0]
Implicit gender bias in large language models (LLMs) is a well-documented problem.
We use the DeepL translation API to investigate the bias evinced when repeatedly translating a set of 56 Software Engineering tasks.
We find that some languages display similar patterns of pronoun use, falling into three loose groups.
We identify the main verb appearing in a sentence as a likely significant driver of implied gender in the translations.
arXiv Detail & Related papers (2024-03-18T15:54:46Z)
- Evaluating Gender Bias in the Translation of Gender-Neutral Languages into English [0.0]
We introduce GATE X-E, an extension to the GATE corpus, that consists of human translations from Turkish, Hungarian, Finnish, and Persian into English.
The dataset features natural sentences with a wide range of sentence lengths and domains, challenging translation rewriters on various linguistic phenomena.
We present an English gender rewriting solution built on GPT-3.5 Turbo and use GATE X-E to evaluate it.
arXiv Detail & Related papers (2023-11-15T10:25:14Z)
- Exploring the Impact of Training Data Distribution and Subword Tokenization on Gender Bias in Machine Translation [19.719314005149883]
We study the effect of tokenization on gender bias in machine translation.
We observe that female and non-stereotypical gender inflections of profession names tend to be split into subword tokens.
We show that analyzing subword splits provides good estimates of gender-form imbalance in the training data.
arXiv Detail & Related papers (2023-09-21T21:21:55Z)
- The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages [51.2321117760104]
This paper describes the Gender-GAP Pipeline, an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages.
The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text.
We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation.
arXiv Detail & Related papers (2023-08-31T17:20:50Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- Gender Lost In Translation: How Bridging The Gap Between Languages Affects Gender Bias in Zero-Shot Multilingual Translation [12.376309678270275]
We examine how bridging the gap between languages for which parallel data is not available affects gender bias in multilingual NMT.
We study the effect of encouraging language-agnostic hidden representations on models' ability to preserve gender.
We find that language-agnostic representations mitigate the masculine bias of zero-shot models, and that as gender inflection in the bridge language increases, pivoting surpasses zero-shot translation in preserving speaker-related gender agreement more fairly.
arXiv Detail & Related papers (2023-05-26T13:51:50Z)
- Double-Hard Debias: Tailoring Word Embeddings for Gender Bias Mitigation [94.98656228690233]
We propose a technique that purifies the word embeddings against corpus regularities prior to inferring and removing the gender subspace.
Our approach preserves the distributional semantics of the pre-trained word embeddings while reducing gender bias to a significantly larger degree than prior approaches.
arXiv Detail & Related papers (2020-05-03T02:33:20Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
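The subword-tokenization study listed above observes that forms which a tokenizer splits into more pieces tend to be rarer in the training data, so comparing split counts across masculine and feminine profession forms gives a rough proxy for gender-form imbalance. A minimal sketch of that idea, with a toy greedy longest-match tokenizer and a hypothetical masculine-skewed vocabulary (both are illustrative assumptions, not the study's setup):

```python
# Split counts as a proxy for gender-form imbalance in training data.
# Vocabulary and word pairs below are toy examples.

def n_subwords(word, vocab):
    """Greedy longest-match segmentation against a subword vocabulary;
    returns how many pieces the word splits into."""
    pieces, i = 0, 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try longest match first
            if word[i:j] in vocab or j == i + 1:  # single chars always allowed
                pieces += 1
                i = j
                break
    return pieces

# Hypothetical vocabulary learned from masculine-skewed data: masculine forms
# are whole tokens, feminine forms need an extra suffix piece.
vocab = {"dottore", "professore", "ssa"}

for masc, fem in [("dottore", "dottoressa"), ("professore", "professoressa")]:
    print(masc, n_subwords(masc, vocab), "|", fem, n_subwords(fem, vocab))
# dottore 1 | dottoressa 2
# professore 1 | professoressa 2
```

A consistently higher split count for one gender's forms then signals that those forms were under-represented when the subword vocabulary was learned.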
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.