Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias
in Speech Translation
- URL: http://arxiv.org/abs/2203.09866v1
- Date: Fri, 18 Mar 2022 11:14:16 GMT
- Title: Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias
in Speech Translation
- Authors: Beatrice Savoldi, Marco Gaido, Luisa Bentivogli, Matteo Negri, Marco
Turchi
- Abstract summary: Gender bias is largely recognized as a problematic phenomenon affecting language technologies.
Most current evaluation practices adopt a word-level focus on a narrow set of occupational nouns under synthetic conditions.
Such protocols overlook key features of grammatical gender languages, which are characterized by morphosyntactic chains of gender agreement.
- Score: 20.39599469927542
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Gender bias is largely recognized as a problematic phenomenon affecting
language technologies, with recent studies underscoring that it might surface
differently across languages. However, most current evaluation practices
adopt a word-level focus on a narrow set of occupational nouns under synthetic
conditions. Such protocols overlook key features of grammatical gender
languages, which are characterized by morphosyntactic chains of gender
agreement, marked on a variety of lexical items and parts-of-speech (POS). To
overcome this limitation, we enrich the natural, gender-sensitive MuST-SHE
corpus (Bentivogli et al., 2020) with two new linguistic annotation layers (POS
and agreement chains), and explore to what extent different lexical categories
and agreement phenomena are impacted by gender skews. Focusing on speech
translation, we conduct a multifaceted evaluation on three language directions
(English-French/Italian/Spanish), with models trained on varying amounts of
data and different word segmentation techniques. By shedding light on model
behaviours, gender bias, and its detection at several levels of granularity,
our findings emphasize the value of dedicated analyses beyond aggregated
overall results.
Related papers
- The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification [57.06913662622832]
Gender-fair language fosters inclusion by addressing all genders or using neutral forms.
Gender-fair language substantially impacts predictions by flipping labels, reducing certainty, and altering attention patterns.
While we offer initial insights into the effects on German text classification, the findings likely apply to other languages.
arXiv Detail & Related papers (2024-09-26T15:08:17Z) - Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words), a benchmark for evaluating gender-inclusive translation.
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which quantifies ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z) - Leveraging Large Language Models to Measure Gender Bias in Gendered Languages [9.959039325564744]
This paper introduces a novel methodology that leverages the contextual understanding capabilities of large language models (LLMs) to quantitatively analyze gender representation in Spanish corpora.
We empirically validate our method on four widely-used benchmark datasets, uncovering significant gender disparities with male-to-female ratios ranging upward from 4:1.
arXiv Detail & Related papers (2024-06-19T16:30:58Z) - Analyzing Gender Representation in Multilingual Models [59.21915055702203]
We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
arXiv Detail & Related papers (2022-04-20T00:13:01Z) - Easy Adaptation to Mitigate Gender Bias in Multilingual Text
Classification [8.137681060429527]
We treat gender as a domain and present a standard domain adaptation model to reduce gender bias.
We evaluate our approach on two text classification tasks, hate speech detection and rating prediction, and demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2022-04-12T01:15:36Z) - A Massively Multilingual Analysis of Cross-linguality in Shared
Embedding Space [61.18554842370824]
In cross-lingual language models, representations for many different languages live in the same space.
We compute a task-based measure of cross-lingual alignment in the form of bitext retrieval performance.
We examine a range of linguistic, quasi-linguistic, and training-related features as potential predictors of these alignment metrics.
arXiv Detail & Related papers (2021-09-13T21:05:37Z) - AM2iCo: Evaluating Word Meaning in Context across Low-Resource Languages
with Adversarial Examples [51.048234591165155]
We present AM2iCo, Adversarial and Multilingual Meaning in Context.
It aims to faithfully assess the ability of state-of-the-art (SotA) representation models to understand the identity of word meaning in cross-lingual contexts.
Results reveal that current SotA pretrained encoders substantially lag behind human performance.
arXiv Detail & Related papers (2021-04-17T20:23:45Z) - Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.