GENder-IT: An Annotated English-Italian Parallel Challenge Set for
Cross-Linguistic Natural Gender Phenomena
- URL: http://arxiv.org/abs/2108.02854v1
- Date: Thu, 5 Aug 2021 21:08:45 GMT
- Title: GENder-IT: An Annotated English-Italian Parallel Challenge Set for
Cross-Linguistic Natural Gender Phenomena
- Authors: Eva Vanmassenhove, Johanna Monti
- Abstract summary: gENder-IT is an English--Italian challenge set focusing on the resolution of natural gender phenomena.
It provides word-level gender tags on the English source side and multiple gender alternative, where translations needed, on the Italian target side.
- Score: 2.4366811507669124
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Languages differ in terms of the absence or presence of gender features, the
number of gender classes and whether and where gender features are explicitly
marked. These cross-linguistic differences can lead to ambiguities that are
difficult to resolve, especially for sentence-level MT systems. The
identification of ambiguity and its subsequent resolution is a challenging task
for which currently there aren't any specific resources or challenge sets
available. In this paper, we introduce gENder-IT, an English--Italian challenge
set focusing on the resolution of natural gender phenomena by providing
word-level gender tags on the English source side and multiple gender
alternative translations, where needed, on the Italian target side.
Related papers
- mGeNTE: A Multilingual Resource for Gender-Neutral Language and Translation [21.461095625903504]
mGeNTE is a dataset of English-Italian/German/Spanish language pairs.
It enables research in both automatic Gender-Neutral Translation (GNT) and language modelling for three grammatical gender languages.
arXiv Detail & Related papers (2025-01-16T09:35:15Z) - GFG -- Gender-Fair Generation: A CALAMITA Challenge [15.399739689743935]
Gender-fair language aims at promoting gender equality by using terms and expressions that include all identities.
Gender-Fair Generation challenge intends to help shift toward gender-fair language in written communication.
arXiv Detail & Related papers (2024-12-26T10:58:40Z) - Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents a benchmark AmbGIMT (Gender-Inclusive Machine Translation with Ambiguous attitude words)
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z) - What an Elegant Bridge: Multilingual LLMs are Biased Similarly in Different Languages [51.0349882045866]
This paper investigates biases of Large Language Models (LLMs) through the lens of grammatical gender.
We prompt a model to describe nouns with adjectives in various languages, focusing specifically on languages with grammatical gender.
We find that a simple classifier can not only predict noun gender above chance but also exhibit cross-language transferability.
arXiv Detail & Related papers (2024-07-12T22:10:16Z) - VisoGender: A dataset for benchmarking gender bias in image-text pronoun
resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z) - Gender, names and other mysteries: Towards the ambiguous for
gender-inclusive translation [7.322734499960981]
This paper explores the case where the source sentence lacks explicit gender markers, but the target sentence contains them due to richer grammatical gender.
We find that many name-gender co-occurrences in MT data are not resolvable with 'unambiguous gender' in the source language.
We discuss potential steps toward gender-inclusive translation which accepts the ambiguity in both gender and translation.
arXiv Detail & Related papers (2023-06-07T16:21:59Z) - Gender Neutralization for an Inclusive Machine Translation: from
Theoretical Foundations to Open Challenges [11.37307883423629]
We explore gender-neutral translation (GNT) as a form of gender inclusivity and a goal to be achieved by machine translation (MT) models.
Specifically, we focus on translation from English into Italian, a language pair representative of salient gender-related linguistic transfer problems.
arXiv Detail & Related papers (2023-01-24T15:26:36Z) - Analyzing Gender Representation in Multilingual Models [59.21915055702203]
We focus on the representation of gender distinctions as a practical case study.
We examine the extent to which the gender concept is encoded in shared subspaces across different languages.
arXiv Detail & Related papers (2022-04-20T00:13:01Z) - Under the Morphosyntactic Lens: A Multifaceted Evaluation of Gender Bias
in Speech Translation [20.39599469927542]
Gender bias is largely recognized as a problematic phenomenon affecting language technologies.
Most of current evaluation practices adopt a word-level focus on a narrow set of occupational nouns under synthetic conditions.
Such protocols overlook key features of grammatical gender languages, which are characterized by morphosyntactic chains of gender agreement.
arXiv Detail & Related papers (2022-03-18T11:14:16Z) - They, Them, Theirs: Rewriting with Gender-Neutral English [56.14842450974887]
We perform a case study on the singular they, a common way to promote gender inclusion in English.
We show how a model can be trained to produce gender-neutral English with 1% word error rate with no human-labeled data.
arXiv Detail & Related papers (2021-02-12T21:47:48Z) - Neural Machine Translation Doesn't Translate Gender Coreference Right
Unless You Make It [18.148675498274866]
We propose schemes for incorporating explicit word-level gender inflection tags into Neural Machine Translation.
We find that simple existing approaches can over-generalize a gender-feature to multiple entities in a sentence.
We also propose an extension to assess translations of gender-neutral entities from English given a corresponding linguistic convention.
arXiv Detail & Related papers (2020-10-11T20:05:42Z) - Type B Reflexivization as an Unambiguous Testbed for Multilingual
Multi-Task Gender Bias [5.239305978984572]
We show that for languages with type B reflexivization, we can construct multi-task challenge datasets for detecting gender bias.
In these languages, the direct translation of 'the doctor removed his mask' is not ambiguous between a coreferential reading and a disjoint reading.
We present a multilingual, multi-task challenge dataset, which spans four languages and four NLP tasks.
arXiv Detail & Related papers (2020-09-24T23:47:18Z) - Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.