MiTTenS: A Dataset for Evaluating Misgendering in Translation
- URL: http://arxiv.org/abs/2401.06935v1
- Date: Sat, 13 Jan 2024 00:08:23 GMT
- Title: MiTTenS: A Dataset for Evaluating Misgendering in Translation
- Authors: Kevin Robinson, Sneha Kudugunta, Romina Stella, Sunipa Dev, Jasmijn Bastings
- Abstract summary: Misgendering is the act of referring to someone in a way that does not reflect their gender identity.
We introduce a dataset, MiTTenS, covering 26 languages from a variety of language families and scripts.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Misgendering is the act of referring to someone in a way that does not
reflect their gender identity. Translation systems, including foundation models
capable of translation, can produce errors that result in misgendering harms.
To measure the extent of such potential harms when translating into and out of
English, we introduce a dataset, MiTTenS, covering 26 languages from a variety
of language families and scripts, including several traditionally
underrepresented in digital resources. The dataset is constructed with
handcrafted passages that target known failure patterns, longer synthetically
generated passages, and natural passages sourced from multiple domains. We
demonstrate the usefulness of the dataset by evaluating both dedicated neural
machine translation systems and foundation models, and show that all systems
exhibit errors resulting in misgendering harms, even in high resource
languages.
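To make the into-English direction of this evaluation concrete, here is a minimal sketch. It assumes that each test case records the gender of the person being referred to, and that a system's English output can be checked for pronouns that conflict with it. The `Example` record, the `PRONOUNS` table, and all function names are hypothetical illustrations. They are not the MiTTenS data format or the authors' scoring procedure.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical record layout for one into-English test case; this is NOT the
# MiTTenS schema, only enough fields to illustrate the scoring idea.
@dataclass
class Example:
    language: str     # source language of the passage, e.g. "es"
    expected: str     # gender of the referenced person: "female" or "male"
    translation: str  # English output produced by the system under test


# English pronoun families checked by this sketch. Singular "they" and
# neopronouns are deliberately not covered here.
PRONOUNS = {
    "female": {"she", "her", "hers", "herself"},
    "male": {"he", "him", "his", "himself"},
}


def pronoun_genders(text: str) -> set[str]:
    """Return the gender labels whose pronouns occur anywhere in `text`."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {gender for gender, words in PRONOUNS.items() if tokens & words}


def is_misgendered(ex: Example) -> bool:
    """True if the output uses a pronoun from a family other than the expected one."""
    return bool(pronoun_genders(ex.translation) - {ex.expected})


def error_rate_by_language(examples: list[Example]) -> dict[str, float]:
    """Fraction of outputs flagged as misgendering, grouped by source language."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # [errors, total]
    for ex in examples:
        counts[ex.language][0] += int(is_misgendered(ex))
        counts[ex.language][1] += 1
    return {lang: errors / total for lang, (errors, total) in counts.items()}


if __name__ == "__main__":
    demo = [
        Example("es", "female", "She is a doctor and her patients trust her."),
        Example("es", "female", "He is a doctor."),
        Example("fi", "male", "He finished his thesis."),
    ]
    print(error_rate_by_language(demo))  # {'es': 0.5, 'fi': 0.0}
```

This heuristic flags an output only when it contains a pronoun from the wrong family. An output with no pronoun at all is not counted as an error. Passages that mention several people of different genders would need per-mention alignment rather than a whole-string check.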
Related papers
- The Lou Dataset -- Exploring the Impact of Gender-Fair Language in German Text Classification
Gender-fair language fosters inclusion by addressing all genders or using neutral forms.
Gender-fair language substantially impacts predictions by flipping labels, reducing certainty, and altering attention patterns.
While we offer initial insights on the effect on German text classification, the findings likely apply to other languages.
Date: 2024-09-26T15:08:17Z
- A Data Selection Approach for Enhancing Low Resource Machine Translation Using Cross-Lingual Sentence Representations
This study focuses on the case of English-Marathi language pairs, where existing datasets are notably noisy.
To mitigate the impact of data quality issues, we propose a data filtering approach based on cross-lingual sentence representations.
Results show a significant improvement in translation quality over the baseline after filtering with IndicSBERT.
Date: 2024-09-04T13:49:45Z
- Reducing Gender Bias in Machine Translation through Counterfactual Data Generation
We show that gender bias can be significantly mitigated, albeit at the expense of translation quality due to catastrophic forgetting.
We also propose a novel domain-adaptation technique that leverages in-domain data created with the counterfactual data generation techniques.
The relevant code will be made available on GitHub.
Date: 2023-11-27T23:03:01Z
- Target-Agnostic Gender-Aware Contrastive Learning for Mitigating Bias in Multilingual Machine Translation
Gender bias is a significant issue in machine translation, leading to ongoing research efforts in developing bias mitigation techniques.
We propose a bias mitigation method based on a novel approach, Gender-Aware Contrastive Learning (GACL), which encodes contextual gender information into the representations of non-explicit gender words.
Date: 2023-05-23T12:53:39Z
- Mitigating Data Imbalance and Representation Degeneration in Multilingual Machine Translation
Bi-ACL is a framework that uses only target-side monolingual data and a bilingual dictionary to improve the performance of a multilingual neural machine translation (MNMT) model.
We show that Bi-ACL is more effective both in long-tail languages and in high-resource languages.
Date: 2023-05-22T07:31:08Z
- MultiTACRED: A Multilingual Version of the TAC Relation Extraction Dataset
We introduce the MultiTACRED dataset, covering 12 typologically diverse languages from 9 language families.
We analyze translation and annotation projection quality, identify error categories, and experimentally evaluate fine-tuned pretrained mono- and multilingual language models.
We find that monolingual relation extraction (RE) model performance is comparable to the English original for many of the target languages, and that multilingual models trained on a combination of English and target-language data can outperform their monolingual counterparts.
Date: 2023-05-08T09:48:21Z
- Unified Model Learning for Various Neural Machine Translation
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a "versatile" model, the Unified Model Learning for NMT (UMLNMT), which works with data from different tasks.
UMLNMT yields substantial improvements over dataset-specific models with significantly reduced model deployment costs.
Date: 2023-05-04T12:21:52Z
- Building Machine Translation Systems for the Next Thousand Languages
We describe results in three research domains: building clean, web-mined datasets for 1500+ languages, developing practical MT models for under-served languages, and studying the limitations of evaluation metrics for these languages.
We hope that our work provides useful insights to practitioners working towards building MT systems for currently understudied languages, and highlights research directions that can complement the weaknesses of massively multilingual models in data-sparse settings.
Date: 2022-05-09T00:24:13Z
- Investigating Failures of Automatic Translation in the Case of Unambiguous Gender
Transformer-based models are the modern workhorses of neural machine translation (NMT).
We observe a systemic and rudimentary class of errors made by transformer-based models with regard to translating from a language that doesn't mark gender on nouns into others that do.
We release an evaluation scheme and dataset for measuring the ability of transformer-based NMT models to translate gender correctly.
Date: 2021-04-16T00:57:36Z
- Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale
A few popular metrics remain as the de facto metrics to evaluate tasks such as image captioning and machine translation.
This is partly due to ease of use, and partly because researchers expect to see them and know how to interpret them.
In this paper, we urge the community to consider more carefully how they automatically evaluate their models.
Date: 2020-10-26T13:57:20Z
- XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
Cross-lingual Choice of Plausible Alternatives (XCOPA) is a typologically diverse multilingual dataset for causal commonsense reasoning in 11 languages.
We evaluate a range of state-of-the-art models on this novel dataset, revealing that the performance of current methods falls short compared to translation-based transfer.
Date: 2020-05-01T12:22:33Z