Scalable Cross Lingual Pivots to Model Pronoun Gender for Translation
- URL: http://arxiv.org/abs/2006.08881v1
- Date: Tue, 16 Jun 2020 02:41:46 GMT
- Title: Scalable Cross Lingual Pivots to Model Pronoun Gender for Translation
- Authors: Kellie Webster and Emily Pitler
- Abstract summary: Machine translation systems with inadequate document understanding can make errors when translating dropped or neutral pronouns into languages with gendered pronouns.
We propose a novel cross-lingual pivoting technique for automatically producing high-quality gender labels.
- Score: 4.775445987662862
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Machine translation systems with inadequate document understanding can make
errors when translating dropped or neutral pronouns into languages with
gendered pronouns (e.g., English). Predicting the underlying gender of these
pronouns is difficult since it is not marked textually and must instead be
inferred from coreferent mentions in the context. We propose a novel
cross-lingual pivoting technique for automatically producing high-quality
gender labels, and show that this data can be used to fine-tune a BERT
classifier with 92% F1 for Spanish dropped feminine pronouns, compared with
30-51% for neural machine translation models and 54-71% for a non-fine-tuned
BERT model. We augment a neural machine translation model with labels from our
classifier to improve pronoun translation, while still having parallelizable
translation models that translate a sentence at a time.
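As a concrete illustration of the cross-lingual pivoting idea in the abstract, the sketch below mines gender labels for Spanish pro-drop sentences from their English translations. It is a toy sketch under strong assumptions: the pronoun lists, whitespace tokenization, and the "no overt subject pronoun" heuristic are ours for illustration, not the authors' pipeline.

```python
# Toy sketch of cross-lingual pivoting: mine gender labels for Spanish
# dropped pronouns from an aligned English translation. The pronoun lists
# and the "dropped subject" heuristic are illustrative simplifications,
# not the paper's actual pipeline.

EN_FEMININE = {"she", "her", "hers"}
EN_MASCULINE = {"he", "him", "his"}
ES_OVERT_SUBJECTS = {"ella", "él"}  # overt Spanish subject pronouns

def pivot_gender_label(spanish_sentence: str, english_sentence: str):
    """Return (sentence, label) if the English pivot reveals the gender
    of a pronoun that Spanish drops, else None."""
    es_tokens = spanish_sentence.lower().split()
    en_tokens = english_sentence.lower().split()

    # Heuristic: the Spanish subject pronoun counts as "dropped" if no
    # overt subject pronoun appears on the Spanish side.
    if any(t in ES_OVERT_SUBJECTS for t in es_tokens):
        return None

    # The English translation must make the gender explicit.
    if any(t in EN_FEMININE for t in en_tokens):
        return spanish_sentence, "feminine"
    if any(t in EN_MASCULINE for t in en_tokens):
        return spanish_sentence, "masculine"
    return None

if __name__ == "__main__":
    parallel = [
        ("Llegó tarde a la reunión.", "She arrived late to the meeting."),
        ("Compró un coche nuevo.", "He bought a new car."),
        ("Ella canta muy bien.", "She sings very well."),  # overt -> skipped
    ]
    for es, en in parallel:
        example = pivot_gender_label(es, en)
        if example:
            print(example)
```

Sentence-label pairs produced this way are the kind of supervision the abstract describes for fine-tuning a BERT gender classifier.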
Related papers
- Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words [85.48043537327258]
Existing machine translation gender bias evaluations are primarily focused on male and female genders.
This study presents the AmbGIMT benchmark (Gender-Inclusive Machine Translation with Ambiguous attitude words).
We propose a novel process to evaluate gender bias based on the Emotional Attitude Score (EAS), which is used to quantify ambiguous attitude words.
arXiv Detail & Related papers (2024-07-23T08:13:51Z)
- Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies [75.85462924188076]
Gender-inclusive NLP research has documented the harmful limitations of gender binary-centric large language models (LLMs).
We find that misgendering is significantly influenced by Byte-Pair Encoding (BPE) tokenization.
We propose two techniques: (1) pronoun tokenization parity, a method to enforce consistent tokenization across gendered pronouns, and (2) utilizing pre-existing LLM pronoun knowledge to improve neopronoun proficiency.
arXiv Detail & Related papers (2023-12-19T01:28:46Z)
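The BPE effect this paper identifies is easy to probe with an off-the-shelf tokenizer. A rough check, assuming the Hugging Face transformers package and GPT-2's byte-level BPE vocabulary (chosen here purely for illustration; the paper's models and tokenizers may differ):

```python
# Rough check of how a BPE vocabulary fragments pronouns: binary pronouns
# tend to be single tokens, while neopronouns are split into pieces.
# Uses GPT-2's BPE tokenizer purely for illustration; the paper's exact
# models and tokenizers may differ.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

# The leading space matters for GPT-2's byte-level BPE: " she" is how
# the pronoun appears mid-sentence.
for pronoun in [" she", " he", " they", " xe", " zir", " fae"]:
    pieces = tokenizer.tokenize(pronoun)
    print(f"{pronoun!r:8} -> {len(pieces)} token(s): {pieces}")
```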
- A Tale of Pronouns: Interpretability Informs Gender Bias Mitigation for Fairer Instruction-Tuned Machine Translation [35.44115368160656]
We investigate whether and to what extent machine translation models exhibit gender bias.
We find that instruction-tuned (IFT) models default to male-inflected translations, even disregarding female occupational stereotypes.
We propose an easy-to-implement and effective bias mitigation solution.
arXiv Detail & Related papers (2023-10-18T17:36:55Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- MISGENDERED: Limits of Large Language Models in Understanding Pronouns [46.276320374441056]
We evaluate popular language models for their ability to correctly use English gender-neutral pronouns.
We introduce MISGENDERED, a framework for evaluating large language models' ability to correctly use preferred pronouns.
arXiv Detail & Related papers (2023-06-06T18:27:52Z)
- Generating Gender Augmented Data for NLP [3.5557219875516655]
Gender bias is a frequent occurrence in NLP-based applications, especially in gender-inflected languages.
This paper proposes an automatic and generalisable rewriting approach for short conversational sentences.
The proposed approach is based on a neural machine translation (NMT) system trained to 'translate' from one gender alternative to another.
arXiv Detail & Related papers (2021-07-13T11:13:21Z)
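The input/output contract of such gender rewriting is simple to show. The toy below uses a hand-written table of inflected forms rather than the paper's trained NMT rewriter, and the French word pairs are our illustrative assumptions:

```python
# Toy gender-alternative rewriter: maps a short sentence to its
# other-gender variant via a hand-written table of inflected forms.
# The paper trains an NMT system to do this robustly; this sketch only
# shows the data format of the augmentation.
SWAPS = {
    "il": "elle", "elle": "il",
    "content": "contente", "contente": "content",
    "fatigué": "fatiguée", "fatiguée": "fatigué",
}

def augment(sentence: str) -> str:
    """Return the opposite-gender variant of a short French sentence."""
    return " ".join(SWAPS.get(tok, tok) for tok in sentence.split())

print(augment("il est content"))     # -> "elle est contente"
print(augment("elle est fatiguée"))  # -> "il est fatigué"
```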
- Investigating Failures of Automatic Translation in the Case of Unambiguous Gender [13.58884863186619]
Transformer-based models are the modern workhorses of neural machine translation (NMT).
We observe a systemic and rudimentary class of errors made by transformer-based models when translating from a language that does not mark gender on nouns into languages that do.
We release an evaluation scheme and dataset for measuring the ability of transformer-based NMT models to translate gender correctly.
arXiv Detail & Related papers (2021-04-16T00:57:36Z)
- Repairing Pronouns in Translation with BERT-Based Post-Editing [7.6344611819427035]
We show that in some domains, pronoun choice can account for more than half of an NMT system's errors.
We then investigate a possible solution: fine-tuning BERT on a pronoun prediction task using chunks of source-side sentences.
arXiv Detail & Related papers (2021-03-23T21:01:03Z)
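The pronoun-prediction setup lends itself to a short demonstration. A minimal sketch using an off-the-shelf BERT through the Hugging Face fill-mask pipeline; the paper fine-tunes BERT for this task, so the untuned scores below only show the task's shape, and the example sentence is ours:

```python
# Minimal illustration of pronoun prediction as masked-LM inference.
# The paper fine-tunes BERT on this kind of task over source-side
# sentence chunks; here we only query an off-the-shelf model to show
# the task format, with a made-up example sentence.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-cased")

context = ("Mary handed the report to her manager. "
           "Later, [MASK] was praised for the thorough analysis.")

for candidate in fill(context, targets=["she", "he", "they"]):
    print(f"{candidate['token_str']:>5}: {candidate['score']:.3f}")
```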
- Neural Machine Translation Doesn't Translate Gender Coreference Right Unless You Make It [18.148675498274866]
We propose schemes for incorporating explicit word-level gender inflection tags into Neural Machine Translation.
We find that simple existing approaches can over-generalize a gender feature to multiple entities in a sentence.
We also propose an extension to assess translations of gender-neutral entities from English given a corresponding linguistic convention.
arXiv Detail & Related papers (2020-10-11T20:05:42Z)
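A hypothetical sketch of what word-level gender inflection tagging can look like as a preprocessing step; the tag inventory (<F>, <M>) and the entity-to-gender mapping are our illustrative assumptions, not the paper's actual scheme:

```python
# Hypothetical preprocessing that injects word-level gender inflection
# tags into an English source sentence before NMT. The tag inventory
# (<F>, <M>) and the entity->gender mapping are illustrative; the paper
# defines its own tagging scheme.
def tag_gender(tokens, gender_of):
    """Prefix each known entity token with its gender inflection tag."""
    tagged = []
    for token in tokens:
        gender = gender_of.get(token.lower())
        if gender:
            tagged.append(f"<{gender}>")
        tagged.append(token)
    return tagged

source = "The doctor greeted the nurse".split()
# Coreference-derived genders for this sentence (assumed known upstream).
genders = {"doctor": "F", "nurse": "M"}

print(" ".join(tag_gender(source, genders)))
# -> "The <F> doctor greeted the <M> nurse"
```

Tagging each entity with its own feature, rather than one sentence-level tag, is one way to avoid the over-generalization failure the summary mentions.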
- Transformer-GCRF: Recovering Chinese Dropped Pronouns with General Conditional Random Fields [54.03719496661691]
We present a novel framework that combines the strength of the Transformer network with General Conditional Random Fields (GCRF) to model the dependencies between pronouns in neighboring utterances.
Results on three Chinese conversation datasets show that the Transformer-GCRF model outperforms the state-of-the-art dropped pronoun recovery models.
arXiv Detail & Related papers (2020-10-07T07:06:09Z)
- Multi-Dimensional Gender Bias Classification [67.65551687580552]
Machine learning models can inadvertently learn socially undesirable patterns when training on gender biased text.
We propose a general framework that decomposes gender bias in text along several pragmatic and semantic dimensions.
Using this fine-grained framework, we automatically annotate eight large-scale datasets with gender information.
arXiv Detail & Related papers (2020-05-01T21:23:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.