Data Augmentation Methods for Anaphoric Zero Pronouns
- URL: http://arxiv.org/abs/2109.09825v1
- Date: Mon, 20 Sep 2021 20:16:01 GMT
- Title: Data Augmentation Methods for Anaphoric Zero Pronouns
- Authors: Abdulrahman Aloraini and Massimo Poesio
- Abstract summary: We use five data augmentation methods to generate and detect anaphoric zero pronouns automatically.
We use the augmented data as additional training materials for two anaphoric zero pronoun systems for Arabic.
- Score: 8.732165992971545
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In pro-drop languages such as Arabic, Chinese, Italian, Japanese, Spanish, and
many others, unrealized (null) arguments in certain syntactic positions can
refer to a previously introduced entity, and are thus called anaphoric zero
pronouns. The existing resources for studying anaphoric zero pronoun
interpretation are however still limited. In this paper, we use five data
augmentation methods to generate and detect anaphoric zero pronouns
automatically. We use the augmented data as additional training materials for
two anaphoric zero pronoun systems for Arabic. Our experimental results show
that data augmentation improves the performance of the two systems, surpassing
the state-of-the-art results.
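The abstract does not name the five augmentation methods, but the general idea can be illustrated with a minimal sketch: one plausible strategy (an assumption for illustration, not necessarily one of the paper's methods) is to delete an overt subject pronoun from a sentence and record the gap position and its antecedent, yielding a synthetic anaphoric-zero-pronoun training instance. English tokens stand in for Arabic here purely for readability.

```python
# Hedged sketch of one plausible zero-pronoun augmentation strategy:
# delete an overt pronoun and record (gap index, antecedent span) so a
# detector/resolver can be trained on the synthetic instance.

OVERT_PRONOUNS = {"he", "she", "it", "they"}  # illustrative English stand-ins

def augment_drop_pronoun(tokens, antecedent_span):
    """Yield (augmented_tokens, gap_index, antecedent_span) triples,
    one per overt pronoun removed, simulating an anaphoric zero pronoun."""
    for i, tok in enumerate(tokens):
        if tok.lower() in OVERT_PRONOUNS:
            augmented = tokens[:i] + tokens[i + 1:]
            yield augmented, i, antecedent_span

sent = ["Mary", "arrived", "late", "because", "she", "missed", "the", "bus"]
for aug, gap, ante in augment_drop_pronoun(sent, (0, 1)):
    print(aug, gap, ante)
    # ['Mary', 'arrived', 'late', 'because', 'missed', 'the', 'bus'] 4 (0, 1)
```

The augmented sentence is grammatical in a pro-drop language, and the recorded gap/antecedent pair provides exactly the supervision signal a zero-pronoun detection or resolution system needs.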
Related papers
- Mention Attention for Pronoun Translation [5.896961355859321]
We introduce an additional mention attention module in the decoder to pay extra attention to source mentions rather than non-mention tokens.
Our mention attention module not only extracts features from source mentions, but also considers target-side context which benefits pronoun translation.
We conduct experiments on the WMT17 English-German translation task, and evaluate our models on general translation and pronoun translation.
arXiv Detail & Related papers (2024-12-19T13:19:19Z) - Mitigating Bias in Queer Representation within Large Language Models: A Collaborative Agent Approach [0.0]
Large Language Models (LLMs) often perpetuate biases in pronoun usage, leading to misrepresentation or exclusion of queer individuals.
This paper addresses the specific problem of biased pronoun usage in LLM outputs, particularly the inappropriate use of traditionally gendered pronouns.
We introduce a collaborative agent pipeline designed to mitigate these biases by analyzing and optimizing pronoun usage for inclusivity.
arXiv Detail & Related papers (2024-11-12T09:14:16Z) - Transforming Dutch: Debiasing Dutch Coreference Resolution Systems for Non-binary Pronouns [5.5514102920271196]
Gender-neutral pronouns are increasingly being introduced across Western languages.
Recent evaluations have demonstrated that English NLP systems are unable to correctly process gender-neutral pronouns.
This paper examines a Dutch coreference resolution system's performance on gender-neutral pronouns.
arXiv Detail & Related papers (2024-04-30T18:31:19Z) - Tokenization Matters: Navigating Data-Scarce Tokenization for Gender Inclusive Language Technologies [75.85462924188076]
Gender-inclusive NLP research has documented the harmful limitations of gender binary-centric large language models (LLMs).
We find that misgendering is significantly influenced by Byte-Pair Encoding (BPE) tokenization.
We propose two techniques: (1) pronoun tokenization parity, a method to enforce consistent tokenization across gendered pronouns, and (2) utilizing pre-existing LLM pronoun knowledge to improve neopronoun proficiency.
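The "pronoun tokenization parity" idea can be sketched with a toy subword vocabulary (an illustration under assumed details, not the paper's implementation): a neopronoun absent from the vocabulary fragments into several pieces while common gendered pronouns stay whole, and parity is restored by adding the neopronoun as a single token.

```python
# Toy illustration of tokenization parity: with a toy vocabulary, "xe"
# fragments while "she" does not; adding "xe" as a single vocabulary
# entry makes the two pronouns tokenize consistently.

def toy_bpe_tokenize(word, vocab):
    """Greedy longest-match subword split over a given vocabulary;
    unknown single characters fall back to themselves."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces

vocab = {"she", "he", "x"}                   # neopronoun "xe" missing
print(toy_bpe_tokenize("she", vocab))        # ['she']
print(toy_bpe_tokenize("xe", vocab))         # ['x', 'e']  -> fragmented

vocab_parity = vocab | {"xe"}                # parity: add as a single token
print(toy_bpe_tokenize("xe", vocab_parity))  # ['xe']
```

The fragmented form gives the model a weaker, less frequent representation of the neopronoun, which is the mechanism the survey entry identifies as driving misgendering.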
arXiv Detail & Related papers (2023-12-19T01:28:46Z) - A Survey on Zero Pronoun Translation [69.09774294082965]
Zero pronouns (ZPs) are frequently omitted in pro-drop languages, but should be recalled in non-pro-drop languages.
This survey paper highlights the major works that have been undertaken in zero pronoun translation (ZPT) after the neural revolution.
We uncover a number of insightful findings such as: 1) ZPT is in line with the development trend of large language models; 2) data limitation causes learning bias in languages and domains; 3) performance improvements are often reported on single benchmarks, but advanced methods are still far from real-world use.
arXiv Detail & Related papers (2023-05-17T13:19:01Z) - Speech-to-Speech Translation For A Real-world Unwritten Language [62.414304258701804]
We study speech-to-speech translation (S2ST) that translates speech from one language into another language.
We present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.
arXiv Detail & Related papers (2022-11-11T20:21:38Z) - DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z) - Zero-pronoun Data Augmentation for Japanese-to-English Translation [15.716533830931764]
We propose a data augmentation method that provides additional training signals for the translation model to learn correlations between local context and zero pronouns.
We show that the proposed method significantly improves the accuracy of zero pronoun translation with machine translation experiments in the conversational domain.
arXiv Detail & Related papers (2021-07-01T09:17:59Z) - On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z) - Scalable Cross Lingual Pivots to Model Pronoun Gender for Translation [4.775445987662862]
Machine translation systems with inadequate document understanding can make errors when translating dropped or neutral pronouns into languages with gendered pronouns.
We propose a novel cross-lingual pivoting technique for automatically producing high-quality gender labels.
arXiv Detail & Related papers (2020-06-16T02:41:46Z) - Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.