Zero-pronoun Data Augmentation for Japanese-to-English Translation
- URL: http://arxiv.org/abs/2107.00318v1
- Date: Thu, 1 Jul 2021 09:17:59 GMT
- Title: Zero-pronoun Data Augmentation for Japanese-to-English Translation
- Authors: Ryokan Ri, Toshiaki Nakazawa and Yoshimasa Tsuruoka
- Abstract summary: We propose a data augmentation method that provides additional training signals for the translation model to learn correlations between local context and zero pronouns.
Machine translation experiments in the conversational domain show that the proposed method significantly improves the accuracy of zero-pronoun translation.
- Score: 15.716533830931764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: For Japanese-to-English translation, zero pronouns in Japanese pose a
challenge, since the model needs to infer and produce the corresponding pronoun
on the target side of the English sentence. Although fully resolving
zero pronouns often needs discourse context, in some cases, the local context
within a sentence gives clues to the inference of the zero pronoun. In this
study, we propose a data augmentation method that provides additional training
signals for the translation model to learn correlations between local context
and zero pronouns. Machine translation experiments in the conversational domain
show that the proposed method significantly improves the accuracy of
zero-pronoun translation.
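One plausible form of such augmentation (a simplified sketch, not the authors' exact method) is to take parallel pairs whose Japanese side contains an overt subject pronoun and emit an extra training pair with that pronoun dropped, so the model must recover the English pronoun from the remaining local context. The pronoun list and string-replacement rule below are purely illustrative assumptions:

```python
# Hypothetical sketch of zero-pronoun data augmentation: drop an overt
# Japanese subject pronoun on the source side while keeping the English
# pronoun on the target side, creating an additional training signal.
# The pronoun inventory and removal rule are illustrative, not the paper's.

JA_SUBJECT_PRONOUNS = ["私は", "あなたは", "彼は", "彼女は"]

def augment_pair(ja: str, en: str) -> list[tuple[str, str]]:
    """Return the original pair plus zero-pronoun variants."""
    pairs = [(ja, en)]
    for pron in JA_SUBJECT_PRONOUNS:
        if pron in ja:
            # Remove the overt pronoun from the source only; the target
            # keeps its pronoun, which the model must now infer.
            pairs.append((ja.replace(pron, "", 1), en))
    return pairs

corpus = [("私は明日学校に行く", "I will go to school tomorrow")]
augmented = [p for pair in corpus for p in augment_pair(*pair)]
```

In practice such augmented pairs would be mixed into the training data alongside the originals; a real implementation would use a morphological analyzer rather than naive string matching to locate pronouns.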
Related papers
- Mention Attention for Pronoun Translation [5.896961355859321]
We introduce an additional mention attention module in the decoder to pay extra attention to source mentions rather than to non-mention tokens.
Our mention attention module not only extracts features from source mentions, but also considers target-side context which benefits pronoun translation.
We conduct experiments on the WMT17 English-German translation task, and evaluate our models on general translation and pronoun translation.
arXiv Detail & Related papers (2024-12-19T13:19:19Z)
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- A Survey on Zero Pronoun Translation [69.09774294082965]
Zero pronouns (ZPs) are frequently omitted in pro-drop languages, but should be recalled in non-pro-drop languages.
This survey paper highlights the major works that have been undertaken in zero pronoun translation (ZPT) after the neural revolution.
We uncover a number of insightful findings, such as: 1) ZPT is in line with the development trend of large language models; 2) data limitations cause learning bias across languages and domains; 3) performance improvements are often reported on single benchmarks, but advanced methods are still far from real-world use.
arXiv Detail & Related papers (2023-05-17T13:19:01Z)
- "I'm" Lost in Translation: Pronoun Missteps in Crowdsourced Data Sets [13.32560004325655]
Crowdsourcing initiatives have focused on multilingual translation of big, open data sets for use in natural language processing (NLP).
We focus on the case of pronouns translated between English and Japanese in the crowdsourced Tatoeba database.
We found that masculine pronoun biases were present overall, even though plurality in language was accounted for in other ways.
arXiv Detail & Related papers (2023-04-22T09:27:32Z)
- Data Augmentation Methods for Anaphoric Zero Pronouns [8.732165992971545]
We use five data augmentation methods to generate and detect anaphoric zero pronouns automatically.
We use the augmented data as additional training materials for two anaphoric zero pronoun systems for Arabic.
arXiv Detail & Related papers (2021-09-20T20:16:01Z)
- Exophoric Pronoun Resolution in Dialogues with Topic Regularization [84.23706744602217]
Resolving pronouns to their referents has long been studied as a fundamental natural language understanding problem.
Previous works on pronoun coreference resolution (PCR) mostly focus on resolving pronouns to mentions in text while ignoring the exophoric scenario.
We propose to jointly leverage the local context and global topics of dialogues to solve the out-of-text PCR problem.
arXiv Detail & Related papers (2021-09-10T11:08:31Z)
- Do Context-Aware Translation Models Pay the Right Attention? [61.25804242929533]
Context-aware machine translation models are designed to leverage contextual information, but often fail to do so.
In this paper, we ask several questions: What contexts do human translators use to resolve ambiguous words?
We introduce SCAT (Supporting Context for Ambiguous Translations), a new English-French dataset comprising supporting context words for 14K translations.
Using SCAT, we perform an in-depth analysis of the context used to disambiguate, examining positional and lexical characteristics of the supporting words.
arXiv Detail & Related papers (2021-05-14T17:32:24Z)
- Repairing Pronouns in Translation with BERT-Based Post-Editing [7.6344611819427035]
We show that in some domains, pronoun choice can account for more than half of an NMT system's errors.
We then investigate a possible solution: fine-tuning BERT on a pronoun prediction task using chunks of source-side sentences.
arXiv Detail & Related papers (2021-03-23T21:01:03Z)
- Transformer-GCRF: Recovering Chinese Dropped Pronouns with General Conditional Random Fields [54.03719496661691]
We present a novel framework that combines the strength of the Transformer network with General Conditional Random Fields (GCRF) to model the dependencies between pronouns in neighboring utterances.
Results on three Chinese conversation datasets show that the Transformer-GCRF model outperforms the state-of-the-art dropped pronoun recovery models.
arXiv Detail & Related papers (2020-10-07T07:06:09Z)
- Scalable Cross Lingual Pivots to Model Pronoun Gender for Translation [4.775445987662862]
Machine translation systems with inadequate document understanding can make errors when translating dropped or neutral pronouns into languages with gendered pronouns.
We propose a novel cross-lingual pivoting technique for automatically producing high-quality gender labels.
arXiv Detail & Related papers (2020-06-16T02:41:46Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact on existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
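The lexical-overlap effect mentioned above can be made concrete with a toy measure (illustrative only, not the paper's metric): the fraction of hypothesis words that also appear in the premise. Independently translated sentence pairs tend to score lower on such a measure than pairs translated together, since the two translations need not reuse the same wording. All sentences below are invented examples:

```python
# Illustrative (not from the paper): a simple word-overlap measure between
# an NLI premise and hypothesis. Translating the two sentences independently
# can lower this overlap, which is the kind of artifact the paper analyzes.

def lexical_overlap(premise: str, hypothesis: str) -> float:
    """Fraction of unique hypothesis words that also occur in the premise."""
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    if not h:
        return 0.0
    return len(p & h) / len(h)

# A pair translated together tends to preserve shared wording...
joint = lexical_overlap("a man is playing a guitar",
                        "a man is playing an instrument")

# ...while independent translations may diverge lexically.
independent = lexical_overlap("a man plays a guitar",
                              "a person is performing on an instrument")
```

Here `joint` is 4/6 while `independent` drops to 1/7, mimicking how independent translation can erase surface cues that NLI models rely on.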
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.