ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for
Non-Autoregressive Machine Translation
- URL: http://arxiv.org/abs/2210.03999v1
- Date: Sat, 8 Oct 2022 11:39:15 GMT
- Title: ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for
Non-Autoregressive Machine Translation
- Authors: Cunxiao Du and Zhaopeng Tu and Longyue Wang and Jing Jiang
- Abstract summary: A new training loss, oaxe, has proven effective at ameliorating the effect of multimodality in non-autoregressive translation (NAT).
We extend oaxe by allowing reordering only between ngram phrases while still requiring a strict match of word order within each phrase.
Further analyses show that ngram-oaxe indeed improves the translation of ngram phrases and produces more fluent translations with better modeling of sentence structure.
- Score: 51.06378042344563
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, a new training loss, oaxe, has proven effective at
ameliorating the effect of multimodality in non-autoregressive translation
(NAT) by removing the penalty on word order errors in the standard
cross-entropy loss. Starting from the intuition that reordering generally
occurs between phrases, we extend oaxe by allowing reordering only between
ngram phrases while still requiring a strict match of word order within each
phrase. Extensive experiments on NAT benchmarks across language pairs and data
scales demonstrate the effectiveness and universality of our approach. Further
analyses show that ngram-oaxe indeed improves the translation of ngram phrases
and produces more fluent translations with better modeling of sentence
structure.
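The abstract describes the loss only at a high level, so here is a minimal sketch of the phrase-level reordering idea, not the authors' implementation. It assumes a fixed, non-overlapping bigram segmentation of the target and solves the phrase-to-slot assignment with the Hungarian algorithm (scipy.optimize.linear_sum_assignment); the function name ngram_oaxe_sketch, the fixed n, and the length-divisibility restriction are simplifications introduced here.

```python
# Minimal sketch of an ngram-oaxe-style loss (illustration only, not the
# paper's algorithm). Assumption: the target is split into fixed-size,
# non-overlapping ngrams; phrases may be reordered across ngram slots, but
# word order inside each phrase must match exactly.
import numpy as np
from scipy.optimize import linear_sum_assignment


def ngram_oaxe_sketch(log_probs: np.ndarray, target: np.ndarray, n: int = 2) -> float:
    """log_probs: (T, V) per-position log-probabilities from the NAT decoder.
    target: (T,) target token ids; T is assumed divisible by n for simplicity."""
    T = len(target)
    assert T % n == 0, "toy version assumes the length is divisible by n"
    num_phrases = T // n
    phrases = target.reshape(num_phrases, n)        # consecutive ngram phrases
    slots = log_probs.reshape(num_phrases, n, -1)   # consecutive position blocks

    # cost[i, j] = cross entropy of emitting phrase j in slot i, with the
    # within-phrase word order kept fixed.
    cost = np.zeros((num_phrases, num_phrases))
    for i in range(num_phrases):
        for j in range(num_phrases):
            cost[i, j] = -slots[i, np.arange(n), phrases[j]].sum()

    # Best phrase ordering = minimum-cost assignment of phrases to slots.
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].sum())
```

Setting n=1 reduces the sketch to a toy version of the original token-level oaxe, which aligns individual target tokens to positions; the published ngram-oaxe handles ngram construction and the ordering search differently, so this snippet only conveys the intuition.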
Related papers
- An Analysis of BPE Vocabulary Trimming in Neural Machine Translation [56.383793805299234]
Vocabulary trimming is a postprocessing step that replaces rare subwords with their component subwords.
We show that vocabulary trimming fails to improve performance and is even prone to incurring heavy degradation.
arXiv Detail & Related papers (2024-03-30T15:29:49Z) - Fuzzy Alignments in Directed Acyclic Graph for Non-Autoregressive
Machine Translation [18.205288788056787]
Non-autoregressive translation (NAT) reduces the decoding latency but suffers from performance degradation due to the multi-modality problem.
In this paper, we hold the view that all paths in the graph are fuzzily aligned with the reference sentence.
We do not require the exact alignment but train the model to maximize a fuzzy alignment score between the graph and reference, which takes translations captured in all modalities into account.
arXiv Detail & Related papers (2023-03-12T13:51:38Z) - Categorizing Semantic Representations for Neural Machine Translation [53.88794787958174]
We introduce categorization to the source contextualized representations.
The main idea is to enhance generalization by reducing sparsity and overfitting.
Experiments on a dedicated MT dataset show that our method reduces compositional generalization error rates by 24%.
arXiv Detail & Related papers (2022-10-13T04:07:08Z) - DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z) - Monotonic Simultaneous Translation with Chunk-wise Reordering and
Refinement [38.89496608319392]
We propose an algorithm to reorder and refine the target side of a full sentence translation corpus.
Words and phrases in the source and target sentences are aligned largely monotonically, using word alignment and non-autoregressive neural machine translation.
The proposed approach improves BLEU scores and resulting translations exhibit enhanced monotonicity with source sentences.
arXiv Detail & Related papers (2021-10-18T22:51:21Z) - Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation [28.800695682918757]
A new training objective named order-agnostic cross entropy (OaXE) is proposed for non-autoregressive translation (NAT) models.
OaXE computes the cross entropy loss based on the best possible alignment between model predictions and target tokens; a toy numerical illustration of this idea appears after this list.
Experiments on major WMT benchmarks show that OaXE substantially improves translation performance.
arXiv Detail & Related papers (2021-06-09T14:15:12Z) - Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in
Non-Autoregressive Translation [98.11249019844281]
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models.
We propose reverse KD to rejuvenate more alignments for low-frequency target words.
Results demonstrate that the proposed approach can significantly and universally improve translation quality.
arXiv Detail & Related papers (2021-06-02T02:41:40Z) - Modeling Coverage for Non-Autoregressive Neural Machine Translation [9.173385214565451]
We propose a novel Coverage-NAT that models coverage information directly through a token-level coverage iterative refinement mechanism and a sentence-level coverage agreement.
Experimental results on WMT14 En-De and WMT16 En-Ro translation tasks show that our method can alleviate those errors and achieve strong improvements over the baseline system.
arXiv Detail & Related papers (2021-04-24T07:33:23Z) - On Long-Tailed Phenomena in Neural Machine Translation [50.65273145888896]
State-of-the-art Neural Machine Translation (NMT) models struggle with generating low-frequency tokens.
We propose a new loss function, the Anti-Focal loss, to better adapt model training to the structural dependencies of conditional text generation.
We show the efficacy of the proposed technique on a number of Machine Translation (MT) datasets, demonstrating that it leads to significant gains over cross-entropy.
arXiv Detail & Related papers (2020-10-10T07:00:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.