Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine
Translation
- URL: http://arxiv.org/abs/2210.03953v1
- Date: Sat, 8 Oct 2022 07:44:28 GMT
- Title: Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine
Translation
- Authors: Chenze Shao and Yang Feng
- Abstract summary: Non-autoregressive translation (NAT) models are typically trained with the cross-entropy loss.
Latent alignment models relax the explicit alignment by marginalizing out all monotonic latent alignments with the CTC loss.
We extend the alignment space to non-monotonic alignments to allow for global word reordering.
- Score: 15.309573393914462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-autoregressive translation (NAT) models are typically trained with the
cross-entropy loss, which forces the model outputs to be aligned verbatim with
the target sentence and will highly penalize small shifts in word positions.
Latent alignment models relax the explicit alignment by marginalizing out all
monotonic latent alignments with the CTC loss. However, they cannot handle
non-monotonic alignments, which is non-negligible as there is typically global
word reordering in machine translation. In this work, we explore non-monotonic
latent alignments for NAT. We extend the alignment space to non-monotonic
alignments to allow for global word reordering, and further consider all
alignments that overlap with the target sentence. We non-monotonically match
the alignments to the target sentence and train the latent alignment model to
maximize the F1 score of non-monotonic matching. Extensive experiments on major
WMT benchmarks show that our method substantially improves the translation
performance of CTC-based models. Our best model achieves 30.06 BLEU on WMT14
En-De with only one-iteration decoding, closing the gap between
non-autoregressive and autoregressive models.
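For intuition, the sketch below contrasts the verbatim, position-wise matching that cross-entropy implicitly rewards with an order-free bag-of-words F1 score. It is a deliberately simplified illustration of what "maximizing the F1 score of non-monotonic matching" rewards; the function names and the restriction to unigram counts are assumptions, and the paper's actual objective marginalizes over CTC alignments rather than scoring a single decoded sequence.

```python
from collections import Counter

def positional_accuracy(hyp, ref):
    """Verbatim matching: a token only counts if it sits at the exact
    target position (what cross-entropy implicitly rewards)."""
    matches = sum(h == r for h, r in zip(hyp, ref))
    return matches / max(len(ref), 1)

def bag_of_words_f1(hyp, ref):
    """Order-free matching: tokens count regardless of position, a rough
    stand-in for non-monotonic matching against the target sentence."""
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

ref = "the black cat sat on the mat".split()
hyp = "the cat black sat on the mat".split()   # two words swapped

print(positional_accuracy(hyp, ref))  # ~0.71: the swap is penalized twice
print(bag_of_words_f1(hyp, ref))      # 1.0: reordering is not penalized
```

A small word swap leaves the order-free score untouched while the positional score drops, which is the effect the non-monotonic alignment space is meant to capture.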
Related papers
- Unbalanced Optimal Transport for Unbalanced Word Alignment [17.08341136230076]
This study shows that the family of optimal transport (OT) formulations, i.e., balanced, partial, and unbalanced OT, provides natural and powerful approaches even without tailor-made techniques.
Our experiments covering unsupervised and supervised settings indicate that our generic OT-based alignment methods are competitive with state-of-the-art methods specifically designed for word alignment.
arXiv Detail & Related papers (2023-06-07T03:03:41Z)
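As a rough, self-contained illustration of OT-based word alignment (related to the entry above), the sketch below runs balanced entropic Sinkhorn iterations on a toy cost matrix built from random vectors standing in for word embeddings; the partial and unbalanced variants studied in that paper relax the marginal constraints. All names, dimensions, and the uniform marginals are assumptions.

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.1, n_iters=200):
    """Entropic-regularized OT: returns a soft alignment (transport plan)
    whose rows/columns approximately sum to the marginals a and b."""
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)

# Toy cost matrix: cosine distances between made-up source and target word
# vectors; in practice these would come from a multilingual encoder.
rng = np.random.default_rng(0)
src = rng.normal(size=(4, 16))   # 4 source words
tgt = rng.normal(size=(5, 16))   # 5 target words
src /= np.linalg.norm(src, axis=1, keepdims=True)
tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)
cost = 1.0 - src @ tgt.T         # cosine distance
a = np.full(4, 1 / 4)            # uniform source mass
b = np.full(5, 1 / 5)            # uniform target mass

plan = sinkhorn(cost, a, b)
print(plan.round(3))             # soft alignment matrix
print(plan.argmax(axis=1))       # hard alignment per source word
```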
- CTC-based Non-autoregressive Speech Translation [51.37920141751813]
We investigate the potential of connectionist temporal classification for non-autoregressive speech translation.
We develop a model consisting of two encoders that are guided by CTC to predict the source and target texts.
Experiments on the MuST-C benchmarks show that our NAST model achieves an average BLEU score of 29.5 with a 5.67× speed-up.
arXiv Detail & Related papers (2023-05-27T03:54:09Z)
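For readers unfamiliar with CTC training (used both by the entry above and by the main paper as its monotonic baseline), here is a minimal PyTorch sketch of the standard CTC loss; the toy shapes and vocabulary size are assumptions, and this is not the NAST model itself.

```python
import torch
import torch.nn as nn

# Assumed toy dimensions: T output positions, batch N, vocab C (blank id 0).
T, N, C = 12, 2, 100
ctc = nn.CTCLoss(blank=0, zero_infinity=True)

# Non-autoregressive decoder outputs: one distribution per output position.
log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(dim=-1)

# Reference token ids (no blanks), shorter than the T output slots.
targets = torch.randint(1, C, (N, 8))
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 8, dtype=torch.long)

# CTC marginalizes over all monotonic alignments of the targets to the T slots.
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print(loss.item())
```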
- Optimizing Non-Autoregressive Transformers with Contrastive Learning [74.46714706658517]
Non-autoregressive Transformers (NATs) reduce the inference latency of Autoregressive Transformers (ATs) by predicting words all at once rather than in sequential order.
In this paper, we propose to ease the difficulty of modality learning via sampling from the model distribution instead of the data distribution.
arXiv Detail & Related papers (2023-05-23T04:20:13Z)
- Fuzzy Alignments in Directed Acyclic Graph for Non-Autoregressive Machine Translation [18.205288788056787]
Non-autoregressive translation (NAT) reduces the decoding latency but suffers from performance degradation due to the multi-modality problem.
In this paper, we hold the view that all paths in the graph are fuzzily aligned with the reference sentence.
We do not require the exact alignment but train the model to maximize a fuzzy alignment score between the graph and reference, which takes translations captured in all modalities into account.
arXiv Detail & Related papers (2023-03-12T13:51:38Z)
- Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss [71.30589161727967]
We introduce Regotron, a regularized version of Tacotron2, which aims to alleviate the training issues and at the same time produce monotonic alignments.
Our method augments the vanilla Tacotron2 objective function with an additional term, which penalizes non-monotonic alignments in the location-sensitive attention mechanism.
arXiv Detail & Related papers (2022-04-28T12:08:53Z)
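The sketch below shows one simple way to penalize non-monotonic attention, in the spirit of the entry above: compute the expected source position attended to at each decoder step and penalize backward jumps. It is an assumed illustration of the general idea, not Regotron's exact regularizer; the shapes and the weighting are made up.

```python
import torch

def monotonicity_penalty(attn):
    """attn: (batch, dec_steps, enc_steps) attention weights (rows sum to 1).
    Penalizes decoder steps whose expected source position moves backwards."""
    enc_steps = attn.size(-1)
    positions = torch.arange(enc_steps, dtype=attn.dtype)
    expected_pos = (attn * positions).sum(dim=-1)            # (batch, dec_steps)
    backward_jump = expected_pos[:, :-1] - expected_pos[:, 1:]
    return torch.relu(backward_jump).mean()

# Toy usage with assumed shapes; in practice attn would come from the
# location-sensitive attention of the TTS model.
attn = torch.softmax(torch.randn(2, 6, 10), dim=-1)
penalty = monotonicity_penalty(attn)
print(penalty.item())
# In training, this term would be added to the usual reconstruction loss,
# e.g. total = recon_loss + lambda_mono * penalty.
```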
- Anticipation-free Training for Simultaneous Translation [70.85761141178597]
Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available.
Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
arXiv Detail & Related papers (2022-01-30T16:29:37Z)
- AligNART: Non-autoregressive Neural Machine Translation by Jointly Learning to Estimate Alignment and Translate [20.980671405042756]
AligNART uses alignment information to reduce the modality of the target distribution.
AligNART effectively addresses the token repetition problem even without sequence-level knowledge distillation.
arXiv Detail & Related papers (2021-09-14T07:26:33Z)
- Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)
- SLUA: A Super Lightweight Unsupervised Word Alignment Model via Cross-Lingual Contrastive Learning [79.91678610678885]
We propose a super lightweight unsupervised word alignment model (SLUA).
Experimental results on several public benchmarks demonstrate that our model achieves competitive, if not better, performance.
Notably, to our knowledge, our model is a pioneering attempt to unify bilingual word embeddings and word alignment.
arXiv Detail & Related papers (2021-02-08T05:54:11Z)
- Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment [18.487842656780728]
Infilling and iterative refinement models close some of the gap to autoregressive models by editing the outputs of a non-autoregressive model.
We propose iterative realignment, where refinements occur over latent alignments rather than output sequence space.
arXiv Detail & Related papers (2020-10-24T09:35:37Z)
- Rationalizing Text Matching: Learning Sparse Alignments via Optimal Transport [14.86310501896212]
In this work, we extend this selective rationalization approach to text matching.
The goal is to jointly select and align text pieces, such as tokens or sentences, as a justification for the downstream prediction.
Our approach employs optimal transport (OT) to find a minimal cost alignment between the inputs.
arXiv Detail & Related papers (2020-05-27T01:20:49Z)