The Volctrans GLAT System: Non-autoregressive Translation Meets WMT21
- URL: http://arxiv.org/abs/2109.11247v2
- Date: Fri, 24 Sep 2021 03:24:24 GMT
- Title: The Volctrans GLAT System: Non-autoregressive Translation Meets WMT21
- Authors: Lihua Qian, Yi Zhou, Zaixiang Zheng, Yaoming Zhu, Zehui Lin, Jiangtao Feng, Shanbo Cheng, Lei Li, Mingxuan Wang and Hao Zhou
- Abstract summary: We build a parallel (i.e., non-autoregressive) translation system using the Glancing Transformer.
Our system achieves the best BLEU score (35.0) on the German->English translation task, outperforming all strong autoregressive counterparts.
- Score: 25.41660831320743
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes the Volctrans' submission to the WMT21 news translation
shared task for German->English translation. We build a parallel (i.e.,
non-autoregressive) translation system using the Glancing Transformer, which
enables fast and accurate parallel decoding in contrast to the currently
prevailing autoregressive models. To the best of our knowledge, this is the
first parallel translation system that can be scaled to a practical scenario
such as the WMT competition. More importantly, our parallel translation system
achieves the best BLEU score (35.0) on the German->English translation task,
outperforming all strong autoregressive counterparts.
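As a rough illustration of the contrast drawn above, the sketch below compares autoregressive decoding (one model call per output token) with single-pass parallel decoding. It is a toy example under stated assumptions: the model functions are hypothetical stand-ins that simply copy the source, not the Volctrans/GLAT implementation.

```python
# Toy sketch: autoregressive vs. single-pass parallel (non-autoregressive) decoding.
# The "models" below are hypothetical placeholders, not a trained Transformer.
from typing import List

BOS, EOS = "<bos>", "<eos>"

def ar_step(src: List[str], prefix: List[str]) -> str:
    """Hypothetical autoregressive model: predicts the next token given the prefix."""
    i = len(prefix) - 1                      # tokens emitted so far (prefix starts with BOS)
    return src[i] if i < len(src) else EOS   # toy "copy the source" behaviour

def nar_forward(src: List[str], length: int) -> List[str]:
    """Hypothetical non-autoregressive model: predicts every position in one pass."""
    return [src[i] if i < len(src) else EOS for i in range(length)]

def autoregressive_decode(src: List[str], max_len: int = 16) -> List[str]:
    prefix = [BOS]
    for _ in range(max_len):                 # one model call per output token
        tok = ar_step(src, prefix)
        if tok == EOS:
            break
        prefix.append(tok)
    return prefix[1:]

def parallel_decode(src: List[str]) -> List[str]:
    length = len(src)                        # GLAT-style systems predict the target length
    return [t for t in nar_forward(src, length) if t != EOS]  # a single model call

src = "wir bauen ein paralleles System".split()
print(autoregressive_decode(src))  # len(src) + 1 sequential model calls
print(parallel_decode(src))        # one parallel model call
```

The speedup of parallel decoding comes from replacing the token-by-token loop with a single forward pass over all target positions, at the cost of having to predict the target length and model word interdependencies differently.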
Related papers
- HW-TSC's Submission to the CCMT 2024 Machine Translation Tasks [12.841065384808733]
We participate in the bilingual machine translation task and the multi-domain machine translation task.
For these two translation tasks, we use training strategies such as regularized dropout, bidirectional training, data diversification, forward translation, back translation, alternated training, curriculum learning, and transductive ensemble learning.
arXiv Detail & Related papers (2024-09-23T09:20:19Z)
- TranSFormer: Slow-Fast Transformer for Machine Translation [52.12212173775029]
We present a Slow-Fast two-stream learning model, referred to as TranSFormer.
Our TranSFormer shows consistent BLEU improvements (larger than 1 BLEU point) on several machine translation benchmarks.
arXiv Detail & Related papers (2023-05-26T14:37:38Z)
- The RoyalFlush System for the WMT 2022 Efficiency Task [11.00644143928471]
This paper describes the submission of the RoyalFlush neural machine translation system for the WMT 2022 translation efficiency task.
Unlike the commonly used autoregressive translation system, we adopted a two-stage translation paradigm called Hybrid Regression Translation.
Our fastest system reaches 6k+ words/second on the GPU latency setting, estimated to be about 3.1x faster than last year's winner.
arXiv Detail & Related papers (2022-12-03T05:36:10Z)
- Modeling Context With Linear Attention for Scalable Document-Level Translation [72.41955536834702]
We investigate the efficacy of a recent linear attention model on document translation and augment it with a sentential gate to promote a recency inductive bias.
We show that sentential gating further improves translation quality on IWSLT.
arXiv Detail & Related papers (2022-10-16T03:41:50Z)
- The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task [92.5087402621697]
This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task.
The YiTrans system is built on large-scale pre-trained encoder-decoder models.
Our final submissions rank first on the English-German and English-Chinese end-to-end systems in terms of the automatic evaluation metric.
arXiv Detail & Related papers (2022-06-12T16:13:01Z)
- Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task [95.06453182273027]
This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation.
Our model submissions to the shared task were initialized with DeltaLM (https://aka.ms/deltalm), a generic pre-trained multilingual encoder-decoder model.
Our final submissions ranked first on three tracks in terms of the automatic evaluation metric.
arXiv Detail & Related papers (2021-11-03T09:16:17Z)
- Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision [33.04082398101807]
Existing neural machine translation models, such as Transformer, achieve high performance, but they decode words one by one, which is inefficient.
Recent non-autoregressive translation models speed up the inference, but their quality is still inferior.
We propose DSLP, a highly efficient and high-performance model for machine translation.
arXiv Detail & Related papers (2021-10-14T16:36:12Z)
- The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline Task [23.008938777422767]
This paper describes the submission of the NiuTrans end-to-end speech translation system for the IWSLT 2021 offline task.
We use the Transformer-based model architecture and enhance it with Conformer, relative position encoding, and stacked acoustic and textual encoding.
We achieve 33.84 BLEU points on the MuST-C En-De test set, which shows the enormous potential of the end-to-end model.
arXiv Detail & Related papers (2021-07-06T07:45:23Z)
- The Volctrans Neural Speech Translation System for IWSLT 2021 [26.058205594318405]
This paper describes the systems submitted to IWSLT 2021 by the Volctrans team.
For offline speech translation, our best end-to-end model achieves an 8.1 BLEU improvement over the benchmark.
For text-to-text simultaneous translation, we explore the best practice to optimize the wait-k model.
arXiv Detail & Related papers (2021-05-16T00:11:59Z)
- The LMU Munich System for the WMT 2020 Unsupervised Machine Translation Shared Task [125.06737861979299]
This paper describes the submission of LMU Munich to the WMT 2020 unsupervised shared task, in two language directions.
Our core unsupervised neural machine translation (UNMT) system follows the strategy of Chronopoulou et al.
We ensemble our best-performing systems and reach a BLEU score of 32.4 on German->Upper Sorbian and 35.2 on Upper Sorbian->German.
arXiv Detail & Related papers (2020-10-25T19:04:03Z)
- Glancing Transformer for Non-Autoregressive Neural Machine Translation [58.87258329683682]
We propose a method to learn word interdependency for single-pass parallel generation models.
With only single-pass parallel decoding, GLAT is able to generate high-quality translation with 8-15 times speedup.
arXiv Detail & Related papers (2020-08-18T13:04:03Z)
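For readers unfamiliar with glancing sampling, the following is a schematic sketch of the adaptive training step the Glancing Transformer paper describes: a first fully parallel pass, then a number of reference tokens revealed in proportion to the prediction error, then a second pass trained on the positions that remain masked. The ToyModel class and its methods are hypothetical placeholders added so the example runs; they are not the authors' code.

```python
# Schematic sketch of GLAT-style glancing sampling; ToyModel is a hypothetical
# stand-in for a non-autoregressive decoder, used only to make the example run.
import random
from typing import List

MASK = "<mask>"

class ToyModel:
    """Stand-in for a parallel decoder that predicts all target positions at once."""
    def decode(self, src: List[str], tgt_in: List[str]) -> List[str]:
        # Toy behaviour: keep revealed hints, otherwise copy the source token.
        return [t if t != MASK else src[min(i, len(src) - 1)] for i, t in enumerate(tgt_in)]

    def loss(self, src: List[str], tgt_in: List[str], ref: List[str], positions: List[int]) -> float:
        pred = self.decode(src, tgt_in)
        return sum(pred[i] != ref[i] for i in positions) / max(len(positions), 1)

def glancing_step(model: ToyModel, src: List[str], ref: List[str], ratio: float = 0.5) -> float:
    # 1) First pass: fully parallel prediction with every target position masked.
    pred = model.decode(src, [MASK] * len(ref))
    # 2) Glancing: reveal reference tokens in proportion to the first-pass error,
    #    so harder sentences get more hints during training.
    n_wrong = sum(p != r for p, r in zip(pred, ref))
    revealed = set(random.sample(range(len(ref)), int(ratio * n_wrong)))
    glanced = [ref[i] if i in revealed else MASK for i in range(len(ref))]
    # 3) Second pass: score only the positions that stayed masked.
    remaining = [i for i in range(len(ref)) if i not in revealed]
    return model.loss(src, glanced, ref, remaining)

model = ToyModel()
print(glancing_step(model, "a b c d".split(), "A B C D".split()))
```

Because the number of revealed tokens shrinks as the model improves, training gradually moves from learning with strong hints toward genuine single-pass parallel generation.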