The NiuTrans System for WNGT 2020 Efficiency Task
- URL: http://arxiv.org/abs/2109.08008v1
- Date: Thu, 16 Sep 2021 14:32:01 GMT
- Title: The NiuTrans System for WNGT 2020 Efficiency Task
- Authors: Chi Hu, Bei Li, Ye Lin, Yinqiao Li, Yanyang Li, Chenglong Wang, Tong
Xiao, Jingbo Zhu
- Abstract summary: This paper describes the submissions of the NiuTrans Team to the WNGT 2020 Efficiency Shared Task.
We focus on the efficient implementation of deep Transformer models using NiuTensor, a flexible toolkit for NLP tasks.
- Score: 32.88733142090084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes the submissions of the NiuTrans Team to the WNGT 2020
Efficiency Shared Task. We focus on the efficient implementation of deep
Transformer models (Wang et al., 2019; Li et al., 2019) using
NiuTensor (https://github.com/NiuTrans/NiuTensor), a flexible toolkit for NLP
tasks. We explored the combination of deep encoder and shallow decoder in
Transformer models via model compression and knowledge distillation. The neural
machine translation decoding also benefits from FP16 inference, attention
caching, dynamic batching, and batch pruning. Our systems achieve promising
results in both translation quality and efficiency, e.g., our fastest system
can translate more than 40,000 tokens per second with an RTX 2080 Ti while
maintaining 42.9 BLEU on newstest2018. The code, models, and Docker
images are available at NiuTrans.NMT
(https://github.com/NiuTrans/NiuTrans.NMT).
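The techniques named above can be pictured with a short, hedged sketch. The PyTorch code below pairs a deep encoder with a one-layer decoder and runs FP16 greedy decoding; it is an illustration under assumed layer counts and vocabulary size, not the NiuTensor implementation, and a comment marks where a real system would add attention (key/value) caching.

```python
import torch
import torch.nn as nn

class ShallowDecoderNMT(nn.Module):
    """Deep encoder, shallow decoder; all sizes here are illustrative assumptions."""
    def __init__(self, vocab=32000, d_model=512, nhead=8,
                 enc_layers=12, dec_layers=1):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        enc = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, enc_layers)   # deep stack
        self.decoder = nn.TransformerDecoder(dec, dec_layers)   # shallow stack
        self.out = nn.Linear(d_model, vocab)

    @torch.no_grad()
    def greedy_decode(self, src_ids, max_len=64, bos=1, eos=2):
        memory = self.encoder(self.embed(src_ids))               # encode once
        ys = torch.full((src_ids.size(0), 1), bos,
                        dtype=torch.long, device=src_ids.device)
        for _ in range(max_len):
            # A production decoder caches per-layer keys/values ("attention
            # caching") instead of re-running over the whole prefix as here.
            h = self.decoder(self.embed(ys), memory)
            next_tok = self.out(h[:, -1]).argmax(-1, keepdim=True)
            ys = torch.cat([ys, next_tok], dim=1)
            if (next_tok == eos).all():
                break
        return ys

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ShallowDecoderNMT().to(device).eval()
if device == "cuda":
    model = model.half()                                         # FP16 inference
src = torch.randint(3, 32000, (8, 20), device=device)            # one batch
print(model.greedy_decode(src).shape)
```

The asymmetry pays off because the encoder runs once per sentence while the decoder runs once per generated token, so shrinking the decoder dominates the latency savings.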
Related papers
- Shallow Cross-Encoders for Low-Latency Retrieval [69.06104373460597]
Cross-Encoders based on large transformer models (such as BERT or T5) are computationally expensive and allow for scoring only a small number of documents within a reasonably small latency window.
We show that weaker shallow transformer models (i.e., transformers with a limited number of layers) actually perform better than full-scale models when constrained to these practical low-latency settings.
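To make the shallow cross-encoder idea concrete, here is a hedged PyTorch sketch: query and document tokens are concatenated and scored jointly by a transformer with only two layers. The vocabulary size, mean pooling, and linear scoring head are illustrative assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class ShallowCrossEncoder(nn.Module):
    def __init__(self, vocab=30522, d_model=256, layers=2, nhead=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        block = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, layers)   # shallow: 2 layers
        self.score = nn.Linear(d_model, 1)

    def forward(self, query_ids, doc_ids):
        pair = torch.cat([query_ids, doc_ids], dim=1)          # joint encoding
        h = self.encoder(self.embed(pair))
        return self.score(h.mean(dim=1)).squeeze(-1)           # mean-pooled score

ranker = ShallowCrossEncoder().eval()
q = torch.randint(0, 30522, (4, 8))     # e.g. one query repeated per candidate
d = torch.randint(0, 30522, (4, 64))    # 4 candidate documents
print(ranker(q, d))                      # one relevance score per pair
```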
arXiv Detail & Related papers (2024-03-29T15:07:21Z) - GTrans: Grouping and Fusing Transformer Layers for Neural Machine
Translation [107.2752114891855]
The Transformer architecture, built by stacking encoder and decoder layers, has driven significant progress in neural machine translation.
We propose the Group-Transformer model (GTrans) that flexibly divides multi-layer representations of both encoder and decoder into different groups and then fuses these group features to generate target words.
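A hedged sketch of the grouping-and-fusing idea: per-layer representations are split into groups, each group is pooled, and a learned softmax weighting fuses the group features. The group size and fusion rule below are assumptions for illustration, not GTrans's exact formulation.

```python
import torch
import torch.nn as nn

class GroupFusion(nn.Module):
    def __init__(self, num_layers=12, num_groups=3):
        super().__init__()
        assert num_layers % num_groups == 0
        self.group_size = num_layers // num_groups
        self.gate = nn.Parameter(torch.zeros(num_groups))        # fusion weights

    def forward(self, layer_outputs):
        # layer_outputs: list of (batch, seq, d_model) tensors, one per layer
        stacked = torch.stack(layer_outputs, dim=0)              # (L, B, T, D)
        groups = stacked.split(self.group_size, dim=0)           # L/G per group
        pooled = torch.stack([g.mean(dim=0) for g in groups])    # (G, B, T, D)
        weights = torch.softmax(self.gate, dim=0).view(-1, 1, 1, 1)
        return (weights * pooled).sum(dim=0)                     # fused (B, T, D)

fusion = GroupFusion()
outs = [torch.randn(2, 10, 512) for _ in range(12)]              # fake layer states
print(fusion(outs).shape)                                        # (2, 10, 512)
```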
arXiv Detail & Related papers (2022-07-29T04:10:36Z) - The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline
Shared Task [92.5087402621697]
This paper describes the submission of our end-to-end YiTrans speech translation system for the IWSLT 2022 offline task.
The YiTrans system is built on large-scale pre-trained encoder-decoder models.
Our final submissions rank first among end-to-end systems on English-German and English-Chinese in terms of the automatic evaluation metric.
arXiv Detail & Related papers (2022-06-12T16:13:01Z) - The NiuTrans Machine Translation Systems for WMT21 [23.121382706331403]
This paper describes NiuTrans neural machine translation systems of the WMT 2021 news translation tasks.
We made submissions to 9 language directions, including English↔{Chinese, Japanese, Russian, Icelandic} and English→Hausa tasks.
arXiv Detail & Related papers (2021-09-22T02:00:24Z) - The NiuTrans System for the WMT21 Efficiency Task [26.065244284992147]
This paper describes the NiuTrans system for the WMT21 translation efficiency task.
Our system can translate 247,000 words per second on an NVIDIA A100, being 3× faster than last year's system.
arXiv Detail & Related papers (2021-09-16T14:21:52Z) - The NiuTrans End-to-End Speech Translation System for IWSLT 2021 Offline
Task [23.008938777422767]
This paper describes the submission of the NiuTrans end-to-end speech translation system for the IWSLT 2021 offline task.
We use a Transformer-based model architecture and enhance it with Conformer blocks, relative position encoding, and stacked acoustic and textual encoding.
We achieve 33.84 BLEU points on the MuST-C En-De test set, which shows the enormous potential of the end-to-end model.
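Of the techniques listed, relative position encoding is easy to sketch: a learned bias indexed by the (clipped) distance between positions is added to the attention logits. The clipping distance and the exact way the bias enters attention below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RelativePositionBias(nn.Module):
    def __init__(self, num_heads=8, max_distance=16):
        super().__init__()
        self.max_distance = max_distance
        self.bias = nn.Embedding(2 * max_distance + 1, num_heads)

    def forward(self, seq_len):
        pos = torch.arange(seq_len)
        rel = pos[None, :] - pos[:, None]                         # (T, T) offsets
        rel = rel.clamp(-self.max_distance, self.max_distance) + self.max_distance
        return self.bias(rel).permute(2, 0, 1)                    # (heads, T, T)

scores = torch.randn(8, 20, 20)                 # per-head attention logits
scores = scores + RelativePositionBias()(20)    # add relative-position bias
attn = scores.softmax(dim=-1)
print(attn.shape)                                # (8, 20, 20)
```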
arXiv Detail & Related papers (2021-07-06T07:45:23Z) - TransFuse: Fusing Transformers and CNNs for Medical Image Segmentation [9.266588373318688]
We study the problem of improving efficiency in modeling global contexts without losing localization ability for low-level details.
We propose TransFuse, a novel two-branch architecture that combines Transformers and CNNs in a parallel style.
With TransFuse, both global dependency and low-level spatial details can be efficiently captured in a much shallower manner.
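A hedged PyTorch sketch of the two-branch idea: a small CNN branch keeps low-level spatial detail while a transformer branch over coarse patches captures global context, and a 1x1 convolution fuses the two feature maps. Channel counts, patch size, and the fusion operator are assumptions, not the paper's exact fusion module.

```python
import torch
import torch.nn as nn

class TwoBranchFusion(nn.Module):
    def __init__(self, channels=64, d_model=64, nhead=4):
        super().__init__()
        self.cnn = nn.Sequential(                                 # local-detail branch
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        block = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(block, num_layers=2)
        self.patch = nn.Conv2d(3, d_model, kernel_size=8, stride=8)  # patchify
        self.fuse = nn.Conv2d(channels + d_model, channels, 1)       # 1x1 fusion

    def forward(self, img):                        # img: (B, 3, H, W)
        local = self.cnn(img)                      # (B, C, H, W)
        tokens = self.patch(img)                   # (B, D, H/8, W/8)
        b, d, h, w = tokens.shape
        glob = self.transformer(tokens.flatten(2).transpose(1, 2))   # (B, hw, D)
        glob = glob.transpose(1, 2).view(b, d, h, w)
        glob = nn.functional.interpolate(glob, size=local.shape[-2:])
        return self.fuse(torch.cat([local, glob], dim=1))

net = TwoBranchFusion()
print(net(torch.randn(1, 3, 64, 64)).shape)        # (1, 64, 64, 64)
```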
arXiv Detail & Related papers (2021-02-16T08:09:45Z) - Glancing Transformer for Non-Autoregressive Neural Machine Translation [58.87258329683682]
We propose the Glancing Transformer (GLAT), which learns word interdependency for single-pass parallel generation models.
With only single-pass parallel decoding, GLAT generates high-quality translations with an 8-15 times speedup.
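For contrast with autoregressive search, a hedged sketch of single-pass parallel decoding follows: the decoder reads length-many position queries and predicts every target token in one forward pass. The length handling and decoder inputs are simplifications and do not include GLAT's glancing-sampling training.

```python
import torch
import torch.nn as nn

class ParallelDecoder(nn.Module):
    def __init__(self, vocab=32000, d_model=256, nhead=4, layers=4):
        super().__init__()
        self.src_embed = nn.Embedding(vocab, d_model)
        self.pos_embed = nn.Embedding(256, d_model)               # decoder queries
        enc = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        dec = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, layers)
        self.decoder = nn.TransformerDecoder(dec, layers)
        self.out = nn.Linear(d_model, vocab)

    @torch.no_grad()
    def translate(self, src_ids, tgt_len=20):
        memory = self.encoder(self.src_embed(src_ids))
        positions = torch.arange(tgt_len, device=src_ids.device)
        queries = self.pos_embed(positions)[None].expand(src_ids.size(0), -1, -1)
        hidden = self.decoder(queries, memory)                    # no causal mask
        return self.out(hidden).argmax(-1)                        # all tokens at once

model = ParallelDecoder().eval()
src = torch.randint(0, 32000, (2, 15))
print(model.translate(src).shape)                                 # (2, 20)
```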
arXiv Detail & Related papers (2020-08-18T13:04:03Z) - Very Deep Transformers for Neural Machine Translation [100.51465892354234]
We show that it is feasible to build standard Transformer-based models with up to 60 encoder layers and 12 decoder layers.
These deep models outperform their baseline 6-layer counterparts by as much as 2.5 BLEU.
arXiv Detail & Related papers (2020-08-18T07:14:54Z)
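To give a feel for the scale described above, the snippet below instantiates a stock PyTorch nn.Transformer with 60 encoder and 12 decoder layers and reports the parameter count; the hidden sizes are assumptions, and the initialization needed to train such depths stably is not shown.

```python
import torch.nn as nn

# Illustrative only: a 60-encoder-layer / 12-decoder-layer Transformer shell.
deep_model = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=60,   # deep encoder
    num_decoder_layers=12,   # decoder kept much shallower
    dim_feedforward=2048,
)
params = sum(p.numel() for p in deep_model.parameters())
print(f"{params / 1e6:.1f}M parameters")
```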