Multilingual Neural Machine Translation with Deep Encoder and Multiple Shallow Decoders
- URL: http://arxiv.org/abs/2206.02079v1
- Date: Sun, 5 Jun 2022 01:15:04 GMT
- Title: Multilingual Neural Machine Translation with Deep Encoder and Multiple Shallow Decoders
- Authors: Xiang Kong, Adithya Renduchintala, James Cross, Yuqing Tang, Jiatao Gu, Xian Li
- Abstract summary: We propose a deep encoder with multiple shallow decoders (DEMSD) where each shallow decoder is responsible for a disjoint subset of target languages.
A DEMSD model with 2-layer decoders obtains a 1.8x speedup on average over a standard transformer model with no drop in translation quality.
- Score: 77.2101943305862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work in multilingual translation advances translation quality
surpassing bilingual baselines using deep transformer models with increased
capacity. However, the extra latency and memory costs introduced by this
approach may make it unacceptable for efficiency-constrained applications. It
has recently been shown for bilingual translation that using a deep encoder and
shallow decoder (DESD) can reduce inference latency while maintaining
translation quality, so we study similar speed-accuracy trade-offs for
multilingual translation. We find that for many-to-one translation we can
indeed increase decoder speed without sacrificing quality using this approach,
but for one-to-many translation, shallow decoders cause a clear quality drop.
To ameliorate this drop, we propose a deep encoder with multiple shallow
decoders (DEMSD) where each shallow decoder is responsible for a disjoint
subset of target languages. Specifically, the DEMSD model with 2-layer decoders
is able to obtain a 1.8x speedup on average compared to a standard transformer
model with no drop in translation quality.
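To make the architecture concrete, below is a minimal PyTorch sketch of the DEMSD idea: one deep shared encoder and several shallow decoders, each serving a disjoint group of target languages. The layer counts, dimensions, and language-group names are illustrative assumptions, not the paper's exact configuration, and causal masking and training details are omitted for brevity.

```python
import torch
import torch.nn as nn

class DEMSD(nn.Module):
    """Deep encoder, multiple shallow decoders: every target-language group
    gets its own small decoder, while all groups share one deep encoder."""

    def __init__(self, vocab_size=32000, d_model=512, nhead=8,
                 encoder_layers=12, decoder_layers=2,
                 groups=("romance", "germanic", "slavic")):  # hypothetical grouping
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers=encoder_layers)
        # One shallow decoder per disjoint subset of target languages.
        self.decoders = nn.ModuleDict({
            g: nn.TransformerDecoder(
                nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
                num_layers=decoder_layers)
            for g in groups
        })
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids, group):
        memory = self.encoder(self.embed(src_ids))                   # deep, shared
        hidden = self.decoders[group](self.embed(tgt_ids), memory)   # shallow, per group
        return self.out_proj(hidden)

model = DEMSD()
src = torch.randint(0, 32000, (4, 20))   # (batch, source length)
tgt = torch.randint(0, 32000, (4, 18))   # (batch, target length)
logits = model(src, tgt, group="romance")
print(logits.shape)  # torch.Size([4, 18, 32000])
```

Because most inference time is spent in the autoregressive decoder, shrinking the decoder to 2 layers while keeping the encoder deep is what yields the reported speedup without losing quality.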
Related papers
- Learning Language-Specific Layers for Multilingual Machine Translation [1.997704019887898]
We introduce Language-Specific Transformer Layers (LSLs), which allow us to increase model capacity while keeping the amount of computation and the number of parameters used in the forward pass constant.
We study the best way to place these layers using a neural architecture search inspired approach, and achieve an improvement of 1.3 chrF (1.5 spBLEU) points over not using LSLs on a separate decoder architecture, and 1.9 chrF (2.2 spBLEU) on a shared decoder one.
arXiv Detail & Related papers (2023-05-04T09:18:05Z)
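A rough sketch of the LSL idea above (not the authors' implementation): a module holds one parameter set per language but runs only the one matching the current batch, so capacity grows while per-token compute stays constant. The language codes and dimensions here are placeholder assumptions.

```python
import torch
import torch.nn as nn

class LanguageSpecificLayer(nn.Module):
    """One Transformer layer per language; exactly one is applied per batch,
    so forward-pass computation does not grow with the number of languages."""

    def __init__(self, d_model=512, nhead=8, languages=("de", "fr", "ru")):
        super().__init__()
        self.layers = nn.ModuleDict({
            lang: nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for lang in languages
        })

    def forward(self, x, lang):
        return self.layers[lang](x)  # only this language's weights are used

layer = LanguageSpecificLayer()
x = torch.randn(4, 20, 512)          # (batch, length, d_model)
print(layer(x, lang="de").shape)     # torch.Size([4, 20, 512])
```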
- Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates [59.678108707409606]
We propose Fast-MD, a fast MD model that generates hidden intermediates (HI) by non-autoregressive decoding based on connectionist temporal classification (CTC) outputs, followed by an ASR decoder.
Fast-MD achieved about 2x and 4x faster decoding than the naïve MD model on GPU and CPU, respectively, with comparable translation quality.
arXiv Detail & Related papers (2021-09-27T05:21:30Z)
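As a loose illustration of the non-autoregressive step Fast-MD relies on (a sketch only, not the paper's model): greedy CTC decoding takes the best label per frame in a single pass, then collapses repeats and drops blanks, so the intermediate hypothesis needs no step-by-step decoder loop. The blank id and shapes are assumptions.

```python
import torch

def ctc_greedy_decode(log_probs, blank_id=0):
    """Greedy CTC decoding: argmax per frame, collapse repeats, remove blanks.
    log_probs: (time, vocab) tensor of per-frame log-probabilities."""
    best = log_probs.argmax(dim=-1).tolist()   # one parallel pass, no autoregression
    out, prev = [], None
    for token in best:
        if token != prev and token != blank_id:
            out.append(token)
        prev = token
    return out

frames = torch.randn(50, 100).log_softmax(dim=-1)  # 50 frames, vocabulary of 100
print(ctc_greedy_decode(frames))  # token ids of the intermediate hypothesis
```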
- Efficient Inference for Multilingual Neural Machine Translation [60.10996883354372]
We consider several ways to make multilingual NMT faster at inference without degrading its quality.
Our experiments demonstrate that combining a shallow decoder with vocabulary filtering more than doubles inference speed with no loss in translation quality.
arXiv Detail & Related papers (2021-09-14T13:28:13Z)
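A hedged sketch of the vocabulary-filtering idea above: at inference, the output projection is restricted to the candidate vocabulary of the current target language, so the final matmul and softmax shrink. The function name, the allowed-id set, and the shapes are made up for illustration; a real pipeline would also map the filtered indices back to full-vocabulary ids.

```python
import torch
import torch.nn as nn

def filtered_logits(hidden, out_proj, allowed_ids):
    """Project decoder states onto only the tokens allowed for the current
    target language, shrinking the output matmul and softmax accordingly."""
    w = out_proj.weight[allowed_ids]   # (|allowed|, d_model) slice of the projection
    b = out_proj.bias[allowed_ids]
    return hidden @ w.t() + b          # logits over the filtered vocabulary

d_model, vocab = 512, 32000
out_proj = nn.Linear(d_model, vocab)
hidden = torch.randn(4, 18, d_model)             # decoder states (batch, length, d)
allowed_ids = torch.arange(0, 8000)              # hypothetical per-language subset
logits = filtered_logits(hidden, out_proj, allowed_ids)
print(logits.shape)                              # torch.Size([4, 18, 8000])
```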
- DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders [92.90543340071007]
We introduce DeltaLM, a pretrained multilingual encoder-decoder model.
Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way.
Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks.
arXiv Detail & Related papers (2021-06-25T16:12:10Z)
- Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers [69.40942736249397]
The way word translation evolves in Transformer layers has not yet been investigated.
We show that translation already happens progressively in encoder layers and even in the input embeddings.
Our experiments show that we can increase speed by up to a factor of 2.3 with small gains in translation quality, while an 18-4 deep-encoder configuration boosts translation quality by +1.42 BLEU (En-De) at a speed-up of 1.4.
arXiv Detail & Related papers (2020-03-21T06:12:14Z)
- Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends gives the shared encoder the potential to produce a language-independent semantic space.
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
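To make the joint-training idea concrete, here is a small sketch under assumed shapes and names (not the BiDAN code): one shared encoder feeds two decoders, one per target end, and their losses are summed so both objectives shape the shared representation.

```python
import torch
import torch.nn as nn

d_model, nhead, vocab = 512, 8, 32000
embed = nn.Embedding(vocab, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers=6)
# Two decoders over the same encoder, one per target end.
dec_a = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers=6)
dec_b = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers=6)
proj = nn.Linear(d_model, vocab)
loss_fn = nn.CrossEntropyLoss()

src = torch.randint(0, vocab, (4, 20))
tgt_a = torch.randint(0, vocab, (4, 18))   # reference in language A
tgt_b = torch.randint(0, vocab, (4, 18))   # reference in language B

memory = encoder(embed(src))               # shared encoder representation
logits_a = proj(dec_a(embed(tgt_a), memory))
logits_b = proj(dec_b(embed(tgt_b), memory))
# Joint objective: both target ends pull on the same encoder, pushing it
# toward a more language-independent semantic space.
loss = (loss_fn(logits_a.reshape(-1, vocab), tgt_a.reshape(-1)) +
        loss_fn(logits_b.reshape(-1, vocab), tgt_b.reshape(-1)))
loss.backward()
```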