Multilingual Neural Machine Translation with Deep Encoder and Multiple Shallow Decoders
- URL: http://arxiv.org/abs/2206.02079v1
- Date: Sun, 5 Jun 2022 01:15:04 GMT
- Title: Multilingual Neural Machine Translation with Deep Encoder and Multiple Shallow Decoders
- Authors: Xiang Kong, Adithya Renduchintala, James Cross, Yuqing Tang, Jiatao Gu, Xian Li
- Abstract summary: We propose a deep encoder with multiple shallow decoders (DEMSD) where each shallow decoder is responsible for a disjoint subset of target languages.
A DEMSD model with 2-layer decoders obtains a 1.8x speedup on average over a standard transformer model with no drop in translation quality.
- Score: 77.2101943305862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent work in multilingual translation advances translation quality
surpassing bilingual baselines using deep transformer models with increased
capacity. However, the extra latency and memory costs introduced by this
approach may make it unacceptable for efficiency-constrained applications. It
has recently been shown for bilingual translation that using a deep encoder and
shallow decoder (DESD) can reduce inference latency while maintaining
translation quality, so we study similar speed-accuracy trade-offs for
multilingual translation. We find that for many-to-one translation we can
indeed increase decoder speed without sacrificing quality using this approach,
but for one-to-many translation, shallow decoders cause a clear quality drop.
To ameliorate this drop, we propose a deep encoder with multiple shallow
decoders (DEMSD) where each shallow decoder is responsible for a disjoint
subset of target languages. Specifically, the DEMSD model with 2-layer decoders
is able to obtain a 1.8x speedup on average compared to a standard transformer
model with no drop in translation quality.
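To make the architecture concrete, below is a minimal PyTorch sketch of the DEMSD idea: one deep shared encoder and several shallow decoders, each serving a disjoint group of target languages. The layer counts, dimensions, and language-group names are illustrative assumptions, not the paper's exact configuration, and causal masking and training details are omitted for brevity.

```python
import torch
import torch.nn as nn

class DEMSD(nn.Module):
    """Deep encoder, multiple shallow decoders: every target-language group
    gets its own small decoder, while all groups share one deep encoder."""

    def __init__(self, vocab_size=32000, d_model=512, nhead=8,
                 encoder_layers=12, decoder_layers=2,
                 groups=("romance", "germanic", "slavic")):  # hypothetical grouping
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers=encoder_layers)
        # One shallow decoder per disjoint subset of target languages.
        self.decoders = nn.ModuleDict({
            g: nn.TransformerDecoder(
                nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
                num_layers=decoder_layers)
            for g in groups
        })
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids, group):
        memory = self.encoder(self.embed(src_ids))                   # deep, shared
        hidden = self.decoders[group](self.embed(tgt_ids), memory)   # shallow, per group
        return self.out_proj(hidden)

model = DEMSD()
src = torch.randint(0, 32000, (4, 20))   # (batch, source length)
tgt = torch.randint(0, 32000, (4, 18))   # (batch, target length)
logits = model(src, tgt, group="romance")
print(logits.shape)  # torch.Size([4, 18, 32000])
```

Because most inference time is spent in the autoregressive decoder, shrinking the decoder to 2 layers while keeping the encoder deep is what yields the reported speedup without losing quality.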
Related papers
- Learning Language-Specific Layers for Multilingual Machine Translation [1.997704019887898]
We introduce Language-Specific Transformer Layers (LSLs), which allow us to increase model capacity while keeping the amount of computation and the number of parameters used in the forward pass constant.
We study the best way to place these layers using a neural architecture search inspired approach, and achieve an improvement of 1.3 chrF (1.5 spBLEU) points over not using LSLs on a separate decoder architecture, and 1.9 chrF (2.2 spBLEU) on a shared decoder one.
arXiv Detail & Related papers (2023-05-04T09:18:05Z)
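A rough sketch of the LSL idea above (not the authors' implementation): a module holds one parameter set per language but runs only the one matching the current batch, so capacity grows while per-token compute stays constant. The language codes and dimensions here are placeholder assumptions.

```python
import torch
import torch.nn as nn

class LanguageSpecificLayer(nn.Module):
    """One Transformer layer per language; exactly one is applied per batch,
    so forward-pass computation does not grow with the number of languages."""

    def __init__(self, d_model=512, nhead=8, languages=("de", "fr", "ru")):
        super().__init__()
        self.layers = nn.ModuleDict({
            lang: nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for lang in languages
        })

    def forward(self, x, lang):
        return self.layers[lang](x)  # only this language's weights are used

layer = LanguageSpecificLayer()
x = torch.randn(4, 20, 512)          # (batch, length, d_model)
print(layer(x, lang="de").shape)     # torch.Size([4, 20, 512])
```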
- Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates [59.678108707409606]
We propose Fast-MD, a fast MD model that generates hidden intermediates (HI) by non-autoregressive decoding based on connectionist temporal classification (CTC) outputs, followed by an ASR decoder.
Fast-MD achieved about 2x and 4x faster decoding than the naïve MD model on GPU and CPU, respectively, with comparable translation quality.
arXiv Detail & Related papers (2021-09-27T05:21:30Z)
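As a loose illustration of the non-autoregressive step Fast-MD relies on (a sketch only, not the paper's model): greedy CTC decoding takes the best label per frame in a single pass, then collapses repeats and drops blanks, so the intermediate hypothesis needs no step-by-step decoder loop. The blank id and shapes are assumptions.

```python
import torch

def ctc_greedy_decode(log_probs, blank_id=0):
    """Greedy CTC decoding: argmax per frame, collapse repeats, remove blanks.
    log_probs: (time, vocab) tensor of per-frame log-probabilities."""
    best = log_probs.argmax(dim=-1).tolist()   # one parallel pass, no autoregression
    out, prev = [], None
    for token in best:
        if token != prev and token != blank_id:
            out.append(token)
        prev = token
    return out

frames = torch.randn(50, 100).log_softmax(dim=-1)  # 50 frames, vocabulary of 100
print(ctc_greedy_decode(frames))  # token ids of the intermediate hypothesis
```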
- Efficient Inference for Multilingual Neural Machine Translation [60.10996883354372]
We consider several ways to make multilingual NMT faster at inference without degrading its quality.
Our experiments demonstrate that combining a shallow decoder with vocabulary filtering more than doubles inference speed with no loss in translation quality.
arXiv Detail & Related papers (2021-09-14T13:28:13Z)
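A hedged sketch of the vocabulary-filtering idea above: at inference, the output projection is restricted to the candidate vocabulary of the current target language, so the final matmul and softmax shrink. The function name, the allowed-id set, and the shapes are made up for illustration; a real pipeline would also map the filtered indices back to full-vocabulary ids.

```python
import torch
import torch.nn as nn

def filtered_logits(hidden, out_proj, allowed_ids):
    """Project decoder states onto only the tokens allowed for the current
    target language, shrinking the output matmul and softmax accordingly."""
    w = out_proj.weight[allowed_ids]   # (|allowed|, d_model) slice of the projection
    b = out_proj.bias[allowed_ids]
    return hidden @ w.t() + b          # logits over the filtered vocabulary

d_model, vocab = 512, 32000
out_proj = nn.Linear(d_model, vocab)
hidden = torch.randn(4, 18, d_model)             # decoder states (batch, length, d)
allowed_ids = torch.arange(0, 8000)              # hypothetical per-language subset
logits = filtered_logits(hidden, out_proj, allowed_ids)
print(logits.shape)                              # torch.Size([4, 18, 8000])
```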
- DeltaLM: Encoder-Decoder Pre-training for Language Generation and Translation by Augmenting Pretrained Multilingual Encoders [92.90543340071007]
We introduce DeltaLM, a pretrained multilingual encoder-decoder model.
Specifically, we augment the pretrained multilingual encoder with a decoder and pre-train it in a self-supervised way.
Experiments show that DeltaLM outperforms various strong baselines on both natural language generation and translation tasks.
arXiv Detail & Related papers (2021-06-25T16:12:10Z)
- Probing Word Translations in the Transformer and Trading Decoder for Encoder Layers [69.40942736249397]
The way word translation evolves in Transformer layers has not yet been investigated.
We show that translation already happens progressively in encoder layers and even in the input embeddings.
Our experiments show that we can increase speed by up to a factor of 2.3 with small gains in translation quality, while an 18-4 deep-encoder configuration boosts translation quality by +1.42 BLEU (En-De) at a speed-up of 1.4.
arXiv Detail & Related papers (2020-03-21T06:12:14Z)
- Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends gives the shared encoder the potential to produce a language-independent semantic space.
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
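To make the joint-training idea concrete, here is a small sketch under assumed shapes and names (not the BiDAN code): one shared encoder feeds two decoders, one per target end, and their losses are summed so both objectives shape the shared representation.

```python
import torch
import torch.nn as nn

d_model, nhead, vocab = 512, 8, 32000
embed = nn.Embedding(vocab, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers=6)
# Two decoders over the same encoder, one per target end.
dec_a = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers=6)
dec_b = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers=6)
proj = nn.Linear(d_model, vocab)
loss_fn = nn.CrossEntropyLoss()

src = torch.randint(0, vocab, (4, 20))
tgt_a = torch.randint(0, vocab, (4, 18))   # reference in language A
tgt_b = torch.randint(0, vocab, (4, 18))   # reference in language B

memory = encoder(embed(src))               # shared encoder representation
logits_a = proj(dec_a(embed(tgt_a), memory))
logits_b = proj(dec_b(embed(tgt_b), memory))
# Joint objective: both target ends pull on the same encoder, pushing it
# toward a more language-independent semantic space.
loss = (loss_fn(logits_a.reshape(-1, vocab), tgt_a.reshape(-1)) +
        loss_fn(logits_b.reshape(-1, vocab), tgt_b.reshape(-1)))
loss.backward()
```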