Efficient Inference For Neural Machine Translation
- URL: http://arxiv.org/abs/2010.02416v2
- Date: Wed, 7 Oct 2020 13:48:02 GMT
- Title: Efficient Inference For Neural Machine Translation
- Authors: Yi-Te Hsu, Sarthak Garg, Yi-Hsiu Liao, Ilya Chatsviorkin
- Abstract summary: Large Transformer models have achieved state-of-the-art results in neural machine translation.
We look for the optimal combination of known techniques to optimize inference speed without sacrificing translation quality.
- Score: 3.0338337603465013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Transformer models have achieved state-of-the-art results in neural
machine translation and have become standard in the field. In this work, we
look for the optimal combination of known techniques to optimize inference
speed without sacrificing translation quality. We conduct an empirical study
that stacks various approaches and demonstrates that the combination of
replacing decoder self-attention with simplified recurrent units, adopting a
deep-encoder, shallow-decoder architecture, and pruning multi-head attention
can achieve up to 109% and 84% speedup on CPU and GPU, respectively, and
reduce the number of parameters by 25% while maintaining the same translation
quality in terms of BLEU.
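Of the techniques stacked above, replacing decoder self-attention with a
simplified recurrent unit is the most structural change. The sketch below is a
minimal PyTorch rendering of such a unit, modeled on the SSRU of Kim et al.
(2019) that this line of work builds on; the class name, shapes, and details
are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class SSRU(nn.Module):
    """Sketch of a Simpler Simplified Recurrent Unit (SSRU), a drop-in
    replacement for decoder self-attention (assumed formulation):

        f_t = sigmoid(W_f x_t + b_f)
        c_t = f_t * c_{t-1} + (1 - f_t) * (W x_t)
        h_t = relu(c_t)
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.forget = nn.Linear(d_model, d_model)            # W_f, b_f
        self.cand = nn.Linear(d_model, d_model, bias=False)  # W

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model). Gates and candidate states are
        # computed for all steps in parallel; only the scan is sequential.
        batch, seq_len, d_model = x.shape
        f = torch.sigmoid(self.forget(x))
        z = self.cand(x)
        c = x.new_zeros(batch, d_model)
        outputs = []
        for t in range(seq_len):
            c = f[:, t] * c + (1.0 - f[:, t]) * z[:, t]
            outputs.append(torch.relu(c))
        return torch.stack(outputs, dim=1)
```

Because each step reads only the previous cell state, incremental decoding
costs O(1) per generated token instead of attending over the whole prefix;
combined with a deep encoder and a shallow decoder, most computation moves
into the encoder, which runs only once per source sentence.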
Related papers
- Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning [73.73967342609603]
We introduce a predictor-corrector learning framework to minimize truncation errors.
We also propose an exponential moving average-based coefficient learning method to strengthen our higher-order predictor.
Our model surpasses a robust 3.8B DeepNet by an average of 2.9 SacreBLEU, using only 1/3 of the parameters.
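The terms above come from numerical ODE solvers: a predictor proposes the
next-layer state cheaply and a corrector refines it with a second function
evaluation. The sketch below illustrates that reading with a Heun-style step
over a generic residual block; the blending coefficient and the EMA placement
are illustrative assumptions, not the paper's exact scheme.

```python
import torch
import torch.nn as nn

class PredictorCorrectorBlock(nn.Module):
    """Residual block updated with a Heun-style predictor-corrector step.

    Plain residual layer (Euler step):  x' = x + F(x)
    Predictor-corrector (sketch):       x_pred = x + F(x)
                                        x'     = x + c*F(x) + (1-c)*F(x_pred)
    """

    def __init__(self, layer: nn.Module, ema_decay: float = 0.9):
        super().__init__()
        self.layer = layer
        self.coeff = nn.Parameter(torch.tensor(0.5))  # learned blend c
        # EMA-smoothed copy of the coefficient, used at inference
        # (hypothetical placement of the moving average).
        self.register_buffer("coeff_ema", torch.tensor(0.5))
        self.ema_decay = ema_decay

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        k1 = self.layer(x)        # predictor evaluation
        k2 = self.layer(x + k1)   # corrector evaluation at the prediction
        if self.training:
            with torch.no_grad():
                self.coeff_ema.mul_(self.ema_decay).add_(
                    (1.0 - self.ema_decay) * self.coeff)
        c = self.coeff if self.training else self.coeff_ema
        return x + c * k1 + (1.0 - c) * k2
```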
arXiv Detail & Related papers (2024-11-05T12:26:25Z)
- Accelerating Error Correction Code Transformers [56.75773430667148]
We introduce a novel acceleration method for transformer-based decoders.
We achieve a 90% compression ratio and reduce arithmetic operation energy consumption by at least 224 times on modern hardware.
arXiv Detail & Related papers (2024-10-08T11:07:55Z)
- Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models [59.57732929473519]
We apply multiple frame reduction layers in the encoder to compress encoder outputs into a small number of output frames.
We demonstrate that we can generate one encoder output frame for every 2.56 sec of input speech, without significantly affecting word error rate on a large-scale voice search task.
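A frame reduction layer of this kind can be sketched as concatenating each
group of adjacent encoder frames and projecting back to the model width,
shrinking the sequence by that factor. The module below is an
assumption-labeled illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class FrameReduction(nn.Module):
    """Collapses every `factor` adjacent frames into one output frame by
    concatenation followed by a linear projection (illustrative)."""

    def __init__(self, d_model: int, factor: int):
        super().__init__()
        self.factor = factor
        self.proj = nn.Linear(d_model * factor, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, d_model); pad so frames divides evenly.
        b, t, d = x.shape
        pad = (-t) % self.factor
        if pad:
            x = nn.functional.pad(x, (0, 0, 0, pad))
        x = x.reshape(b, (t + pad) // self.factor, d * self.factor)
        return self.proj(x)
```

Stacking several such layers compounds their factors, which is how one
encoder output frame can come to summarize seconds of input speech.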
arXiv Detail & Related papers (2024-02-27T03:40:44Z)
- Heterogeneous Encoders Scaling In The Transformer For Neural Machine Translation [47.82947878753809]
We investigate the effectiveness of integrating an increasing number of heterogeneous methods.
Based on a simple combination strategy and performance-driven synergy criteria, we designed the Multi-Encoder Transformer.
Results showed that our approach can improve translation quality across a variety of languages and dataset sizes.
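One simple combination strategy consistent with this description is to run
the heterogeneous encoders in parallel and merge their outputs with learned
softmax weights before decoding. The sketch below is an illustrative
assumption, not the paper's exact Multi-Encoder Transformer.

```python
import torch
import torch.nn as nn

class MultiEncoder(nn.Module):
    """Runs several encoders on the same source and merges their outputs
    with softmax-normalized learned weights (illustrative)."""

    def __init__(self, encoders: list[nn.Module]):
        super().__init__()
        self.encoders = nn.ModuleList(encoders)
        self.weights = nn.Parameter(torch.zeros(len(encoders)))

    def forward(self, src: torch.Tensor) -> torch.Tensor:
        # Each encoder maps (batch, seq, d_model) -> (batch, seq, d_model).
        outs = torch.stack([enc(src) for enc in self.encoders], dim=0)
        w = torch.softmax(self.weights, dim=0)
        return (w.view(-1, 1, 1, 1) * outs).sum(dim=0)
```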
arXiv Detail & Related papers (2023-12-26T03:39:08Z)
- ConvNeXt-ChARM: ConvNeXt-based Transform for Efficient Neural Image Compression [18.05997169440533]
We propose ConvNeXt-ChARM, an efficient ConvNeXt-based transform coding framework, paired with a compute-efficient channel-wise auto-regressive (ChARM) prior.
We show that ConvNeXt-ChARM brings consistent and significant BD-rate (PSNR) reductions, averaging 5.24% and 1.22% over the versatile video coding (VVC) reference encoder (VTM-18.0) and the state-of-the-art learned image compression method SwinT-ChARM, respectively.
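A channel-wise auto-regressive prior in this style splits the quantized
latent into channel slices and predicts each slice's entropy parameters from
the hyperprior features plus the slices already decoded. The sketch below is
schematic; the slice count and the 1x1-convolution parameter networks are
assumptions.

```python
import torch
import torch.nn as nn

class ChannelARPrior(nn.Module):
    """Predicts per-slice (mean, scale) entropy parameters from hyperprior
    features plus previously decoded channel slices (illustrative)."""

    def __init__(self, latent_ch: int, hyper_ch: int, num_slices: int = 4):
        super().__init__()
        self.num_slices = num_slices
        self.slice_ch = latent_ch // num_slices
        self.param_nets = nn.ModuleList(
            nn.Conv2d(hyper_ch + i * self.slice_ch,
                      2 * self.slice_ch, kernel_size=1)
            for i in range(num_slices))

    def forward(self, y: torch.Tensor, hyper: torch.Tensor):
        # y: (b, latent_ch, h, w) quantized latents; hyper: (b, hyper_ch, h, w)
        slices = y.chunk(self.num_slices, dim=1)
        means, scales, decoded = [], [], []
        for i, net in enumerate(self.param_nets):
            ctx = torch.cat([hyper, *decoded], dim=1)
            mu, sigma = net(ctx).chunk(2, dim=1)
            means.append(mu)
            scales.append(nn.functional.softplus(sigma))
            decoded.append(slices[i])  # available once slice i is decoded
        return torch.cat(means, dim=1), torch.cat(scales, dim=1)
```

Because the autoregression runs over a handful of channel slices rather than
over spatial positions, decoding remains parallel across the image, which is
what makes such a prior compute-efficient.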
arXiv Detail & Related papers (2023-07-12T11:45:54Z)
- Joint Hierarchical Priors and Adaptive Spatial Resolution for Efficient Neural Image Compression [11.25130799452367]
We propose an absolute image compression transformer (ICT) for neural image compression (NIC).
ICT captures both global and local contexts from the latent representations and better parameterizes the distribution of the quantized latents.
Our framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural SwinT-ChARM.
arXiv Detail & Related papers (2023-07-05T13:17:14Z)
- Multilingual Neural Machine Translation with Deep Encoder and Multiple Shallow Decoders [77.2101943305862]
We propose a deep encoder with multiple shallow decoders (DEMSD) where each shallow decoder is responsible for a disjoint subset of target languages.
The DEMSD model with 2-layer decoders obtains a 1.8x speedup on average compared to a standard transformer model with no drop in translation quality.
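Schematically, DEMSD keeps one deep shared encoder and routes each target
language to one of several shallow decoders. The sketch below uses
hypothetical routing names to show the layout.

```python
import torch.nn as nn

class DEMSD(nn.Module):
    """Deep encoder with multiple shallow decoders: each decoder serves a
    disjoint subset of target languages (illustrative layout)."""

    def __init__(self, encoder: nn.Module,
                 decoders: dict[str, nn.Module],
                 lang_to_group: dict[str, str]):
        super().__init__()
        self.encoder = encoder                   # deep, e.g. 12 layers
        self.decoders = nn.ModuleDict(decoders)  # shallow, e.g. 2 layers
        self.lang_to_group = lang_to_group       # language -> decoder key

    def forward(self, src, tgt, tgt_lang: str):
        memory = self.encoder(src)               # once per source sentence
        decoder = self.decoders[self.lang_to_group[tgt_lang]]
        return decoder(tgt, memory)              # once per generated token
```

Only the selected shallow decoder runs at every decoding step, while the deep
encoder runs once per sentence; this asymmetry is what the reported speedup
exploits.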
arXiv Detail & Related papers (2022-06-05T01:15:04Z)
- Multiscale Collaborative Deep Models for Neural Machine Translation [40.52423993051359]
We present a MultiScale Collaborative (MSC) framework to ease the training of NMT models that are substantially deeper than those used previously.
We explicitly boost the gradient back-propagation from top to bottom levels by introducing a block-scale collaboration mechanism into deep NMT models.
Our deep MSC achieves a BLEU score of 30.56 on the WMT14 English-German task, significantly outperforming state-of-the-art deep NMT models.
arXiv Detail & Related papers (2020-04-29T08:36:08Z)
- Bi-Decoder Augmented Network for Neural Machine Translation [108.3931242633331]
We propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task.
Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends gives the shared encoder the potential to produce a language-independent semantic space.
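As a sketch, such a bi-decoder setup can be pictured as one shared encoder
feeding a decoder for each target end, trained jointly. The module below,
including the auxiliary loss weight and names, is an illustrative assumption
rather than the paper's exact architecture.

```python
import torch.nn as nn

class BiDecoderNetwork(nn.Module):
    """Shared encoder with two decoders, one per target end, trained
    jointly so the encoder learns a shared semantic space (sketch)."""

    def __init__(self, encoder, fwd_decoder, bwd_decoder, aux_weight=0.5):
        super().__init__()
        self.encoder = encoder
        self.fwd_decoder = fwd_decoder       # generates the target language
        self.bwd_decoder = bwd_decoder       # regenerates the source language
        self.aux_weight = aux_weight
        self.loss = nn.CrossEntropyLoss(ignore_index=0)  # 0 = pad id

    def forward(self, src, tgt):
        memory = self.encoder(src)           # shared representation
        fwd = self.fwd_decoder(tgt[:, :-1], memory)  # (b, seq, vocab)
        bwd = self.bwd_decoder(src[:, :-1], memory)
        loss_fwd = self.loss(fwd.transpose(1, 2), tgt[:, 1:])
        loss_bwd = self.loss(bwd.transpose(1, 2), src[:, 1:])
        return loss_fwd + self.aux_weight * loss_bwd
```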
arXiv Detail & Related papers (2020-01-14T02:05:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.