Dynamic Multi-Branch Layers for On-Device Neural Machine Translation
- URL: http://arxiv.org/abs/2105.06679v1
- Date: Fri, 14 May 2021 07:32:53 GMT
- Title: Dynamic Multi-Branch Layers for On-Device Neural Machine Translation
- Authors: Zhixing Tan, Maosong Sun, Yang Liu
- Abstract summary: We propose to improve the performance of on-device neural machine translation (NMT) systems with dynamic multi-branch layers.
Specifically, we design a layer-wise dynamic multi-branch network with only one branch activated during training and inference.
At almost the same computational cost, our method achieves improvements of up to 1.7 BLEU points on the WMT14 English-German translation task and 1.8 BLEU points on the WMT20 Chinese-English translation task.
- Score: 53.637479651600586
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid development of artificial intelligence (AI), there is a trend
in moving AI applications such as neural machine translation (NMT) from cloud
to mobile devices such as smartphones. Constrained by limited hardware
resources and battery, the performance of on-device NMT systems is far from
satisfactory. Inspired by conditional computation, we propose to improve the
performance of on-device NMT systems with dynamic multi-branch layers.
Specifically, we design a layer-wise dynamic multi-branch network with only one
branch activated during training and inference. As not all branches are
activated during training, we propose shared-private reparameterization to
ensure sufficient training for each branch. At almost the same computational
cost, our method achieves improvements of up to 1.7 BLEU points on the WMT14
English-German translation task and 1.8 BLEU points on the WMT20
Chinese-English translation task over the Transformer model, respectively.
Compared with a strong baseline that also uses multiple branches, the proposed
method is up to 1.6 times faster with the same number of parameters.
Related papers
- EMMeTT: Efficient Multimodal Machine Translation Training [26.295981183965566]
We propose a joint multimodal training regime for Speech-LLM that includes automatic speech translation (AST).
To handle joint multimodal training, we propose a novel training framework called EMMeTT.
The resultant Multimodal Translation Model produces strong text and speech translation results at the same time.
arXiv Detail & Related papers (2024-09-20T14:03:23Z)
- Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models [43.1773057439246]
Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures.
We explore sparse and recurrent model training on a massively parallel multiple instruction multiple data architecture with distributed local memory.
arXiv Detail & Related papers (2023-11-07T23:18:35Z)
- Direct Neural Machine Translation with Task-level Mixture of Experts models [1.2338729811609357]
Direct neural machine translation (direct NMT) translates text between two non-English languages.
Task-level Mixture-of-Experts (Task-level MoE) models have shown promising NMT performance for a large number of language pairs (a generic routing sketch follows this entry).
arXiv Detail & Related papers (2023-10-18T18:19:45Z)
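For the task-level routing named in the entry above, a minimal sketch (generic illustration only; the class name, the `task_to_expert` mapping, and all dimensions are hypothetical and not taken from that paper): the whole task, e.g. a language pair, deterministically selects one expert feed-forward block, so only one expert runs per batch.

```python
import torch
import torch.nn as nn

class TaskLevelMoEFFN(nn.Module):
    """Feed-forward sublayer with one expert per task; routing is by task id, not per token."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        return self.experts[task_id](x)   # only one expert runs for this batch


# Hypothetical mapping from language pairs to experts.
task_to_expert = {"de-fr": 0, "zh-ja": 1}
ffn = TaskLevelMoEFFN(d_model=512, d_ff=2048, num_experts=2)
out = ffn(torch.randn(2, 7, 512), task_to_expert["de-fr"])   # -> (2, 7, 512)
```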
- SelfSeg: A Self-supervised Sub-word Segmentation Method for Neural Machine Translation [51.881877192924414]
Sub-word segmentation is an essential pre-processing step for Neural Machine Translation (NMT).
This paper introduces SelfSeg, a self-supervised neural sub-word segmentation method.
SelfSeg is much faster to train/decode and requires only monolingual dictionaries instead of parallel corpora.
arXiv Detail & Related papers (2023-07-31T04:38:47Z)
- TranSFormer: Slow-Fast Transformer for Machine Translation [52.12212173775029]
We present a Slow-Fast two-stream learning model, referred to as TranSFormer.
Our TranSFormer shows consistent BLEU improvements (larger than 1 BLEU point) on several machine translation benchmarks.
arXiv Detail & Related papers (2023-05-26T14:37:38Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- TMS: A Temporal Multi-scale Backbone Design for Speaker Embedding [60.292702363839716]
Current SOTA backbone networks for speaker embedding aggregate multi-scale features from an utterance with multi-branch network architectures.
We propose an effective temporal multi-scale (TMS) model where multi-scale branches could be efficiently designed in a speaker embedding network almost without increasing computational costs.
arXiv Detail & Related papers (2022-03-17T05:49:35Z)
- Multi-branch Attentive Transformer [152.07840447196384]
We propose a simple yet effective variant of Transformer called multi-branch attentive Transformer.
The attention layer is the average of multiple branches, and each branch is an independent multi-head attention layer.
Experiments on machine translation, code generation and natural language understanding demonstrate that such a simple variant of Transformer brings significant improvements (a minimal sketch of the branch averaging follows this entry).
arXiv Detail & Related papers (2020-06-18T04:24:28Z)
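For the branch averaging described in the entry above, a minimal sketch (illustrative only; branch count, head count, and dimensions are assumptions, and the original model's regularization is omitted):

```python
import torch
import torch.nn as nn

class MultiBranchAttention(nn.Module):
    """Self-attention sublayer whose output is the average of several
    independent multi-head attention branches."""
    def __init__(self, d_model: int, num_heads: int, num_branches: int):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.MultiheadAttention(d_model, num_heads, batch_first=True)
            for _ in range(num_branches)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outputs = [branch(x, x, x, need_weights=False)[0] for branch in self.branches]
        return torch.stack(outputs, dim=0).mean(dim=0)   # average over branches


attn = MultiBranchAttention(d_model=512, num_heads=8, num_branches=3)
out = attn(torch.randn(2, 7, 512))   # -> (2, 7, 512)
```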
- Bit Allocation for Multi-Task Collaborative Intelligence [39.11380888887304]
Collaborative intelligence (CI) is a promising framework for deployment of Artificial Intelligence (AI)-based services on mobile devices.
We propose the first bit allocation method for multi-stream, multi-task CI.
arXiv Detail & Related papers (2020-02-14T02:02:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.