Neural Machine Translation with Joint Representation
- URL: http://arxiv.org/abs/2002.06546v2
- Date: Tue, 18 Feb 2020 01:59:03 GMT
- Title: Neural Machine Translation with Joint Representation
- Authors: Yanyang Li, Qiang Wang, Tong Xiao, Tongran Liu, Jingbo Zhu
- Abstract summary: Recent Neural Machine Translation (NMT) systems resort to attention, which only partially encodes the interaction, for efficiency.
In this paper, we employ a Joint Representation that fully accounts for each possible interaction.
We sidestep the inefficiency issue by refining representations with the proposed efficient attention operation.
- Score: 42.491774594572725
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though early successes of Statistical Machine Translation (SMT) systems are
attributed in part to the explicit modelling of the interaction between any two
source and target units, e.g., alignment, recent Neural Machine Translation
(NMT) systems resort to attention, which only partially encodes the interaction,
for efficiency. In this paper, we employ a Joint Representation that fully
accounts for each possible interaction. We sidestep the inefficiency issue by
refining representations with the proposed efficient attention operation. The
resulting Reformer models offer a new Sequence-to-Sequence modelling paradigm
besides the Encoder-Decoder framework and outperform the Transformer baseline
by about 1 BLEU point on both the small-scale IWSLT14 German-English,
English-German, and IWSLT15 Vietnamese-English tasks and the large-scale NIST12
Chinese-English translation task. We also propose a systematic model scaling
approach, allowing the Reformer model to beat the state-of-the-art Transformer
on IWSLT14 German-English and NIST12 Chinese-English with about 50% fewer
parameters. The code is publicly available at https://github.com/lyy1994/reformer.
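As a rough illustration of the joint-representation idea described in the abstract, the sketch below (not the authors' Reformer code; see the linked repository for that) builds one hidden vector per (target position, source position) pair and refines the resulting grid with a single attention pass along the source axis. The additive pair combination, the tensor shapes, and the single-axis refinement are assumptions made purely for illustration.

```python
# Minimal sketch of a joint representation over all (target, source) pairs,
# refined by attention along the source axis. This is an illustrative
# approximation, NOT the paper's exact Reformer operation.
import torch
import torch.nn.functional as F

def build_joint_representation(src, tgt):
    """src: (S, H) source states, tgt: (T, H) target states -> (T, S, H) grid."""
    # One vector per (target j, source i) pair; here simply the sum of the two
    # states (an assumed combination rule for illustration).
    return tgt.unsqueeze(1) + src.unsqueeze(0)          # (T, S, H)

def refine_along_source(joint, w_q, w_k, w_v):
    """One attention pass over the source axis of the joint grid.

    For each target row j, the S pair vectors attend to each other, so one
    pass costs O(T * S^2 * H) rather than re-materialising every pairwise
    interaction from scratch.
    """
    q = joint @ w_q                                     # (T, S, H)
    k = joint @ w_k
    v = joint @ w_v
    scores = torch.einsum('tsh,tuh->tsu', q, k) / (q.shape[-1] ** 0.5)
    attn = F.softmax(scores, dim=-1)                    # (T, S, S)
    return joint + torch.einsum('tsu,tuh->tsh', attn, v)

# Toy usage with random states.
H = 16
src, tgt = torch.randn(5, H), torch.randn(7, H)
w_q, w_k, w_v = (torch.randn(H, H) * 0.1 for _ in range(3))
joint = build_joint_representation(src, tgt)            # (7, 5, 16)
refined = refine_along_source(joint, w_q, w_k, w_v)     # (7, 5, 16)
print(refined.shape)
```

Refining along one axis at a time keeps each pass quadratic in only one sequence length, which is the intuition behind an "efficient attention operation" over a joint grid; the operation actually used in the paper may differ in detail.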
Related papers
- Efficient Machine Translation with a BiLSTM-Attention Approach [0.0]
This paper proposes a novel Seq2Seq model aimed at improving translation quality while reducing the storage space required by the model.
The model employs a Bidirectional Long Short-Term Memory network (Bi-LSTM) as the encoder to capture the context information of the input sequence.
Compared to the current mainstream Transformer model, our model achieves superior performance on the WMT14 machine translation dataset.
arXiv Detail & Related papers (2024-10-29T01:12:50Z)
- Building Multilingual Machine Translation Systems That Serve Arbitrary X-Y Translations [75.73028056136778]
We show how to practically build MNMT systems that serve arbitrary X-Y translation directions.
We also examine our proposed approach in an extremely large-scale data setting to accommodate practical deployment scenarios.
arXiv Detail & Related papers (2022-06-30T02:18:15Z)
- Improving Neural Machine Translation by Bidirectional Training [85.64797317290349]
We present a simple and effective pretraining strategy -- bidirectional training (BiT) for neural machine translation.
Specifically, we bidirectionally update the model parameters at the early stage and then tune the model normally.
Experimental results show that BiT pushes the SOTA neural machine translation performance across 15 translation tasks on 8 language pairs significantly higher.
arXiv Detail & Related papers (2021-09-16T07:58:33Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- Learning Source Phrase Representations for Neural Machine Translation [65.94387047871648]
We propose an attentive phrase representation generation mechanism which is able to generate phrase representations from corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
arXiv Detail & Related papers (2020-06-25T13:43:11Z)
- Universal Vector Neural Machine Translation With Effective Attention [0.0]
We propose a singular model for Neural Machine Translation based on encoder-decoder models.
We introduce a neutral/universal model representation that can be used to predict more than one language.
arXiv Detail & Related papers (2020-06-09T01:13:57Z)
- Early Stage LM Integration Using Local and Global Log-Linear Combination [46.91755970827846]
Sequence-to-sequence models with an implicit alignment mechanism (e.g., attention) are closing the performance gap towards traditional hybrid hidden Markov models (HMMs).
One important factor to improve word error rate in both cases is the use of an external language model (LM) trained on large text-only corpora.
We present a novel method for language model integration into implicit-alignment based sequence-to-sequence models.
arXiv Detail & Related papers (2020-05-20T13:49:55Z)
- Attention Is All You Need [36.87735219227719]
We propose a new simple network architecture, the Transformer, based solely on attention mechanisms.
Experiments on two machine translation tasks show these models to be superior in quality; the scaled dot-product attention at their core is recapped after this list.
arXiv Detail & Related papers (2017-06-12T17:57:34Z)
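For reference, the scaled dot-product attention at the heart of the Transformer cited in the last entry (and reused in the sketch after the abstract above) is

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) V,$$

where $d_k$ is the dimensionality of the keys: each query attends to all keys, and the softmax weights mix the corresponding values.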
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.