Learning Source Phrase Representations for Neural Machine Translation
- URL: http://arxiv.org/abs/2006.14405v1
- Date: Thu, 25 Jun 2020 13:43:11 GMT
- Title: Learning Source Phrase Representations for Neural Machine Translation
- Authors: Hongfei Xu and Josef van Genabith and Deyi Xiong and Qiuhui Liu and
Jingyi Zhang
- Abstract summary: We propose an attentive phrase representation generation mechanism which is able to generate phrase representations from corresponding token representations.
In our experiments, we obtain significant improvements on the WMT 14 English-German and English-French tasks on top of the strong Transformer baseline.
- Score: 65.94387047871648
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Transformer translation model (Vaswani et al., 2017) based on a
multi-head attention mechanism can be computed effectively in parallel and has
significantly pushed forward the performance of Neural Machine Translation
(NMT). Though intuitively the attentional network can connect distant words via
shorter network paths than RNNs, empirical analysis demonstrates that it still
has difficulty in fully capturing long-distance dependencies (Tang et al.,
2018). Considering that modeling phrases instead of words has significantly
improved the Statistical Machine Translation (SMT) approach through the use of
larger translation blocks ("phrases") and its reordering ability, modeling NMT
at phrase level is an intuitive proposal to help the model capture
long-distance relationships. In this paper, we first propose an attentive
phrase representation generation mechanism which is able to generate phrase
representations from corresponding token representations. In addition, we
incorporate the generated phrase representations into the Transformer
translation model to enhance its ability to capture long-distance
relationships. In our experiments, we obtain significant improvements on the
WMT 14 English-German and English-French tasks on top of the strong Transformer
baseline, which shows the effectiveness of our approach. Our approach helps
Transformer Base models perform at the level of Transformer Big models, and
even significantly better for long sentences, but with substantially fewer
parameters and training steps. The fact that phrase representations help even
in the big setting further supports our conjecture that they make a valuable
contribution to long-distance relations.
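Concretely, the proposed mechanism can be pictured as attention-based pooling over the token representations inside each phrase span. The following is a minimal NumPy sketch under that assumption; the span segmentation, the scorer's exact parameterization (W and v here are illustrative), and how the phrase vectors are fed back into the Transformer follow the paper and are not reproduced here.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def attentive_phrase_representation(tokens, spans, W, v):
    """Pool token vectors into one vector per phrase span.

    tokens: (seq_len, d) token representations from the encoder.
    spans:  list of (start, end) indices, end exclusive, one per phrase.
    W, v:   parameters of a small additive-attention scorer (assumed form).
    Returns an array of shape (num_phrases, d).
    """
    phrases = []
    for start, end in spans:
        h = tokens[start:end]                    # (phrase_len, d)
        scores = np.tanh(h @ W) @ v              # (phrase_len,)
        weights = softmax(scores)                # attention over tokens in the span
        phrases.append(weights @ h)              # weighted sum -> (d,)
    return np.stack(phrases)

# Toy usage: 6 tokens of dimension 4, segmented into two phrases.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))
W, v = rng.normal(size=(4, 4)), rng.normal(size=4)
print(attentive_phrase_representation(tokens, [(0, 3), (3, 6)], W, v).shape)  # (2, 4)
```

The generated phrase vectors would then be attended to by the Transformer layers alongside the token-level representations, which is where the paper reports gains on long-distance dependencies.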
Related papers
- Low-resource neural machine translation with morphological modeling [3.3721926640077804]
Morphological modeling in neural machine translation (NMT) is a promising approach to achieving open-vocabulary machine translation.
We propose a framework-solution for modeling complex morphology in low-resource settings.
We evaluate our proposed solution on Kinyarwanda - English translation using public-domain parallel text.
arXiv Detail & Related papers (2024-04-03T01:31:41Z)
- Pointer-Generator Networks for Low-Resource Machine Translation: Don't Copy That! [13.120825574589437]

We show that Transformer-based neural machine translation (NMT) is very effective in high-resource settings.
We also show that the model does not yield greater improvements for closely related vs. more distant language pairs.
Our discussion of the reasons for this behaviour highlights several general challenges for LR NMT.
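For orientation only: a standard pointer-generator network (See et al., 2017) mixes the decoder's vocabulary distribution with a copy distribution induced by the attention weights, gated by a generation probability. The sketch below shows that mixture; the low-resource-specific modelling choices of this paper are not reflected in it.

```python
import numpy as np

def pointer_generator_distribution(p_vocab, attention, src_ids, p_gen, vocab_size):
    """Standard pointer-generator mixture (a sketch, not this paper's exact model).

    p_vocab:   (vocab_size,) generator distribution over the vocabulary.
    attention: (src_len,) attention weights over source positions.
    src_ids:   (src_len,) vocabulary ids of the source tokens.
    p_gen:     scalar in [0, 1], probability of generating vs. copying.
    """
    copy_dist = np.zeros(vocab_size)
    np.add.at(copy_dist, src_ids, attention)     # scatter-add copy probabilities
    return p_gen * p_vocab + (1.0 - p_gen) * copy_dist

# Toy usage: vocabulary of 5 types, a 3-token source sentence.
p_vocab = np.array([0.1, 0.4, 0.2, 0.2, 0.1])
attention = np.array([0.6, 0.3, 0.1])
src_ids = np.array([2, 2, 4])                    # repeated ids accumulate
mixed = pointer_generator_distribution(p_vocab, attention, src_ids, p_gen=0.7, vocab_size=5)
print(mixed.sum())                               # ~1.0
```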
arXiv Detail & Related papers (2024-03-16T16:17:47Z)
- Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation [52.270712965271656]
We propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective.
We find that the graph of our model resembles transformers, with correspondences between dependencies and self-attention.
Experiments show that our model performs competitively with transformers on small to medium-sized datasets.
arXiv Detail & Related papers (2023-11-26T06:56:02Z)
- Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens [15.283483438956264]
We introduce Representational Transfer Potential (RTP), which measures representational similarities between languages.
We show that RTP can measure both positive and negative transfer (interference), and find that RTP is strongly correlated with changes in translation quality.
We develop a novel training scheme, which uses an auxiliary similarity loss that encourages representations to be more invariant across languages.
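The summary does not specify the auxiliary similarity loss; a plausible minimal form, shown purely as an illustration, penalizes low cosine similarity between mean-pooled encoder states of a parallel sentence pair so that representations become more language-invariant.

```python
import numpy as np

def cross_lingual_similarity_loss(enc_a, enc_b):
    """Illustrative auxiliary loss: 1 - cosine similarity of mean-pooled encoder states.

    enc_a, enc_b: (len_a, d) and (len_b, d) encoder outputs for a parallel pair.
    The exact formulation in the paper may differ; this only shows the idea of
    encouraging language-invariant sentence-level representations.
    """
    a = enc_a.mean(axis=0)
    b = enc_b.mean(axis=0)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return 1.0 - cos

rng = np.random.default_rng(1)
loss = cross_lingual_similarity_loss(rng.normal(size=(7, 16)), rng.normal(size=(9, 16)))
print(float(loss))   # close to 1.0 for unrelated random states
```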
arXiv Detail & Related papers (2023-05-19T09:36:48Z)
- Pre-Training a Graph Recurrent Network for Language Representation [34.4554387894105]
We consider a graph recurrent network for language model pre-training, which builds a graph structure for each sequence with local token-level communications.
We find that our model can generate more diverse outputs with less contextualized feature redundancy than existing attention-based models.
arXiv Detail & Related papers (2022-09-08T14:12:15Z)
- Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT).
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
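The summary does not spell out how the adjacency region is realized. Purely as a hedged illustration, the sketch below samples augmented sentence vectors between source and target semantic embeddings with a small perturbation; the paper's actual sampling procedure and training objective may differ.

```python
import numpy as np

def sample_adjacency_region(src_vec, tgt_vec, num_samples=4, noise_scale=0.1, seed=0):
    """Hedged sketch of adjacency-region sampling for CsaNMT-style augmentation.

    Draws vectors on the segment between the source and target sentence embeddings
    and adds small Gaussian noise, standing in for "adequate variants of literal
    expression under the same meaning". The paper's actual objective is not shown.
    """
    rng = np.random.default_rng(seed)
    lam = rng.uniform(0.0, 1.0, size=(num_samples, 1))       # interpolation weights
    base = lam * src_vec + (1.0 - lam) * tgt_vec              # points on the segment
    return base + noise_scale * rng.normal(size=base.shape)   # perturb within the region

rng = np.random.default_rng(2)
src_vec, tgt_vec = rng.normal(size=128), rng.normal(size=128)
print(sample_adjacency_region(src_vec, tgt_vec).shape)        # (4, 128)
```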
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
- Improving Neural Machine Translation by Bidirectional Training [85.64797317290349]
We present a simple and effective pretraining strategy -- bidirectional training (BiT) for neural machine translation.
Specifically, we bidirectionally update the model parameters at the early stage and then tune the model normally.
Experimental results show that BiT pushes the SOTA neural machine translation performance across 15 translation tasks on 8 language pairs significantly higher.
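One way to read "bidirectionally update the model parameters at the early stage" is to pre-train on a corpus containing both translation directions before tuning on the original direction. The sketch below shows only that data construction, under that assumption; details such as direction tags or sampling ratios are not given in the summary.

```python
def build_bidirectional_corpus(pairs):
    """Hedged sketch of BiT-style data construction.

    pairs: list of (source, target) sentence pairs for one direction.
    Returns a corpus containing both src->tgt and tgt->src examples, which
    would be used for the early bidirectional stage before normal tuning.
    """
    forward = [(src, tgt) for src, tgt in pairs]
    backward = [(tgt, src) for src, tgt in pairs]
    return forward + backward

corpus = build_bidirectional_corpus([("ein Haus", "a house"), ("ein Baum", "a tree")])
print(len(corpus))   # 4: two forward and two backward examples
```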
arXiv Detail & Related papers (2021-09-16T07:58:33Z)
- Enriching Non-Autoregressive Transformer with Syntactic and Semantic Structures for Neural Machine Translation [54.864148836486166]
We propose to incorporate the explicit syntactic and semantic structures of languages into a non-autoregressive Transformer.
Our model is significantly faster while maintaining translation quality compared with several state-of-the-art non-autoregressive models.
arXiv Detail & Related papers (2021-01-22T04:12:17Z)
- Explicit Reordering for Neural Machine Translation [50.70683739103066]
In Transformer-based neural machine translation (NMT), the positional encoding mechanism helps the self-attention networks to learn the source representation with order dependency.
We propose a novel reordering method to explicitly model this reordering information for the Transformer-based NMT.
The empirical results on the WMT14 English-to-German, WAT ASPEC Japanese-to-English, and WMT17 Chinese-to-English translation tasks show the effectiveness of the proposed approach.
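The positional encoding referred to here is the standard sinusoidal scheme of Vaswani et al. (2017), sketched below for reference; the proposed explicit reordering model itself is not detailed in the summary and is not shown.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding from Vaswani et al. (2017).

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    This is the order signal the self-attention layers rely on; the paper's
    explicit reordering model is built on top of it.
    """
    pos = np.arange(seq_len)[:, None]                       # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]                   # (1, d_model/2), holds 2i
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

print(sinusoidal_positional_encoding(seq_len=4, d_model=8).shape)   # (4, 8)
```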
arXiv Detail & Related papers (2020-04-08T05:28:46Z)