Heterogeneous Encoders Scaling In The Transformer For Neural Machine
Translation
- URL: http://arxiv.org/abs/2312.15872v1
- Date: Tue, 26 Dec 2023 03:39:08 GMT
- Title: Heterogeneous Encoders Scaling In The Transformer For Neural Machine
Translation
- Authors: Jia Cheng Hu, Roberto Cavicchioli, Giulia Berardinelli, Alessandro
Capotondi
- Abstract summary: We investigate the effectiveness of integrating an increasing number of heterogeneous methods.
Based on a simple combination strategy and performance-driven synergy criteria, we designed the Multi-Encoder Transformer.
Results showcased that our approach can improve the quality of the translation across a variety of languages and dataset sizes.
- Score: 47.82947878753809
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Although the Transformer is currently the best-performing architecture in the
homogeneous configuration (self-attention only) in Neural Machine Translation,
many State-of-the-Art models in Natural Language Processing are made of a
combination of different Deep Learning approaches. However, these models often
combine only a couple of techniques, and it is unclear why some methods are
chosen over others. In this work, we investigate the effectiveness
of integrating an increasing number of heterogeneous methods. Based on a simple
combination strategy and performance-driven synergy criteria, we designed the
Multi-Encoder Transformer, which consists of up to five diverse encoders.
Results show that our approach improves translation quality across a variety of
languages and dataset sizes, and it is particularly effective in low-resource
languages, where we observed a maximum increase of 7.16 BLEU compared to the
single-encoder model.
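As a rough illustration of the multi-encoder idea, the sketch below (PyTorch) runs a source sentence through several heterogeneous encoder branches and lets a standard Transformer decoder attend to their combined output. The specific branch types (self-attention, convolutional, recurrent) and the fusion rule (a simple average of the branch outputs) are assumptions for illustration only; the abstract states that up to five diverse encoders are combined with a simple strategy but does not specify which ones or how.

```python
# Hypothetical sketch of a multi-encoder Transformer, not the paper's exact model:
# the branch types and the mean-pooling fusion below are assumptions.
# Positional encodings and dropout are omitted for brevity.
import torch
import torch.nn as nn


class ConvBranch(nn.Module):
    """Lightweight convolutional encoder branch (illustrative stand-in)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)

    def forward(self, x):                      # x: (batch, seq, d_model)
        return torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)


class RecurrentBranch(nn.Module):
    """Recurrent encoder branch; keeps only the hidden-state sequence."""
    def __init__(self, d_model: int):
        super().__init__()
        self.gru = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, x):
        out, _ = self.gru(x)
        return out


class MultiEncoderTransformer(nn.Module):
    def __init__(self, vocab_size: int, d_model: int = 512, nhead: int = 8):
        super().__init__()
        self.src_emb = nn.Embedding(vocab_size, d_model)
        self.tgt_emb = nn.Embedding(vocab_size, d_model)
        self.encoders = nn.ModuleList([        # heterogeneous encoder branches
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
                num_layers=2),
            ConvBranch(d_model),
            RecurrentBranch(d_model),
        ])
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
            num_layers=2)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        src = self.src_emb(src_ids)            # (batch, src_len, d_model)
        tgt = self.tgt_emb(tgt_ids)            # (batch, tgt_len, d_model)
        # Assumed fusion rule: average the branch outputs into a single memory.
        memory = torch.stack([enc(src) for enc in self.encoders]).mean(dim=0)
        t = tgt_ids.size(1)                    # causal mask for the decoder
        causal = torch.triu(
            torch.full((t, t), float("-inf"), device=src.device), diagonal=1)
        return self.out_proj(self.decoder(tgt, memory, tgt_mask=causal))
```

Swapping the mean for a concatenation plus projection, or adding further branches, follows the same pattern.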
Related papers
- Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning [73.73967342609603]
We introduce a predictor-corrector learning framework to minimize truncation errors.
We also propose an exponential moving average-based coefficient learning method to strengthen our higher-order predictor.
Our model surpasses a robust 3.8B DeepNet by an average of 2.9 SacreBLEU, using only 1/3 of the parameters.
arXiv Detail & Related papers (2024-11-05T12:26:25Z)
- Low-resource neural machine translation with morphological modeling [3.3721926640077804]
Morphological modeling in neural machine translation (NMT) is a promising approach to achieving open-vocabulary machine translation.
We propose a framework-solution for modeling complex morphology in low-resource settings.
We evaluate our proposed solution on Kinyarwanda - English translation using public-domain parallel text.
arXiv Detail & Related papers (2024-04-03T01:31:41Z)
- Extrapolating Multilingual Understanding Models as Multilingual Generators [82.1355802012414]
This paper explores methods to equip multilingual understanding models with generation abilities, yielding a unified model.
We propose a Semantic-Guided Alignment-then-Denoising (SGA) approach to adapt an encoder to a multilingual generator with a small number of new parameters.
arXiv Detail & Related papers (2023-05-22T15:33:21Z)
- Examining Scaling and Transfer of Language Model Architectures for Machine Translation [51.69212730675345]
Language models (LMs) process sequences in a single stack of layers, and encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing.
In machine translation, EncDec has long been the favoured approach, but few studies have investigated the performance of LMs.
arXiv Detail & Related papers (2022-02-01T16:20:15Z)
- Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition [30.941564693248512]
We investigate various fusion techniques for the all-attention-based encoder-decoder architecture known as the transformer.
We introduce a novel multi-encoder learning method that performs a weighted combination of two encoder-decoder multi-head attention outputs only during training.
We achieve state-of-the-art performance for transformer-based models on Wall Street Journal, with a relative WER reduction of 19% compared to the current benchmark approach.
arXiv Detail & Related papers (2021-03-31T21:07:43Z)
- Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation [71.54816893482457]
We introduce the dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST).
Our models are based on the original Transformer architecture but consist of two decoders, each responsible for one task (ASR or ST); a minimal sketch of this shared-encoder, two-decoder layout appears after this list.
arXiv Detail & Related papers (2020-11-02T04:59:50Z)
- Multi-Unit Transformers for Neural Machine Translation [51.418245676894465]
We propose the Multi-Unit Transformers (MUTE) to promote the expressiveness of the Transformer.
Specifically, we use several parallel units and show that modeling with multiple units improves model performance and introduces diversity.
arXiv Detail & Related papers (2020-10-21T03:41:49Z)
- Efficient Inference For Neural Machine Translation [3.0338337603465013]
Large Transformer models have achieved state-of-the-art results in neural machine translation.
We look for the optimal combination of known techniques to optimize inference speed without sacrificing translation quality.
arXiv Detail & Related papers (2020-10-06T01:21:11Z)
- Deep Transformers with Latent Depth [42.33955275626127]
The Transformer model has achieved state-of-the-art performance in many sequence modeling tasks.
We present a probabilistic framework to automatically learn which layer(s) to use by learning the posterior distributions of layer selection.
We propose a novel method to train one shared Transformer network for multilingual machine translation.
arXiv Detail & Related papers (2020-09-28T07:13:23Z)
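For the dual-decoder Transformer entry above, the sketch below shows only the shared-encoder, two-decoder skeleton under stated assumptions: the acoustic front-end is reduced to a single linear projection, and any interaction between the two decoders in the published model is omitted, so this illustrates the layout rather than the authors' exact architecture.

```python
# Hypothetical skeleton of a shared-encoder, dual-decoder model for joint ASR + ST.
import torch
import torch.nn as nn


class DualDecoderTransformer(nn.Module):
    """One speech encoder feeding two task-specific decoders (ASR and ST).
    Decoder-to-decoder interaction is omitted here to keep the sketch minimal."""

    def __init__(self, feat_dim: int, asr_vocab: int, st_vocab: int,
                 d_model: int = 256, nhead: int = 4):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, d_model)  # acoustic features -> model dim
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers=4)
        self.asr_emb = nn.Embedding(asr_vocab, d_model)
        self.st_emb = nn.Embedding(st_vocab, d_model)
        self.asr_dec = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
            num_layers=2)
        self.st_dec = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
            num_layers=2)
        self.asr_out = nn.Linear(d_model, asr_vocab)
        self.st_out = nn.Linear(d_model, st_vocab)

    def forward(self, feats, asr_prev, st_prev):
        # feats: (batch, frames, feat_dim); *_prev: shifted target token ids.
        memory = self.encoder(self.feat_proj(feats))

        def causal(n):  # standard additive causal mask for autoregressive decoding
            return torch.triu(
                torch.full((n, n), float("-inf"), device=feats.device), diagonal=1)

        asr_h = self.asr_dec(self.asr_emb(asr_prev), memory,
                             tgt_mask=causal(asr_prev.size(1)))
        st_h = self.st_dec(self.st_emb(st_prev), memory,
                           tgt_mask=causal(st_prev.size(1)))
        # Joint training would typically sum the two cross-entropy losses (assumed).
        return self.asr_out(asr_h), self.st_out(st_h)
```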
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.