Enriching the Transformer with Linguistic Factors for Low-Resource
Machine Translation
- URL: http://arxiv.org/abs/2004.08053v2
- Date: Thu, 24 Dec 2020 09:06:18 GMT
- Title: Enriching the Transformer with Linguistic Factors for Low-Resource
Machine Translation
- Authors: Jordi Armengol-Estapé, Marta R. Costa-jussà, Carlos Escolano
- Abstract summary: This study proposes enhancing the current state-of-the-art neural machine translation architecture, the Transformer.
In particular, our proposed modification, the Factored Transformer, uses linguistic factors that insert additional knowledge into the machine translation system.
We show improvements of 0.8 BLEU over the baseline Transformer in the IWSLT German-to-English task.
- Score: 2.2344764434954256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Introducing factors, that is to say, word features such as linguistic
information referring to the source tokens, is known to improve the results of
neural machine translation systems in certain settings, typically in recurrent
architectures. This study proposes enhancing the current state-of-the-art
neural machine translation architecture, the Transformer, so that it allows to
introduce external knowledge. In particular, our proposed modification, the
Factored Transformer, uses linguistic factors that insert additional knowledge
into the machine translation system. Apart from using different kinds of
features, we study the effect of different architectural configurations.
Specifically, we analyze the performance of combining words and features at the
embedding level or at the encoder level, and we experiment with two different
combination strategies. With the best-found configuration, we show improvements
of 0.8 BLEU over the baseline Transformer in the IWSLT German-to-English task.
Moreover, we experiment with the more challenging FLoRes English-to-Nepali
benchmark, which includes both extremely low-resourced and very distant
languages, and obtain an improvement of 1.2 BLEU.
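As a rough illustration of the embedding-level combination of words and linguistic factors described in the abstract, the sketch below shows the two combination strategies, summation and concatenation, applied to a word embedding and a single factor embedding (e.g., POS tags or lemmas). Module names, dimensions, and the factor budget are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class FactoredEmbedding(nn.Module):
    """Combine word embeddings with a linguistic-factor embedding either by
    summation or by concatenation. Dimensions here are illustrative only."""
    def __init__(self, vocab_size: int, factor_vocab: int,
                 d_model: int = 512, strategy: str = "concat"):
        super().__init__()
        self.strategy = strategy
        if strategy == "sum":
            # Both embeddings share the model dimension and are added element-wise.
            self.word_emb = nn.Embedding(vocab_size, d_model)
            self.factor_emb = nn.Embedding(factor_vocab, d_model)
        else:
            # The factor gets a small slice of the model dimension; the two are concatenated.
            factor_dim = 32
            self.word_emb = nn.Embedding(vocab_size, d_model - factor_dim)
            self.factor_emb = nn.Embedding(factor_vocab, factor_dim)

    def forward(self, words: torch.Tensor, factors: torch.Tensor) -> torch.Tensor:
        w, f = self.word_emb(words), self.factor_emb(factors)
        return w + f if self.strategy == "sum" else torch.cat([w, f], dim=-1)

# Either strategy yields a (batch, seq_len, d_model) tensor that feeds the
# standard Transformer encoder unchanged.
```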
Related papers
- Heterogeneous Encoders Scaling In The Transformer For Neural Machine
Translation [47.82947878753809]
We investigate the effectiveness of integrating an increasing number of heterogeneous methods.
Based on a simple combination strategy and performance-driven synergy criteria, we designed the Multi-Encoder Transformer.
Results showcased that our approach can improve the quality of the translation across a variety of languages and dataset sizes.
arXiv Detail & Related papers (2023-12-26T03:39:08Z)
- A Meta-Learning Perspective on Transformers for Causal Language Modeling [17.293733942245154]
The Transformer architecture has become prominent in developing large causal language models.
We establish a meta-learning view of the Transformer architecture when trained for the causal language modeling task.
Within the inner optimization, we discover and theoretically analyze a special characteristic of the norms of learned token representations within Transformer-based causal language models.
arXiv Detail & Related papers (2023-10-09T17:27:36Z)
- Hindi to English: Transformer-Based Neural Machine Translation [0.0]
We developed a Neural Machine Translation (NMT) system by training the Transformer model to translate texts from the Indian language Hindi to English.
We implemented back-translation to augment the training data and to create the vocabulary.
This led us to achieve a state-of-the-art BLEU score of 24.53 on the test set of IIT Bombay English-Hindi Corpus.
arXiv Detail & Related papers (2023-09-23T00:00:09Z)
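As context for the back-translation mentioned in the entry above, here is a minimal sketch of the standard technique: target-language monolingual text is translated into the source language by a reverse model, producing synthetic parallel pairs. `reverse_model` is a placeholder, and the paper's actual pipeline and tooling are not reproduced here.

```python
from typing import Callable, List, Tuple

def back_translate(
    target_monolingual: List[str],
    reverse_model: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Standard back-translation: pair each monolingual target-language sentence
    with a synthetic source sentence produced by a reverse (target->source) model."""
    pairs = []
    for tgt_sentence in target_monolingual:
        synthetic_src = reverse_model(tgt_sentence)   # synthetic source side
        pairs.append((synthetic_src, tgt_sentence))   # gold target side
    return pairs

# The synthetic pairs are then mixed with the authentic parallel corpus before
# training the forward (source->target) model.
```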
- Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages [120.74406230847904]
TP-Transformer augments the traditional Transformer architecture to include an additional component to represent structure.
The second method imbues structure at the data level by segmenting the data with morphological tokenization.
We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset.
arXiv Detail & Related papers (2022-08-11T22:42:24Z)
- XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems
to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by the success of cross-modal encoders in visual-language tasks, while the learning objective is altered to suit the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z)
- Examining Scaling and Transfer of Language Model Architectures for
Machine Translation [51.69212730675345]
Language models (LMs) process sequences in a single stack of layers, and encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing.
In machine translation, EncDec has long been the favoured approach, yet few studies have investigated the performance of LMs.
arXiv Detail & Related papers (2022-02-01T16:20:15Z)
- Comparing Feature-Engineering and Feature-Learning Approaches for
Multilingual Translationese Classification [11.364204162881482]
We compare the traditional feature-engineering-based approach to the feature-learning-based one.
We investigate how well the hand-crafted features explain the variance in the neural models' predictions.
arXiv Detail & Related papers (2021-09-15T22:34:48Z)
- GroupBERT: Enhanced Transformer Architecture with Efficient Grouped
Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture.
We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions.
We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
arXiv Detail & Related papers (2021-06-10T15:41:53Z)
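A hedged sketch of the idea described in the GroupBERT entry above: a convolutional module complements self-attention so that local and global interactions are learned by separate components. This is a simplification inspired by the summary, not GroupBERT's actual grouped structure, and all hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class ConvAugmentedLayer(nn.Module):
    """A layer in which a depthwise convolution handles local interactions and
    self-attention handles global ones; a simplified sketch, not GroupBERT itself."""
    def __init__(self, d_model: int = 256, n_heads: int = 4, kernel_size: int = 7):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Depthwise (grouped) convolution over the sequence dimension for local context.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global interactions: self-attention with a residual connection.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Local interactions: depthwise convolution with a residual connection.
        conv_out = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.norm2(x + conv_out)
```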
- Optimizing Transformer for Low-Resource Neural Machine Translation [4.802292434636455]
Language pairs with limited amounts of parallel data, also known as low-resource languages, remain a challenge for neural machine translation.
Our experiments on different subsets of the IWSLT14 training data show that the effectiveness of the Transformer under low-resource conditions is highly dependent on the hyperparameter settings.
Using an optimized Transformer for low-resource conditions improves the translation quality up to 7.3 BLEU points compared to using the Transformer default settings.
arXiv Detail & Related papers (2020-11-04T13:12:29Z)
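To make the hyperparameter sensitivity noted above concrete, the snippet below contrasts a default-style Transformer configuration with a smaller, more heavily regularized one of the kind typically used under low-resource conditions. All values are illustrative assumptions, not the settings tuned in that paper.

```python
# Illustrative contrast only: these are example values, not the paper's
# optimized hyperparameters.
DEFAULT_TRANSFORMER = {
    "layers": 6, "hidden_size": 512, "ffn_size": 2048,
    "attention_heads": 8, "dropout": 0.1, "label_smoothing": 0.1,
    "bpe_merge_ops": 32000,
}

LOW_RESOURCE_TRANSFORMER = {
    "layers": 5, "hidden_size": 512, "ffn_size": 1024,
    "attention_heads": 2,            # fewer heads for smaller data
    "dropout": 0.3,                  # stronger regularization
    "label_smoothing": 0.2,
    "bpe_merge_ops": 10000,          # smaller subword vocabulary
}
```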
- Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation [71.54816893482457]
We introduce the dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST).
Our models are based on the original Transformer architecture but consist of two decoders, each responsible for one task (ASR or ST).
arXiv Detail & Related papers (2020-11-02T04:59:50Z)
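As a rough illustration of the architecture summarized above, the sketch below wires one shared encoder to two task-specific decoders. The paper's dual-decoder variants additionally let the two decoders attend to each other, which is omitted here, and all sizes are illustrative.

```python
import torch.nn as nn

class DualDecoderSketch(nn.Module):
    """Shared speech encoder with two task-specific decoders (ASR and ST).
    Simplified sketch: the cross-decoder attention of the original model is omitted."""
    def __init__(self, d_model: int = 256, nhead: int = 4, vocab_size: int = 8000):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers=4)
        self.asr_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers=4)
        self.st_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers=4)
        self.asr_proj = nn.Linear(d_model, vocab_size)   # transcript vocabulary
        self.st_proj = nn.Linear(d_model, vocab_size)    # translation vocabulary

    def forward(self, speech_feats, asr_prev_emb, st_prev_emb):
        # speech_feats and *_prev_emb are already-embedded (batch, seq, d_model) tensors.
        memory = self.encoder(speech_feats)              # shared acoustic representation
        asr_h = self.asr_decoder(asr_prev_emb, memory)   # transcription branch
        st_h = self.st_decoder(st_prev_emb, memory)      # translation branch
        return self.asr_proj(asr_h), self.st_proj(st_h)
```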