Enriching the Transformer with Linguistic Factors for Low-Resource
Machine Translation
- URL: http://arxiv.org/abs/2004.08053v2
- Date: Thu, 24 Dec 2020 09:06:18 GMT
- Title: Enriching the Transformer with Linguistic Factors for Low-Resource
Machine Translation
- Authors: Jordi Armengol-Estapé, Marta R. Costa-jussà, Carlos Escolano
- Abstract summary: This study proposes enhancing the current state-of-the-art neural machine translation architecture, the Transformer.
In particular, our proposed modification, the Factored Transformer, uses linguistic factors that insert additional knowledge into the machine translation system.
We show improvements of 0.8 BLEU over the baseline Transformer in the IWSLT German-to-English task.
- Score: 2.2344764434954256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Introducing factors, that is to say, word features such as linguistic
information referring to the source tokens, is known to improve the results of
neural machine translation systems in certain settings, typically in recurrent
architectures. This study proposes enhancing the current state-of-the-art
neural machine translation architecture, the Transformer, so that it allows to
introduce external knowledge. In particular, our proposed modification, the
Factored Transformer, uses linguistic factors that insert additional knowledge
into the machine translation system. Apart from using different kinds of
features, we study the effect of different architectural configurations.
Specifically, we analyze the performance of combining words and features at the
embedding level or at the encoder level, and we experiment with two different
combination strategies. With the best-found configuration, we show improvements
of 0.8 BLEU over the baseline Transformer in the IWSLT German-to-English task.
Moreover, we experiment with the more challenging FLoRes English-to-Nepali
benchmark, which includes both extremely low-resourced and very distant
languages, and obtain an improvement of 1.2 BLEU.
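As a rough illustration of the embedding-level combination of words and linguistic factors described in the abstract, the sketch below shows the two combination strategies, summation and concatenation, applied to a word embedding and a single factor embedding (e.g., POS tags or lemmas). Module names, dimensions, and the factor budget are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class FactoredEmbedding(nn.Module):
    """Combine word embeddings with a linguistic-factor embedding either by
    summation or by concatenation. Dimensions here are illustrative only."""
    def __init__(self, vocab_size: int, factor_vocab: int,
                 d_model: int = 512, strategy: str = "concat"):
        super().__init__()
        self.strategy = strategy
        if strategy == "sum":
            # Both embeddings share the model dimension and are added element-wise.
            self.word_emb = nn.Embedding(vocab_size, d_model)
            self.factor_emb = nn.Embedding(factor_vocab, d_model)
        else:
            # The factor gets a small slice of the model dimension; the two are concatenated.
            factor_dim = 32
            self.word_emb = nn.Embedding(vocab_size, d_model - factor_dim)
            self.factor_emb = nn.Embedding(factor_vocab, factor_dim)

    def forward(self, words: torch.Tensor, factors: torch.Tensor) -> torch.Tensor:
        w, f = self.word_emb(words), self.factor_emb(factors)
        return w + f if self.strategy == "sum" else torch.cat([w, f], dim=-1)

# Either strategy yields a (batch, seq_len, d_model) tensor that feeds the
# standard Transformer encoder unchanged.
```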
Related papers
- Heterogeneous Encoders Scaling In The Transformer For Neural Machine
Translation [47.82947878753809]
We investigate the effectiveness of integrating an increasing number of heterogeneous methods.
Based on a simple combination strategy and performance-driven synergy criteria, we designed the Multi-Encoder Transformer.
Results showcased that our approach can improve the quality of the translation across a variety of languages and dataset sizes.
arXiv Detail & Related papers (2023-12-26T03:39:08Z)
- A Meta-Learning Perspective on Transformers for Causal Language Modeling [17.293733942245154]
The Transformer architecture has become prominent in developing large causal language models.
We establish a meta-learning view of the Transformer architecture when trained for the causal language modeling task.
Within the inner optimization, we discover and theoretically analyze a special characteristic of the norms of learned token representations within Transformer-based causal language models.
arXiv Detail & Related papers (2023-10-09T17:27:36Z)
- Hindi to English: Transformer-Based Neural Machine Translation [0.0]
We developed a Neural Machine Translation (NMT) system by training the Transformer model to translate texts from the Indian language Hindi to English.
We implemented back-translation to augment the training data and to create the vocabulary.
This led us to achieve a state-of-the-art BLEU score of 24.53 on the test set of IIT Bombay English-Hindi Corpus.
arXiv Detail & Related papers (2023-09-23T00:00:09Z)
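As context for the back-translation mentioned in the entry above, here is a minimal sketch of the standard technique: target-language monolingual text is translated into the source language by a reverse model, producing synthetic parallel pairs. `reverse_model` is a placeholder, and the paper's actual pipeline and tooling are not reproduced here.

```python
from typing import Callable, List, Tuple

def back_translate(
    target_monolingual: List[str],
    reverse_model: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Standard back-translation: pair each monolingual target-language sentence
    with a synthetic source sentence produced by a reverse (target->source) model."""
    pairs = []
    for tgt_sentence in target_monolingual:
        synthetic_src = reverse_model(tgt_sentence)   # synthetic source side
        pairs.append((synthetic_src, tgt_sentence))   # gold target side
    return pairs

# The synthetic pairs are then mixed with the authentic parallel corpus before
# training the forward (source->target) model.
```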
- Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages [120.74406230847904]
TP-Transformer augments the traditional Transformer architecture to include an additional component to represent structure.
The second method imbues structure at the data level by segmenting the data with morphological tokenization.
We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset.
arXiv Detail & Related papers (2022-08-11T22:42:24Z)
- XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems
to Improve Language Understanding [73.24847320536813]
This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders.
Our framework is inspired by the success of cross-modal encoders in visual-language tasks, while the learning objective is altered to suit the language-heavy characteristics of NLU.
arXiv Detail & Related papers (2022-04-15T03:44:00Z)
- Examining Scaling and Transfer of Language Model Architectures for
Machine Translation [51.69212730675345]
Language models (LMs) process sequences in a single stack of layers, and encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing.
In machine translation, EncDec has long been the favoured approach, yet few studies have investigated the performance of LMs.
arXiv Detail & Related papers (2022-02-01T16:20:15Z)
- Comparing Feature-Engineering and Feature-Learning Approaches for
Multilingual Translationese Classification [11.364204162881482]
We compare the traditional feature-engineering-based approach to the feature-learning-based one.
We investigate how well the hand-crafted features explain the variance in the neural models' predictions.
arXiv Detail & Related papers (2021-09-15T22:34:48Z)
- GroupBERT: Enhanced Transformer Architecture with Efficient Grouped
Structures [57.46093180685175]
We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture.
We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions.
We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
arXiv Detail & Related papers (2021-06-10T15:41:53Z)
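A hedged sketch of the idea described in the GroupBERT entry above: a convolutional module complements self-attention so that local and global interactions are learned by separate components. This is a simplification inspired by the summary, not GroupBERT's actual grouped structure, and all hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class ConvAugmentedLayer(nn.Module):
    """A layer in which a depthwise convolution handles local interactions and
    self-attention handles global ones; a simplified sketch, not GroupBERT itself."""
    def __init__(self, d_model: int = 256, n_heads: int = 4, kernel_size: int = 7):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Depthwise (grouped) convolution over the sequence dimension for local context.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global interactions: self-attention with a residual connection.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Local interactions: depthwise convolution with a residual connection.
        conv_out = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.norm2(x + conv_out)
```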
- Optimizing Transformer for Low-Resource Neural Machine Translation [4.802292434636455]
Language pairs with limited amounts of parallel data, also known as low-resource languages, remain a challenge for neural machine translation.
Our experiments on different subsets of the IWSLT14 training data show that the effectiveness of the Transformer under low-resource conditions is highly dependent on the hyperparameter settings.
Using an optimized Transformer for low-resource conditions improves the translation quality up to 7.3 BLEU points compared to using the Transformer default settings.
arXiv Detail & Related papers (2020-11-04T13:12:29Z)
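To make the hyperparameter sensitivity noted above concrete, the snippet below contrasts a default-style Transformer configuration with a smaller, more heavily regularized one of the kind typically used under low-resource conditions. All values are illustrative assumptions, not the settings tuned in that paper.

```python
# Illustrative contrast only: these are example values, not the paper's
# optimized hyperparameters.
DEFAULT_TRANSFORMER = {
    "layers": 6, "hidden_size": 512, "ffn_size": 2048,
    "attention_heads": 8, "dropout": 0.1, "label_smoothing": 0.1,
    "bpe_merge_ops": 32000,
}

LOW_RESOURCE_TRANSFORMER = {
    "layers": 5, "hidden_size": 512, "ffn_size": 1024,
    "attention_heads": 2,            # fewer heads for smaller data
    "dropout": 0.3,                  # stronger regularization
    "label_smoothing": 0.2,
    "bpe_merge_ops": 10000,          # smaller subword vocabulary
}
```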
- Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation [71.54816893482457]
We introduce the dual-decoder Transformer, a new model architecture that jointly performs automatic speech recognition (ASR) and multilingual speech translation (ST).
Our models are based on the original Transformer architecture but consist of two decoders, each responsible for one task (ASR or ST).
arXiv Detail & Related papers (2020-11-02T04:59:50Z)
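As a rough illustration of the architecture summarized above, the sketch below wires one shared encoder to two task-specific decoders. The paper's dual-decoder variants additionally let the two decoders attend to each other, which is omitted here, and all sizes are illustrative.

```python
import torch.nn as nn

class DualDecoderSketch(nn.Module):
    """Shared speech encoder with two task-specific decoders (ASR and ST).
    Simplified sketch: the cross-decoder attention of the original model is omitted."""
    def __init__(self, d_model: int = 256, nhead: int = 4, vocab_size: int = 8000):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers=4)
        self.asr_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers=4)
        self.st_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers=4)
        self.asr_proj = nn.Linear(d_model, vocab_size)   # transcript vocabulary
        self.st_proj = nn.Linear(d_model, vocab_size)    # translation vocabulary

    def forward(self, speech_feats, asr_prev_emb, st_prev_emb):
        # speech_feats and *_prev_emb are already-embedded (batch, seq, d_model) tensors.
        memory = self.encoder(speech_feats)              # shared acoustic representation
        asr_h = self.asr_decoder(asr_prev_emb, memory)   # transcription branch
        st_h = self.st_decoder(st_prev_emb, memory)      # translation branch
        return self.asr_proj(asr_h), self.st_proj(st_h)
```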