Examining Scaling and Transfer of Language Model Architectures for
Machine Translation
- URL: http://arxiv.org/abs/2202.00528v2
- Date: Wed, 2 Feb 2022 10:48:56 GMT
- Title: Examining Scaling and Transfer of Language Model Architectures for
Machine Translation
- Authors: Biao Zhang, Behrooz Ghorbani, Ankur Bapna, Yong Cheng, Xavier Garcia,
Jonathan Shen, Orhan Firat
- Abstract summary: Language models (LMs) process sequences in a single stack of layers, and encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing.
In machine translation, EncDec has long been the favoured approach, but few studies have investigated the performance of LMs.
- Score: 51.69212730675345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language understanding and generation models follow one of the two
dominant architectural paradigms: language models (LMs) that process
concatenated sequences in a single stack of layers, and encoder-decoder models
(EncDec) that utilize separate layer stacks for input and output processing. In
machine translation, EncDec has long been the favoured approach, but with few
studies investigating the performance of LMs. In this work, we thoroughly
examine the role of several architectural design choices on the performance of
LMs on bilingual, (massively) multilingual and zero-shot translation tasks,
under systematic variations of data conditions and model sizes. Our results
show that: (i) Different LMs have different scaling properties, where
architectural differences often have a significant impact on model performance
at small scales, but the performance gap narrows as the number of parameters
increases, (ii) Several design choices, including causal masking and
language-modeling objectives for the source sequence, have detrimental effects
on translation quality, and (iii) When paired with full-visible masking for
source sequences, LMs could perform on par with EncDec on supervised bilingual
and multilingual translation tasks, and improve greatly on zero-shot directions
by facilitating the reduction of off-target translations.
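To make the masking distinction in points (ii)-(iii) concrete, here is a minimal sketch (illustrative only; `build_mask` and its arguments are not from the paper) of the attention mask for a decoder-only LM that reads the concatenated [source; target] sequence, either fully causal or with a full-visible (PrefixLM-style) source block:

```python
import numpy as np

def build_mask(src_len, tgt_len, full_visible_source=True):
    """Return a (src_len+tgt_len) x (src_len+tgt_len) attention mask.
    mask[i, j] == 1 means position i may attend to position j."""
    n = src_len + tgt_len
    mask = np.tril(np.ones((n, n), dtype=np.int32))  # strictly causal baseline
    if full_visible_source:
        # Source tokens attend to the whole source bidirectionally; target
        # tokens stay causal (and already see the full source via tril).
        mask[:src_len, :src_len] = 1
    return mask

print(build_mask(3, 2, full_visible_source=False))  # fully causal LM
print(build_mask(3, 2, full_visible_source=True))   # full-visible source block
```

With `full_visible_source=True`, source positions attend to the entire source while target positions remain causal, which is the configuration the abstract reports as competitive with EncDec.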
Related papers
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z)
- Relay Decoding: Concatenating Large Language Models for Machine Translation [21.367605327742027]
We propose Relay Decoding (RD), an approach that concatenates two distinct large models that individually support the source and target languages.
By incorporating a simple mapping layer to facilitate the connection between these two models and utilizing a limited amount of parallel data for training, we successfully achieve superior results in the machine translation task.
arXiv Detail & Related papers (2024-05-05T13:42:25Z)
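As a rough illustration of the mapping-layer idea in the Relay Decoding summary above (hypothetical: the class name, dimensions, and soft-prefix interface are assumptions, not details from that paper), a bridge could project a source-language LM's hidden states into a target-language LM's embedding space, with only the bridge trained on the limited parallel data:

```python
import torch
import torch.nn as nn

class RelayBridge(nn.Module):
    """Hypothetical mapping layer connecting two separate LMs: it projects the
    final hidden states of a source-language model into the embedding space of
    a target-language model. Dimensions and names are illustrative."""

    def __init__(self, src_hidden: int = 1024, tgt_embed: int = 2048):
        super().__init__()
        self.proj = nn.Linear(src_hidden, tgt_embed)

    def forward(self, src_hidden_states: torch.Tensor) -> torch.Tensor:
        # src_hidden_states: (batch, src_len, src_hidden), taken from a frozen
        # source-language LM; the output would be fed to the target-language LM
        # as a soft prefix. Only this layer is trained on the parallel data.
        return self.proj(src_hidden_states)

bridge = RelayBridge()
prefix = bridge(torch.randn(2, 16, 1024))
print(prefix.shape)  # torch.Size([2, 16, 2048])
```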
- Understanding the role of FFNs in driving multilingual behaviour in LLMs [0.0]
In this paper, we conduct an in-depth analysis of the multilingual capabilities of a family of Large Language Models.
We introduce novel metrics to probe the model's multilingual behaviour at different layers and shed light on the impact of architectural choices on multilingual processing.
arXiv Detail & Related papers (2024-04-22T03:47:00Z)
- Contextual Code Switching for Machine Translation using Language Models [1.4866655830571935]
Large language models (LLMs) have exerted a considerable impact on diverse language-related tasks in recent years.
We present an extensive study of code switching in the machine translation setting, comparing multiple LLMs.
Our results indicate that although LLMs show promising results on certain tasks, models of relatively lower complexity outperform multilingual large language models on the machine translation task.
arXiv Detail & Related papers (2023-12-20T16:40:33Z)
- Evaluating the Capability of Large-scale Language Models on Chinese Grammatical Error Correction Task [10.597024796304016]
Large-scale language models (LLMs) have shown remarkable capability in a variety of Natural Language Processing (NLP) tasks.
This report explores how large language models perform on Chinese grammatical error correction tasks.
arXiv Detail & Related papers (2023-07-08T13:10:59Z)
- Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a "versatile" model, i.e., Unified Model Learning for NMT (UMLNMT), that works with data from different tasks.
UMLNMT yields substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z)
- Exploring Dimensionality Reduction Techniques in Multilingual Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensionality reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers.
It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
arXiv Detail & Related papers (2022-04-18T17:20:55Z)
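As a minimal illustration of the kind of dimensionality reduction discussed above (the embedding dimension, component count, and choice of PCA are assumptions for the example, not that paper's exact pipeline):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: 1,000 multilingual sentence embeddings of dimension 768.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 768))

# 768 -> 64 dimensions is roughly a 91.7% reduction, in the ballpark of the
# 91.58% average reduction reported for the best-case setting.
pca = PCA(n_components=64)
reduced = pca.fit_transform(embeddings)
print(reduced.shape)  # (1000, 64)
```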
- Multilingual Neural Machine Translation: Can Linguistic Hierarchies Help? [29.01386302441015]
Multilingual Neural Machine Translation (MNMT) trains a single NMT model that supports translation between multiple languages.
The performance of an MNMT model is highly dependent on the type of languages used in training, as transferring knowledge from a diverse set of languages degrades the translation performance due to negative transfer.
We propose a Hierarchical Knowledge Distillation (HKD) approach for MNMT which capitalises on language groups generated according to typological features and phylogeny of languages to overcome the issue of negative transfer.
arXiv Detail & Related papers (2021-10-15T02:31:48Z)
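For readers unfamiliar with knowledge distillation, a generic token-level KD objective looks like the sketch below; the hierarchical, language-group-specific teacher selection that HKD adds is not shown, and all names and hyperparameters here are illustrative:

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, alpha=0.5, temperature=2.0):
    """Generic KD objective (a sketch, not the exact HKD recipe): mix
    cross-entropy on the references with a KL term that pulls the multilingual
    student toward a language-group teacher."""
    ce = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce + (1.0 - alpha) * kl

# Toy usage: a batch of 8 token positions over a 100-word vocabulary.
s, t = torch.randn(8, 100), torch.randn(8, 100)
y = torch.randint(0, 100, (8,))
print(kd_loss(s, t, y).item())
```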
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for multilingual neural machine translation (MNMT) based on distributionally robust optimization.
We show how to practically optimize this objective for large translation corpora using an iterated best response scheme.
Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z)
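A heavily simplified flavour of the reweighting behind distributionally robust training (a generic exponentiated-gradient-style heuristic, not that paper's actual objective or its iterated best response scheme) is to upweight language pairs with higher current loss, e.g.:

```python
import numpy as np

def adversarial_weights(per_pair_losses, step_size=1.0):
    """Upweight language pairs with higher dev loss for the next epoch.
    Illustrative only; the function name and step_size are assumptions."""
    losses = np.asarray(per_pair_losses, dtype=float)
    logits = step_size * losses
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()

# Toy usage: four language pairs with unequal losses.
print(adversarial_weights([1.2, 2.5, 0.8, 3.1]))
```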
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.