Examining Scaling and Transfer of Language Model Architectures for Machine Translation
- URL: http://arxiv.org/abs/2202.00528v2
- Date: Wed, 2 Feb 2022 10:48:56 GMT
- Title: Examining Scaling and Transfer of Language Model Architectures for Machine Translation
- Authors: Biao Zhang, Behrooz Ghorbani, Ankur Bapna, Yong Cheng, Xavier Garcia,
Jonathan Shen, Orhan Firat
- Abstract summary: Language models (LMs) process sequences in a single stack of layers, and encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing.
In machine translation, EncDec has long been the favoured approach, but few studies have investigated the performance of LMs.
- Score: 51.69212730675345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural language understanding and generation models follow one of the two
dominant architectural paradigms: language models (LMs) that process
concatenated sequences in a single stack of layers, and encoder-decoder models
(EncDec) that utilize separate layer stacks for input and output processing. In
machine translation, EncDec has long been the favoured approach, but few
studies have investigated the performance of LMs. In this work, we thoroughly
examine the role of several architectural design choices on the performance of
LMs on bilingual, (massively) multilingual and zero-shot translation tasks,
under systematic variations of data conditions and model sizes. Our results
show that: (i) Different LMs have different scaling properties, where
architectural differences often have a significant impact on model performance
at small scales, but the performance gap narrows as the number of parameters
increases, (ii) Several design choices, including causal masking and
language-modeling objectives for the source sequence, have detrimental effects
on translation quality, and (iii) When paired with full-visible masking for
source sequences, LMs could perform on par with EncDec on supervised bilingual
and multilingual translation tasks, and improve greatly on zero-shot directions
by facilitating the reduction of off-target translations.
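The central design choice named in the abstract, causal versus full-visible masking of the source sequence inside a single-stack LM, can be made concrete with attention masks over the concatenated [source; target] input. The NumPy sketch below is illustrative only, not code from the paper; the function names, the toy sequence lengths, and the concatenation layout are assumptions made for this example.

```python
# Minimal sketch (assumed layout, not the paper's code): attention masks for a
# decoder-only ("LM") translation model that consumes [source ; target].
import numpy as np


def causal_mask(src_len: int, tgt_len: int) -> np.ndarray:
    """Strictly causal mask: every position, source and target alike,
    may attend only to itself and to earlier positions."""
    total = src_len + tgt_len
    return np.tril(np.ones((total, total), dtype=bool))


def prefix_lm_mask(src_len: int, tgt_len: int) -> np.ndarray:
    """Full-visible (prefix-LM) mask: source positions attend to the whole
    source bidirectionally, while target positions attend causally to the
    full source plus previously generated target tokens."""
    total = src_len + tgt_len
    mask = np.tril(np.ones((total, total), dtype=bool))
    mask[:src_len, :src_len] = True  # lift causality inside the source prefix
    return mask


if __name__ == "__main__":
    # Toy example: 3 source tokens followed by 2 target tokens.
    print(causal_mask(3, 2).astype(int))
    print(prefix_lm_mask(3, 2).astype(int))
```

Under the full-visible variant, the source prefix behaves like a lightweight encoder inside the decoder stack, which is the configuration the abstract reports as performing on par with EncDec on supervised tasks.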
Related papers
- LayAlign: Enhancing Multilingual Reasoning in Large Language Models via Layer-Wise Adaptive Fusion and Alignment Strategy [33.85811169010525]
Large language models (LLMs) exhibit suboptimal performance on low-resource languages.
Recent approaches have leveraged multilingual encoders alongside LLMs by introducing trainable parameters connecting the two models.
We propose LayAlign, a framework that integrates representations from all encoder layers.
arXiv Detail & Related papers (2025-02-17T03:45:03Z)
- The Impact of Model Scaling on Seen and Unseen Language Performance [2.012425476229879]
We study the performance and scaling behavior of multilingual Large Language Models across 204 languages.
Our findings show significant differences in scaling behavior between zero-shot and two-shot scenarios.
In two-shot settings, larger models show clear linear improvements in multilingual text classification.
arXiv Detail & Related papers (2025-01-10T00:10:21Z)
- Paraphrase-Aligned Machine Translation [7.258916315600866]
Large Language Models (LLMs) have demonstrated significant capabilities in machine translation.
We propose ParaAlign Translator, a method that fine-tunes LLMs to paraphrase sentences, aligning their structures with those of the target language systems.
Experimental results demonstrate that the proposed method enhances the LLaMA-3-8B model's performance in both resource-rich and low-resource scenarios.
arXiv Detail & Related papers (2024-12-08T12:17:26Z)
- LLM-based Translation Inference with Iterative Bilingual Understanding [52.46978502902928]
We propose a novel Iterative Bilingual Understanding Translation (IBUT) method based on the cross-lingual capabilities of large language models (LLMs).
The cross-lingual capability of LLMs enables the generation of contextual understanding for both the source and target languages separately.
The proposed IBUT outperforms several strong comparison methods.
arXiv Detail & Related papers (2024-10-16T13:21:46Z)
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z)
- Understanding the role of FFNs in driving multilingual behaviour in LLMs [0.0]
In this paper, we conduct an in-depth analysis of the multilingual capabilities of a family of Large Language Models.
We introduce novel metrics to probe the model's multilingual behaviour at different layers and shed light on the impact of architectural choices on multilingual processing.
arXiv Detail & Related papers (2024-04-22T03:47:00Z)
- Contextual Code Switching for Machine Translation using Language Models [1.4866655830571935]
Large language models (LLMs) have exerted a considerable impact on diverse language-related tasks in recent years.
We present an extensive study on the code switching task specifically for the machine translation task comparing multiple LLMs.
Our results indicate that although LLMs show promising results on certain tasks, models with relatively lower complexity outperform multilingual large language models on the machine translation task.
arXiv Detail & Related papers (2023-12-20T16:40:33Z)
- Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a "versatile" model, i.e., Unified Model Learning for NMT (UMLNMT), that works with data from different tasks.
UMLNMT yields substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z)
- Exploring Dimensionality Reduction Techniques in Multilingual Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensionality reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers.
It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
arXiv Detail & Related papers (2022-04-18T17:20:55Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach that regularizes NMT models at both the representation and gradient levels.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)