Related papers: Examining Scaling and Transfer of Language Model Architectures for Machine Translation

Examining Scaling and Transfer of Language Model Architectures for Machine Translation

URL: http://arxiv.org/abs/2202.00528v2
Date: Wed, 2 Feb 2022 10:48:56 GMT
Title: Examining Scaling and Transfer of Language Model Architectures for Machine Translation
Authors: Biao Zhang, Behrooz Ghorbani, Ankur Bapna, Yong Cheng, Xavier Garcia, Jonathan Shen, Orhan Firat
Abstract summary: Language models (LMs) process sequences in a single stack of layers, and encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing. In machine translation, EncDec has long been the favoured approach, but with few studies investigating the performance of LMs.
Score: 51.69212730675345
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Natural language understanding and generation models follow one of the two dominant architectural paradigms: language models (LMs) that process concatenated sequences in a single stack of layers, and encoder-decoder models (EncDec) that utilize separate layer stacks for input and output processing. In machine translation, EncDec has long been the favoured approach, but with few studies investigating the performance of LMs. In this work, we thoroughly examine the role of several architectural design choices on the performance of LMs on bilingual, (massively) multilingual and zero-shot translation tasks, under systematic variations of data conditions and model sizes. Our results show that: (i) Different LMs have different scaling properties, where architectural differences often have a significant impact on model performance at small scales, but the performance gap narrows as the number of parameters increases, (ii) Several design choices, including causal masking and language-modeling objectives for the source sequence, have detrimental effects on translation quality, and (iii) When paired with full-visible masking for source sequences, LMs could perform on par with EncDec on supervised bilingual and multilingual translation tasks, and improve greatly on zero-shot directions by facilitating the reduction of off-target translations.

Related papers

The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs [54.59207567677249]
Large language models (LLMs) still struggle across tasks outside of high-resource languages.<n>In this work, we investigate cross-lingual transfer to lower-resource languages where task-specific post-training data is scarce.
arXiv Detail & Related papers (2025-05-23T20:28:31Z)
LayAlign: Enhancing Multilingual Reasoning in Large Language Models via Layer-Wise Adaptive Fusion and Alignment Strategy [33.85811169010525]
Large language models (LLMs) exhibit suboptimal performance on low-resource languages. Recent approaches have leveraged multilingual encoders alongside LLMs by introducing trainable parameters connecting the two models. We propose aname, a framework that integrates representations from all encoder layers.
arXiv Detail & Related papers (2025-02-17T03:45:03Z)
The Impact of Model Scaling on Seen and Unseen Language Performance [2.012425476229879]
We study the performance and scaling behavior of multilingual Large Language Models across 204 languages. Our findings show significant differences in scaling behavior between zero-shot and two-shot scenarios. In two-shot settings, larger models show clear linear improvements in multilingual text classification.
arXiv Detail & Related papers (2025-01-10T00:10:21Z)
Paraphrase-Aligned Machine Translation [7.258916315600866]
Large Language Models (LLMs) have demonstrated significant capabilities in machine translation. We propose ParaAlign Translator, a method that fine-tunes LLMs to paraphrase sentences, aligning their structures with those of the target language systems. Experimental results demonstrate that the proposed method enhances the LLaMA-3-8B model's performance in both resource-rich and low-resource scenarios.
arXiv Detail & Related papers (2024-12-08T12:17:26Z)
LLM-based Translation Inference with Iterative Bilingual Understanding [52.46978502902928]
We propose a novel Iterative Bilingual Understanding Translation method based on the cross-lingual capabilities of large language models (LLMs) The cross-lingual capability of LLMs enables the generation of contextual understanding for both the source and target languages separately. The proposed IBUT outperforms several strong comparison methods.
arXiv Detail & Related papers (2024-10-16T13:21:46Z)
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB2.0, which is a new benchmark for evaluating pre-trained SSL and supervised speech models. We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design. Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z)
Relay Decoding: Concatenating Large Language Models for Machine Translation [21.367605327742027]
We propose an innovative approach called RD (Relay Decoding), which entails concatenating two distinct large models that individually support the source and target languages. By incorporating a simple mapping layer to facilitate the connection between these two models and utilizing a limited amount of parallel data for training, we successfully achieve superior results in the machine translation task.
arXiv Detail & Related papers (2024-05-05T13:42:25Z)
Understanding the role of FFNs in driving multilingual behaviour in LLMs [0.0]
In this paper, we conduct an in-depth analysis of the multilingual capabilities of a family of Large Language Models. We introduce novel metrics to probe the model's multilingual behaviour at different layers and shed light on the impact of architectural choices on multilingual processing.
arXiv Detail & Related papers (2024-04-22T03:47:00Z)
Contextual Code Switching for Machine Translation using Language Models [1.4866655830571935]
Large language models (LLMs) have exerted a considerable impact on diverse language-related tasks in recent years. We present an extensive study on the code switching task specifically for the machine translation task comparing multiple LLMs. Our results indicate that despite the LLMs having promising results in the certain tasks, the models with relatively lesser complexity outperform the multilingual large language models in the machine translation task.
arXiv Detail & Related papers (2023-12-20T16:40:33Z)
Evaluating the Capability of Large-scale Language Models on Chinese Grammatical Error Correction Task [10.597024796304016]
Large-scale language models (LLMs) has shown remarkable capability in various of Natural Language Processing (NLP) tasks. This report explores the how large language models perform on Chinese grammatical error correction tasks.
arXiv Detail & Related papers (2023-07-08T13:10:59Z)
Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing machine translation (NMT) studies mainly focus on developing dataset-specific models. We propose a versatile'' model, i.e., the Unified Model Learning for NMT (UMLNMT) that works with data from different tasks. OurNMT results in substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z)
Exploring Dimensionality Reduction Techniques in Multilingual Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensional reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers. It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58% pm 2.59%$ and $54.65% pm 32.20%$, respectively.
arXiv Detail & Related papers (2022-04-18T17:20:55Z)
Multilingual Neural Machine Translation:Can Linguistic Hierarchies Help? [29.01386302441015]
Multilingual Neural Machine Translation (MNMT) trains a single NMT model that supports translation between multiple languages. The performance of an MNMT model is highly dependent on the type of languages used in training, as transferring knowledge from a diverse set of languages degrades the translation performance due to negative transfer. We propose a Hierarchical Knowledge Distillation (HKD) approach for MNMT which capitalises on language groups generated according to typological features and phylogeny of languages to overcome the issue of negative transfer.
arXiv Detail & Related papers (2021-10-15T02:31:48Z)
Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level. Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
Distributionally Robust Multilingual Machine Translation [94.51866646879337]
We propose a new learning objective for Multilingual neural machine translation (MNMT) based on distributionally robust optimization. We show how to practically optimize this objective for large translation corpora using an iterated best response scheme. Our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
arXiv Detail & Related papers (2021-09-09T03:48:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.