The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation
- URL: http://arxiv.org/abs/2505.13090v1
- Date: Mon, 19 May 2025 13:24:01 GMT
- Title: The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation
- Authors: David Stap, Christof Monz
- Abstract summary: We find that expanding language diversity during fine-tuning improves translation quality for both unsupervised and -- surprisingly -- supervised pairs. We show that increased language diversity creates more language-agnostic representations.
- Score: 5.108635348039592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prior research diverges on language diversity in LLM fine-tuning: Some studies report benefits while others find no advantages. Through controlled fine-tuning experiments across 132 translation directions, we systematically resolve these disparities. We find that expanding language diversity during fine-tuning improves translation quality for both unsupervised and -- surprisingly -- supervised pairs, despite less diverse models being fine-tuned exclusively on these supervised pairs. However, benefits plateau or decrease beyond a certain diversity threshold. We show that increased language diversity creates more language-agnostic representations. These representational adaptations help explain the improved performance in models fine-tuned with greater diversity.
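The abstract's representational claim suggests a simple, if rough, probe: embed semantically equivalent sentences from several languages and measure how close their representations are. The sketch below is a hypothetical illustration rather than the paper's actual analysis; the model name (xlm-roberta-base) is a placeholder for whatever base or fine-tuned model one wants to inspect, and mean-pooled cosine similarity is just one possible proxy for language-agnosticism.

```python
# Hypothetical sketch: one way to probe how "language-agnostic" a model's
# representations are. The model name, example sentences, and similarity
# proxy are illustrative assumptions, not the paper's exact setup.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "xlm-roberta-base"  # placeholder for the model under inspection

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME).eval()

def sentence_embedding(text: str) -> torch.Tensor:
    """Mean-pool the last-layer hidden states into a single sentence vector."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # (1, seq_len, dim)
    return hidden.mean(dim=1).squeeze(0)            # (dim,)

# Parallel sentences: same meaning, different languages.
parallel = {
    "en": "The cat sleeps on the sofa.",
    "de": "Die Katze schläft auf dem Sofa.",
    "fr": "Le chat dort sur le canapé.",
}
embeddings = {lang: sentence_embedding(s) for lang, s in parallel.items()}

# Average pairwise cosine similarity across languages: higher values suggest
# more language-agnostic representations for semantically equivalent inputs.
langs = list(embeddings)
pairs = [(a, b) for i, a in enumerate(langs) for b in langs[i + 1:]]
sims = [
    torch.nn.functional.cosine_similarity(embeddings[a], embeddings[b], dim=0).item()
    for a, b in pairs
]
print(f"mean cross-lingual similarity: {sum(sims) / len(sims):.3f}")
```

Comparing this statistic across checkpoints fine-tuned with different numbers of languages would mirror, in spirit, the kind of representational comparison the abstract describes.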
Related papers
- Multilinguality Does not Make Sense: Investigating Factors Behind Zero-Shot Transfer in Sense-Aware Tasks [1.571499916304475]
Cross-lingual transfer allows models to perform tasks in languages unseen during training.
We show that multilingual training is neither necessary nor inherently beneficial for effective transfer.
arXiv Detail & Related papers (2025-05-30T17:36:20Z) - When Less Language is More: Language-Reasoning Disentanglement Makes LLMs Better Multilingual Reasoners [111.50503126693444]
We show that language-specific ablation consistently boosts multilingual reasoning performance.
Compared to post-training, our training-free ablation achieves comparable or superior results with minimal computational overhead.
arXiv Detail & Related papers (2025-05-21T08:35:05Z) - ShifCon: Enhancing Non-Dominant Language Capabilities with a Shift-based Contrastive Framework [78.07201802874529]
ShifCon is a Shift-based Contrastive framework that aligns the internal forward process of other languages toward that of the dominant one.
It shifts the representations of non-dominant languages into the dominant language subspace, allowing them to access relatively rich information encoded in the model parameters.
Experiments demonstrate that our ShifCon framework significantly enhances the performance of non-dominant languages.
arXiv Detail & Related papers (2024-10-25T10:28:59Z) - LLM-based Translation Inference with Iterative Bilingual Understanding [52.46978502902928]
We propose a novel Iterative Bilingual Understanding Translation (IBUT) method based on the cross-lingual capabilities of large language models (LLMs).
The cross-lingual capability of LLMs enables the generation of contextual understanding for both the source and target languages separately.
The proposed IBUT outperforms several strong comparison methods.
arXiv Detail & Related papers (2024-10-16T13:21:46Z) - Quantifying the Gaps Between Translation and Native Perception in Training for Multimodal, Multilingual Retrieval [28.589035749529955]
We empirically show performance gaps between training on captions that come from native German perception and captions that have been either machine-translated or human-translated from English into German.
While we achieve mean recall improvements (+1.3), gaps still remain, indicating an open area of future work for the community.
arXiv Detail & Related papers (2024-10-02T20:47:53Z) - Disentangling the Roles of Target-Side Transfer and Regularization in Multilingual Machine Translation [9.838281446902268]
We conduct a large-scale study that varies the auxiliary target side languages along two dimensions.
We show that linguistically similar target languages exhibit strong ability to transfer positive knowledge.
With an increasing size of similar target languages, the positive transfer is further enhanced to benefit the main language pairs.
Meanwhile, distant auxiliary target languages can also unexpectedly benefit main language pairs, even with minimal positive transfer ability.
arXiv Detail & Related papers (2024-02-01T10:55:03Z) - Exploring Diversity in Back Translation for Low-Resource Machine Translation [85.03257601325183]
Back translation is one of the most widely used methods for improving the performance of neural machine translation systems.
Recent research has sought to enhance the effectiveness of this method by increasing the 'diversity' of the generated translations.
This work puts forward a more nuanced framework for understanding diversity in training data, splitting it into lexical diversity and syntactic diversity.
arXiv Detail & Related papers (2022-06-01T15:21:16Z) - An Isotropy Analysis in the Multilingual BERT Embedding Space [18.490856440975996]
We investigate the representation degeneration problem in multilingual contextual word representations (CWRs) of BERT.
Our results show that increasing the isotropy of multilingual embedding space can significantly improve its representation power and performance.
Our analysis indicates that although the degenerated directions vary in different languages, they encode similar linguistic knowledge, suggesting a shared linguistic space among languages. (A minimal sketch of an isotropy measure appears after this list.)
arXiv Detail & Related papers (2021-10-09T08:29:49Z) - On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning [7.493779672689531]
The knowledge acquired by multilingual BERT (mBERT) has two components: a language-specific and a language-neutral one.
This paper analyses the relationship between them, in the context of fine-tuning on two tasks.
arXiv Detail & Related papers (2021-09-14T19:28:31Z) - On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment [59.995385574274785]
We show that, contrary to previous belief, negative interference also impacts low-resource languages.
We present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference.
arXiv Detail & Related papers (2020-10-06T20:48:58Z) - Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z)
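The isotropy analysis entry above relies on a standard way to quantify isotropy of an embedding space: the partition-function ratio I(V) = min_c Z(c) / max_c Z(c), where Z(c) = sum_i exp(c . v_i) and c ranges over the eigenvectors of V^T V (values near 1 indicate an isotropic space). The snippet below is a minimal sketch of that measure on random stand-in vectors, not a reproduction of the cited paper's multilingual BERT experiments.

```python
# Hypothetical sketch: the partition-function isotropy ratio on synthetic data.
# Values near 1 indicate an isotropic space; values near 0, an anisotropic one.
# The random vectors below are stand-ins, not real contextual embeddings.
import numpy as np

def isotropy(vectors: np.ndarray) -> float:
    """I(V) = min_c Z(c) / max_c Z(c), with c over the eigenvectors of V^T V
    and Z(c) = sum_i exp(c . v_i)."""
    _, eigvecs = np.linalg.eigh(vectors.T @ vectors)  # columns are unit eigenvectors
    z = np.exp(vectors @ eigvecs).sum(axis=0)         # partition function per direction
    return float(z.min() / z.max())

rng = np.random.default_rng(0)
isotropic_cloud = rng.normal(size=(1000, 64))             # no dominant direction
anisotropic_cloud = isotropic_cloud + 5.0                 # shared offset adds a dominant direction
print(f"isotropic:   {isotropy(isotropic_cloud):.3f}")    # relatively high
print(f"anisotropic: {isotropy(anisotropic_cloud):.3f}")  # close to 0
```

Swapping the random clouds for contextual embeddings extracted per language would reproduce, in miniature, the kind of comparison that entry describes.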