Conditions for Catastrophic Forgetting in Multilingual Translation
- URL: http://arxiv.org/abs/2510.19546v1
- Date: Wed, 22 Oct 2025 12:54:00 GMT
- Title: Conditions for Catastrophic Forgetting in Multilingual Translation
- Authors: Danni Liu, Jan Niehues
- Abstract summary: We identify the conditions that trigger catastrophic forgetting in multilingual fine-tuning. We show that the relative scale between model and data size is a primary determinant of forgetting. We also show that cross-lingual alignment can mitigate forgetting while also facilitating positive transfer to unseen target languages.
- Score: 24.10629800866219
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning multilingual foundation models on specific languages often induces catastrophic forgetting, degrading performance on languages unseen in fine-tuning. While this phenomenon is widely documented, the literature presents fragmented results about when forgetting occurs. To address this ambiguity, we conduct a systematic empirical study using machine translation as a testbed to identify the conditions that trigger catastrophic forgetting in multilingual fine-tuning. Through controlled experiments across different model architectures, data scales, and fine-tuning approaches, we reveal that the relative scale between model and data size is a primary determinant of forgetting. Moreover, we demonstrate that a model's instruction-following ability is more critical for retaining multilingual knowledge than its architecture. Contrary to common assumptions, parameter-efficient fine-tuning offers no clear advantage over full fine-tuning in mitigating forgetting. Lastly, we show that cross-lingual alignment can mitigate forgetting while also facilitating positive transfer to unseen target languages.
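The core measurement in the abstract, quality degradation on languages unseen during fine-tuning, can be sketched as a per-language score delta. This is a minimal illustrative sketch: the language codes and scores below are invented for demonstration and are not the paper's actual data or setup.

```python
# Illustrative sketch of quantifying forgetting: the change in a
# translation metric (e.g., BLEU) per language, before vs. after
# fine-tuning. All language codes and scores are hypothetical.

def forgetting_delta(before, after):
    """Per-language score change from pre- to post-fine-tuning.

    Negative values indicate forgetting on that language.
    """
    return {lang: round(after[lang] - before[lang], 2) for lang in before}

# Hypothetical scenario: model fine-tuned only on German ("de"),
# then evaluated on all three directions.
before = {"de": 28.0, "fr": 30.5, "zh": 22.1}
after = {"de": 33.4, "fr": 24.2, "zh": 15.0}

deltas = forgetting_delta(before, after)
# The fine-tuned language improves, while the unseen languages
# show negative deltas, i.e., catastrophic forgetting.
```

In this toy reading, the magnitude of the negative deltas on the unseen languages would be the quantity the paper studies as a function of model size, data scale, and fine-tuning method.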
Related papers
- Parallel Universes, Parallel Languages: A Comprehensive Study on LLM-based Multilingual Counterfactual Example Generation [49.2073409243885]
Large language models (LLMs) excel at generating English counterfactuals and demonstrate multilingual proficiency.
We conduct automatic evaluations on both directly generated counterfactuals in the target languages and those derived via English translation across six languages.
We identify and categorize four main types of errors that consistently appear in the generated counterfactuals across languages.
arXiv Detail & Related papers (2026-01-01T08:53:49Z)
- Exploring Performance Variations in Finetuned Translators of Ultra-Low Resource Languages: Do Linguistic Differences Matter? [0.0]
Finetuning pre-trained language models with small amounts of data is a commonly used method to create translators for ultra-low resource languages.
Previous works have reported substantially different performances with translators created using similar methodology and data.
arXiv Detail & Related papers (2025-11-27T14:15:14Z)
- Quantifying Language Disparities in Multilingual Large Language Models [31.198046729180266]
Results reported in large-scale multilingual evaluations are often fragmented and confounded by factors such as target languages, differences in experimental setups, and model choices.
We propose a framework that disentangles these confounding variables and introduces three interpretable metrics: the performance realisation ratio, its coefficient of variation, and language potential.
arXiv Detail & Related papers (2025-08-23T23:25:38Z)
- Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models? [17.011882550422452]
It is unknown whether the nature of the instruction data has an impact on the model output.
It is questionable whether translated test sets can capture such nuances.
We show that native or generation benchmarks reveal a notable difference between native and translated instruction data.
arXiv Detail & Related papers (2024-06-18T17:43:47Z)
- The Power of Question Translation Training in Multilingual Reasoning: Broadened Scope and Deepened Insights [108.40766216456413]
We propose a question alignment framework to bridge the gap between large language models' English and non-English performance.
Experiment results show it can boost multilingual performance across diverse reasoning scenarios, model families, and sizes.
We analyze representation space, generated response and data scales, and reveal how question translation training strengthens language alignment within LLMs.
arXiv Detail & Related papers (2024-05-02T14:49:50Z)
- On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
arXiv Detail & Related papers (2023-11-14T00:43:33Z)
- Is Prompt-Based Finetuning Always Better than Vanilla Finetuning? Insights from Cross-Lingual Language Understanding [0.30586855806896046]
We propose the ProFiT pipeline to investigate the cross-lingual capabilities of Prompt-based Finetuning.
Our results reveal the effectiveness and versatility of prompt-based finetuning in cross-lingual language understanding.
arXiv Detail & Related papers (2023-07-15T20:33:33Z)
- A Multi-dimensional Evaluation of Tokenizer-free Multilingual Pretrained Models [87.7086269902562]
We show that subword-based models might still be the most practical choice in many settings.
We encourage future work in tokenizer-free methods to consider these factors when designing and evaluating new models.
arXiv Detail & Related papers (2022-10-13T15:47:09Z)
- Adaptive Sparse Transformer for Multilingual Translation [18.017674093519332]
A known challenge of multilingual models is the negative language interference.
We propose an adaptive and sparse architecture for multilingual modeling.
Our model outperforms strong baselines in terms of translation quality without increasing the inference cost.
arXiv Detail & Related papers (2021-04-15T10:31:07Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model against character language models trained with varying amounts of target-language data.
Our usage scenario is interactive correction with nearly zero amounts of training examples, improving models as more data is collected.
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment [59.995385574274785]
We show that, contrary to previous belief, negative interference also impacts low-resource languages.
We present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference.
arXiv Detail & Related papers (2020-10-06T20:48:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.