Why Does Zero-Shot Cross-Lingual Generation Fail? An Explanation and a Solution
- URL: http://arxiv.org/abs/2305.17325v1
- Date: Sat, 27 May 2023 02:04:19 GMT
- Title: Why Does Zero-Shot Cross-Lingual Generation Fail? An Explanation and a Solution
- Authors: Tianjian Li and Kenton Murray
- Abstract summary: We show that the fine-tuning process learns language invariant representations, which is beneficial for classification tasks but harmful for generation tasks.
Experiments on three semantically diverse generation tasks show that our method reduces the accidental translation problem by 68% and improves the ROUGE-L score by 1.5 on average.
- Score: 0.9085116579988537
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Zero-shot cross-lingual transfer is when a multilingual model is trained to
perform a task in one language and then is applied to another language.
Although the zero-shot cross-lingual transfer approach has achieved success in
various classification tasks, its performance on natural language generation
tasks falls short in quality and sometimes outputs an incorrect language. In
our study, we show that the fine-tuning process learns language invariant
representations, which is beneficial for classification tasks but harmful for
generation tasks. Motivated by this, we propose a simple method to regularize
the model from learning language invariant representations and a method to
select model checkpoints without a development set in the target language, both
resulting in better generation quality. Experiments on three semantically
diverse generation tasks show that our method reduces the accidental
translation problem by 68% and improves the ROUGE-L score by 1.5 on average.
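The abstract does not spell out the regularizer, so the following is only a minimal sketch of one way to discourage language-invariant encoder representations during fine-tuning of an mT5-style model: it adds a penalty on the cosine similarity between mean-pooled encoder states for source- and target-language inputs. The pooling choice, the penalty form, and the weight `lambda_reg` are illustrative assumptions, not the authors' implementation.

```python
import torch.nn.functional as F
from transformers import MT5ForConditionalGeneration

model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

def mean_pool(hidden, mask):
    """Average encoder states over non-padding positions."""
    mask = mask.unsqueeze(-1).type_as(hidden)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-6)

def training_step(src_batch, tgt_batch, labels, lambda_reg=0.1):
    """src_batch/tgt_batch are tokenizer outputs (input_ids, attention_mask)
    for comparable source- and target-language inputs."""
    # Standard task loss on the labeled source-language examples.
    task_out = model(**src_batch, labels=labels)

    # Mean-pooled encoder representations in each language.
    encoder = model.get_encoder()
    h_src = mean_pool(encoder(**src_batch).last_hidden_state,
                      src_batch["attention_mask"])
    h_tgt = mean_pool(encoder(**tgt_batch).last_hidden_state,
                      tgt_batch["attention_mask"])

    # Penalize cross-lingual similarity so fine-tuning is not pushed toward
    # fully language-invariant representations (harmful for generation).
    invariance_penalty = F.cosine_similarity(h_src, h_tgt).mean()
    return task_out.loss + lambda_reg * invariance_penalty
```

This combined loss would replace the plain cross-entropy during fine-tuning. The paper additionally describes a checkpoint-selection criterion that requires no target-language development set; the abstract gives no detail on it, and the sketch above does not attempt to reproduce it.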
Related papers
- No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement [59.37775534633868]
We introduce a novel method called language arithmetic, which enables training-free post-processing.
The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes.
arXiv Detail & Related papers (2024-04-24T08:52:40Z)
- Language-Independent Representations Improve Zero-Shot Summarization [18.46817967804773]
Finetuning pretrained models on downstream generation tasks often leads to catastrophic forgetting in zero-shot conditions.
In this work, we focus on summarization and tackle the problem through the lens of language-independent representations.
We first show naively finetuned models are highly language-specific in both output behavior and internal representations, resulting in poor zero-shot performance.
arXiv Detail & Related papers (2024-04-08T17:56:43Z)
- Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks [22.93790760274486]
Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model, finetuned on a task in one language, to make predictions for this task in other languages.
Previous works note a frequent problem of generation in the wrong language and propose approaches to address it, usually using mT5 as a backbone model.
In this work, we compare various approaches proposed in the literature under unified settings, also including alternative backbone models, namely mBART and NLLB-200.
arXiv Detail & Related papers (2024-02-19T16:43:57Z)
- Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally.
Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
- Improving the Cross-Lingual Generalisation in Visual Question Answering [40.86774711775718]
Multilingual vision-language pretrained models show poor cross-lingual generalisation when applied to non-English data.
In this work, we explore the poor performance of these models on a zero-shot cross-lingual visual question answering (VQA) task.
We improve cross-lingual transfer with three strategies: (1) we introduce a linguistic prior objective to augment the cross-entropy loss with a similarity-based loss to guide the model during training, (2) we learn a task-specific subnetwork that improves cross-lingual generalisation and reduces variance without model modification, and (3) we augment training examples using synthetic code-mixing.
arXiv Detail & Related papers (2022-09-07T08:07:43Z)
- Zero-shot Cross-lingual Transfer is Under-specified Optimization [49.3779328255767]
We show that any linearly interpolated model between the source-language monolingual model and the source+target bilingual model has equally low source-language generalization error.
We also show that the zero-shot solution lies in a non-flat region of the target-language generalization-error surface, causing the high variance (a minimal interpolation sketch appears after this list).
arXiv Detail & Related papers (2022-07-12T16:49:28Z)
- Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models [58.990214815032495]
Large-scale pre-trained language models have achieved great success on natural language generation tasks.
Bayesian controllable language models (BCLMs) have been shown to be efficient in controllable language generation.
We propose a "Gemini Discriminator" for controllable language generation which alleviates the mismatch problem with a small computational cost.
arXiv Detail & Related papers (2022-06-11T12:52:32Z)
- Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation [48.80125962015044]
We investigate the problem of performing a generative task (i.e., summarization) in a target language when labeled data is only available in English.
We find that parameter-efficient adaptation provides gains over standard fine-tuning when transferring between less-related languages.
Our methods can provide further quality gains, suggesting that robust zero-shot cross-lingual generation is within reach.
arXiv Detail & Related papers (2022-05-25T10:41:34Z)
- CrossAligner & Co: Zero-Shot Transfer Methods for Task-Oriented Cross-lingual Natural Language Understanding [18.14437842819122]
CrossAligner is the principal method of a variety of effective approaches for zero-shot cross-lingual transfer.
We present a quantitative analysis of individual methods as well as their weighted combinations, several of which exceed state-of-the-art (SOTA) scores.
A detailed qualitative error analysis of the best methods shows that our fine-tuned language models can zero-shot transfer the task knowledge better than anticipated.
arXiv Detail & Related papers (2022-03-18T14:18:12Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)
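As referenced above for "Zero-shot Cross-lingual Transfer is Under-specified Optimization", the interpolation that paper studies is a per-parameter blend of two checkpoints. The sketch below only illustrates that construction with toy modules; the state-dict blending and the alpha grid are assumptions for illustration, not the cited paper's code.

```python
import copy
import torch.nn as nn

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """theta(alpha) = (1 - alpha) * theta_a + alpha * theta_b, parameter-wise."""
    return {k: (1.0 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

# Toy stand-ins for the two checkpoints; in practice these would be the
# source-language monolingual model and the source+target bilingual model.
model_mono = nn.Linear(16, 16)
model_bi = copy.deepcopy(model_mono)
nn.init.normal_(model_bi.weight, std=0.02)

probe = nn.Linear(16, 16)  # module used to evaluate each blended checkpoint
for step in range(11):
    alpha = step / 10
    probe.load_state_dict(
        interpolate_state_dicts(model_mono.state_dict(),
                                model_bi.state_dict(), alpha)
    )
    # Evaluate `probe` on source- and target-language dev sets here: the cited
    # finding is that source-language error stays roughly flat along this path
    # while target-language (zero-shot) error does not.
```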
This list is automatically generated from the titles and abstracts of the papers on this site.