Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation
- URL: http://arxiv.org/abs/2205.12647v1
- Date: Wed, 25 May 2022 10:41:34 GMT
- Title: Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation
- Authors: Tu Vu, Aditya Barua, Brian Lester, Daniel Cer, Mohit Iyyer, Noah
Constant
- Abstract summary: We investigate the problem of performing a generative task (i.e., summarization) in a target language when labeled data is only available in English.
We find that parameter-efficient adaptation provides gains over standard fine-tuning when transferring between less-related languages.
Our methods can provide further quality gains, suggesting that robust zero-shot cross-lingual generation is within reach.
- Score: 48.80125962015044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we explore the challenging problem of performing a generative
task (i.e., summarization) in a target language when labeled data is only
available in English. We assume a strict setting with no access to parallel
data or machine translation. Prior work has shown, and we confirm, that
standard transfer learning techniques struggle in this setting, as a generative
multilingual model fine-tuned purely on English catastrophically forgets how to
generate non-English. Given the recent rise of parameter-efficient adaptation
techniques (e.g., prompt tuning), we conduct the first investigation into how
well these methods can overcome catastrophic forgetting to enable zero-shot
cross-lingual generation. We find that parameter-efficient adaptation provides
gains over standard fine-tuning when transferring between less-related
languages, e.g., from English to Thai. However, a significant gap still remains
between these methods and fully-supervised baselines. To improve cross-lingual
transfer further, we explore three approaches: (1) mixing in unlabeled
multilingual data, (2) pre-training prompts on target language data, and (3)
explicitly factoring prompts into recombinable language and task components.
Our methods can provide further quality gains, suggesting that robust zero-shot
cross-lingual generation is within reach.
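As an illustration of the prompt-tuning setup and of approach (3), the following is a minimal PyTorch sketch of a factored soft prompt in which a reusable language component and a task component are concatenated in front of a frozen model's input embeddings. The class name, prompt lengths, and concatenation scheme are illustrative assumptions, not the paper's actual implementation (which builds on mT5 prompt tuning).

```python
# Minimal sketch of prompt tuning with a factored (language x task) prompt.
# Names, shapes, and the composition-by-concatenation scheme are assumptions
# for illustration only.
import torch
import torch.nn as nn

class FactoredPromptTuning(nn.Module):
    """Prepends trainable soft prompts to the frozen model's input embeddings.

    The prompt is split into a language component and a task component so the
    two can be recombined, e.g. a Thai language prompt trained on unlabeled
    Thai text with a summarization task prompt trained on English data.
    """

    def __init__(self, embed_dim: int, lang_len: int = 50, task_len: int = 50):
        super().__init__()
        # Only these two tensors receive gradients; the backbone LM stays frozen.
        self.lang_prompt = nn.Parameter(torch.randn(lang_len, embed_dim) * 0.02)
        self.task_prompt = nn.Parameter(torch.randn(task_len, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim) from the frozen embedding layer.
        batch = input_embeds.size(0)
        lang = self.lang_prompt.unsqueeze(0).expand(batch, -1, -1)
        task = self.task_prompt.unsqueeze(0).expand(batch, -1, -1)
        # [language prompt; task prompt; input tokens] is fed to the frozen encoder.
        return torch.cat([lang, task, input_embeds], dim=1)
```

At zero-shot inference time, one would keep the English-trained task prompt and swap in the target language's prompt.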
Related papers
- No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement [59.37775534633868]
We introduce a novel method called language arithmetic, which enables training-free post-processing.
The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes.
arXiv Detail & Related papers (2024-04-24T08:52:40Z)
- Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks [22.93790760274486]
Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model, fine-tuned on a task in one language, to make predictions for this task in other languages.
Previous works note the frequent problem of generating in the wrong language and propose approaches to address it, usually using mT5 as a backbone model.
In this work we compare various approaches proposed in the literature in unified settings, also including alternative backbone models, namely mBART and NLLB-200.
arXiv Detail & Related papers (2024-02-19T16:43:57Z)
- Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer [92.80671770992572]
Cross-lingual transfer is a central task in multilingual NLP.
Earlier efforts on this task use parallel corpora, bilingual dictionaries, or other annotated alignment data.
We propose a simple yet effective method, SALT, to improve zero-shot cross-lingual transfer.
arXiv Detail & Related papers (2023-09-19T19:30:56Z)
- Measuring Catastrophic Forgetting in Cross-Lingual Transfer Paradigms: Exploring Tuning Strategies [4.118037156777793]
Cross-lingual transfer is a promising technique to solve tasks in less-resourced languages.
We compare two fine-tuning approaches combined with zero-shot and full-shot learning approaches for large language models.
arXiv Detail & Related papers (2023-09-12T09:37:08Z)
- Why Does Zero-Shot Cross-Lingual Generation Fail? An Explanation and a Solution [0.9085116579988537]
We show that the fine-tuning process learns language invariant representations, which is beneficial for classification tasks but harmful for generation tasks.
Experiments on three semantically diverse generation tasks show that our method reduces the accidental translation problem by 68% and improves the ROUGE-L score by 1.5 on average.
arXiv Detail & Related papers (2023-05-27T02:04:19Z)
- A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning [6.329304732560936]
Existing zero-shot cross-lingual transfer methods rely on parallel corpora or bilingual dictionaries.
We propose Embedding-Push, Attention-Pull, and Robust targets to transfer English embeddings to virtual multilingual embeddings without semantic loss.
arXiv Detail & Related papers (2022-10-18T15:36:53Z)
- ZmBART: An Unsupervised Cross-lingual Transfer Framework for Language Generation [4.874780144224057]
Cross-lingual transfer for natural language generation is relatively understudied.
We consider four NLG tasks (text summarization, question generation, news headline generation, and distractor generation) and three syntactically diverse languages.
We propose an unsupervised cross-lingual language generation framework (called ZmBART) that does not use any parallel or pseudo-parallel/back-translated data.
arXiv Detail & Related papers (2021-06-03T05:08:01Z)
- VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
This helps the model avoid degenerating into predicting masked words conditioned only on context from the same language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
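For context on the cross-attention idea, here is a minimal PyTorch sketch of an encoder layer with an added cross-attention sub-layer over a sequence in another language; the module names, layer ordering, and dimensions are assumptions for illustration, not VECO's actual architecture.

```python
# Minimal sketch of an encoder layer with an extra cross-attention sub-layer
# attending to a sequence in another language. Layer ordering and names are
# illustrative assumptions only.
import torch
import torch.nn as nn

class EncoderLayerWithCrossAttention(nn.Module):
    def __init__(self, d_model: int = 512, nhead: int = 8, dim_ff: int = 2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        # Extra cross-attention so masked tokens are not predicted only from
        # same-language context.
        self.cross_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, dim_ff), nn.ReLU(), nn.Linear(dim_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, other_lang: torch.Tensor) -> torch.Tensor:
        # x, other_lang: (batch, seq_len, d_model)
        x = self.norm1(x + self.self_attn(x, x, x, need_weights=False)[0])
        x = self.norm2(x + self.cross_attn(x, other_lang, other_lang, need_weights=False)[0])
        return self.norm3(x + self.ffn(x))
```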
arXiv Detail & Related papers (2020-10-30T03:41:38Z)
- Unsupervised Cross-lingual Adaptation for Sequence Tagging and Beyond [58.80417796087894]
Cross-lingual adaptation with multilingual pre-trained language models (mPTLMs) mainly consists of two lines of work: the zero-shot approach and the translation-based approach.
We propose a novel framework to consolidate the zero-shot approach and the translation-based approach for better adaptation performance.
arXiv Detail & Related papers (2020-10-23T13:47:01Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it on downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)