Related papers: Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks

URL: http://arxiv.org/abs/2402.12279v2
Date: Mon, 22 Apr 2024 17:32:00 GMT
Title: Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks
Authors: Nadezhda Chirkova, Vassilina Nikoulina
Abstract summary: Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model, finetuned on a task in one language, to make predictions for this task in other languages. Previous works note a frequent problem of generation in the wrong language and propose approaches to address it, usually using mT5 as the backbone model. In this work, we compare various approaches proposed in the literature in unified settings, also including alternative backbone models, namely mBART and NLLB-200.
Score: 22.93790760274486
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model, finetuned on a task in one language, to make predictions for this task in other languages. While broadly studied for natural language understanding tasks, this setting is understudied for generation. Previous works note a frequent problem of generation in the wrong language and propose approaches to address it, usually using mT5 as the backbone model. In this work, we compare various approaches proposed in the literature in unified settings, also including alternative backbone models, namely mBART and NLLB-200. We first underline the importance of tuning the learning rate used for finetuning, which helps to substantially alleviate the problem of generation in the wrong language. Then, we show that with careful learning rate tuning, simple full finetuning of the model acts as a very strong baseline, and alternative approaches bring only marginal improvements. Finally, we find that mBART performs similarly to mT5 of the same size, and NLLB-200 can be competitive in some cases. Our final zero-shot models reach the performance of the approach based on data translation, which is usually considered an upper baseline for zero-shot cross-lingual transfer in generation.

Related papers

Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages [16.671158083515373]
We develop a fluent preference-aligned language model without instruction-tuning data in the target language. Our approach uses an on-policy training method, which we compare with two common approaches. We conduct a case study on Norwegian Bokmål and evaluate fluency through native-speaker assessments.
arXiv Detail & Related papers (2025-12-09T16:31:48Z)
Empirical study of pretrained multilingual language models for zero-shot cross-lingual knowledge transfer in generation [22.962667039293976]
Cross-lingual knowledge transfer enables the multilingual pretrained language model (mPLM) to make predictions in other languages. Previous works note a frequent problem of generation in the wrong language and propose approaches to address it, usually using mT5 as the backbone model. In this work, we test alternative mPLMs, such as mBART and NLLB-200, considering full finetuning and parameter-efficient finetuning with adapters.
arXiv Detail & Related papers (2023-10-15T18:58:53Z)
Why Does Zero-Shot Cross-Lingual Generation Fail? An Explanation and a Solution [0.9085116579988537]
We show that the fine-tuning process learns language invariant representations, which is beneficial for classification tasks but harmful for generation tasks. Experiments on three semantically diverse generation tasks show that our method reduces the accidental translation problem by 68% and improves the ROUGE-L score by 1.5 on average.
arXiv Detail & Related papers (2023-05-27T02:04:19Z)
BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer [81.5984433881309]
We introduce BUFFET, which unifies 15 diverse tasks across 54 languages in a sequence-to-sequence format. BUFFET is designed to establish a rigorous and equitable evaluation framework for few-shot cross-lingual transfer. Our findings reveal significant room for improvement in few-shot in-context cross-lingual transfer.
arXiv Detail & Related papers (2023-05-24T08:06:33Z)
Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation [48.80125962015044]
We investigate the problem of performing a generative task (i.e., summarization) in a target language when labeled data is only available in English. We find that parameter-efficient adaptation provides gains over standard fine-tuning when transferring between less-related languages. Our methods can provide further quality gains, suggesting that robust zero-shot cross-lingual generation is within reach.
arXiv Detail & Related papers (2022-05-25T10:41:34Z)
IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark. IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages. We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
ZmBART: An Unsupervised Cross-lingual Transfer Framework for Language Generation [4.874780144224057]
Cross-lingual transfer for natural language generation is relatively understudied. We consider four NLG tasks (text summarization, question generation, news headline generation, and distractor generation) and three syntactically diverse languages. We propose an unsupervised cross-lingual language generation framework (called ZmBART) that does not use any parallel or pseudo-parallel/back-translated data.
arXiv Detail & Related papers (2021-06-03T05:08:01Z)
Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting. Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking. We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results. We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks. Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)
Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations. We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics. We propose random online backtranslation to enforce the translation of unseen training language pairs.
arXiv Detail & Related papers (2020-04-24T17:21:32Z)

This list is automatically generated from the titles and abstracts of the papers on this site.