Related papers: Self-Translate-Train: A Simple but Strong Baseline for Cross-lingual Transfer of Large Language Models

Self-Translate-Train: A Simple but Strong Baseline for Cross-lingual Transfer of Large Language Models

URL: http://arxiv.org/abs/2407.00454v1
Date: Sat, 29 Jun 2024 14:40:23 GMT
Title: Self-Translate-Train: A Simple but Strong Baseline for Cross-lingual Transfer of Large Language Models
Authors: Ryokan Ri, Shun Kiyono, Sho Takase,
Abstract summary: Cross-lingual transfer is a promising technique for utilizing data in a source language to improve performance in a target language. We propose a simple yet effective method called Self-Translate-Train. It leverages the translation capability of a large language model to generate synthetic training data in the target language and fine-tunes the model with its own generated data.
Score: 31.025371443719404
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cross-lingual transfer is a promising technique for utilizing data in a source language to improve performance in a target language. However, current techniques often require an external translation system or suffer from suboptimal performance due to over-reliance on cross-lingual generalization of multi-lingual pretrained language models. In this study, we propose a simple yet effective method called Self-Translate-Train. It leverages the translation capability of a large language model to generate synthetic training data in the target language and fine-tunes the model with its own generated data. We evaluate the proposed method on a wide range of tasks and show substantial performance gains across several non-English languages.

Related papers

Self-Augmentation Improves Zero-Shot Cross-Lingual Transfer [92.80671770992572]
Cross-lingual transfer is a central task in multilingual NLP. Earlier efforts on this task use parallel corpora, bilingual dictionaries, or other annotated alignment data. We propose a simple yet effective method, SALT, to improve the zero-shot cross-lingual transfer.
arXiv Detail & Related papers (2023-09-19T19:30:56Z)
Soft Language Clustering for Multilingual Model Pre-training [57.18058739931463]
We propose XLM-P, which contextually retrieves prompts as flexible guidance for encoding instances conditionally. Our XLM-P enables (1) lightweight modeling of language-invariant and language-specific knowledge across languages, and (2) easy integration with other multilingual pre-training methods.
arXiv Detail & Related papers (2023-06-13T08:08:08Z)
Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings [10.871587311621974]
We experimentally demonstrate that high capacity multilingual language models applied in a zero-shot setting consistently outperform data-based cross-lingual transfer approaches. A detailed analysis of our results suggests that this might be due to important differences in language use. Our results also indicate that data-based cross-lingual transfer approaches remain a competitive option when high-capacity multilingual language models are not available.
arXiv Detail & Related papers (2022-10-23T05:37:35Z)
A Simple and Effective Method to Improve Zero-Shot Cross-Lingual Transfer Learning [6.329304732560936]
Existing zero-shot cross-lingual transfer methods rely on parallel corpora or bilingual dictionaries. We propose Embedding-Push, Attention-Pull, and Robust targets to transfer English embeddings to virtual multilingual embeddings without semantic loss.
arXiv Detail & Related papers (2022-10-18T15:36:53Z)
Language Contamination Explains the Cross-lingual Capabilities of English Pretrained Models [79.38278330678965]
We find that common English pretraining corpora contain significant amounts of non-English text. This leads to hundreds of millions of foreign language tokens in large-scale datasets. We then demonstrate that even these small percentages of non-English data facilitate cross-lingual transfer for models trained on them.
arXiv Detail & Related papers (2022-04-17T23:56:54Z)
Cross-lingual Transferring of Pre-trained Contextualized Language Models [73.97131976850424]
We propose a novel cross-lingual model transferring framework for PrLMs: TreLM. To handle the symbol order and sequence length differences between languages, we propose an intermediate TRILayer" structure. We show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency.
arXiv Detail & Related papers (2021-07-27T06:51:13Z)
MergeDistill: Merging Pre-trained Language Models using Distillation [5.396915402673246]
We propose MergeDistill, a framework to merge pre-trained LMs in a way that can best leverage their assets with minimal dependencies. We demonstrate the applicability of our framework in a practical setting by leveraging pre-existing teacher LMs and training student LMs that perform competitively with or even outperform teacher LMs trained on several orders of magnitude more data and with a fixed model capacity.
arXiv Detail & Related papers (2021-06-05T08:22:05Z)
Bilingual Alignment Pre-training for Zero-shot Cross-lingual Transfer [33.680292990007366]
In this paper, we aim to improve the zero-shot cross-lingual transfer performance by aligning the embeddings better. We propose a pre-training task named Alignment Language Model (AlignLM) which uses the statistical alignment information as the prior knowledge to guide bilingual word prediction. The results show AlignLM can improve the zero-shot performance significantly on MLQA and XNLI datasets.
arXiv Detail & Related papers (2021-06-03T10:18:43Z)
Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align the lexical- and high-level representations of the two languages. Previous research has shown that this is because the representations are not sufficiently aligned. In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
arXiv Detail & Related papers (2021-03-18T21:17:58Z)
Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource. Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.