Competence-based Curriculum Learning for Multilingual Machine
Translation
- URL: http://arxiv.org/abs/2109.04002v1
- Date: Thu, 9 Sep 2021 02:52:34 GMT
- Title: Competence-based Curriculum Learning for Multilingual Machine
Translation
- Authors: Mingliang Zhang, Fandong Meng, Yunhai Tong and Jie Zhou
- Abstract summary: Existing multilingual machine translation models face a severe challenge: imbalance.
We propose Competence-based Curriculum Learning for Multilingual Machine Translation.
Our approach has achieved a steady and significant performance gain compared to the previous state-of-the-art approach on the TED talks dataset.
- Score: 28.30800327665549
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Currently, multilingual machine translation is receiving more and more
attention since it brings better performance for low resource languages (LRLs)
and saves more space. However, existing multilingual machine translation models
face a severe challenge: imbalance. As a result, the translation performance of
different languages in multilingual translation models varies considerably. We
argue that this imbalance problem stems from the different learning
competencies of different languages. Therefore, we focus on balancing the
learning competencies of different languages and propose Competence-based
Curriculum Learning for Multilingual Machine Translation, named CCL-M.
Specifically, we first define two competencies to help schedule the high
resource languages (HRLs) and the low resource languages: 1) Self-evaluated
Competence, evaluating how well the language itself has been learned; and 2)
HRLs-evaluated Competence, evaluating whether an LRL is ready to be learned
according to HRLs' Self-evaluated Competence. Based on the above competencies,
we utilize the proposed CCL-M algorithm to gradually add new languages into the
training set in a curriculum learning manner. Furthermore, we propose a novel
competence-aware dynamic balancing sampling strategy for better selecting
training samples in multilingual training. Experimental results show that our
approach has achieved a steady and significant performance gain compared to the
previous state-of-the-art approach on the TED talks dataset.
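The abstract describes two competence measures, a curriculum step that gradually admits new languages, and a competence-aware sampler. Below is a minimal Python sketch of how such a schedule could be wired together; the function names, the readiness threshold, and the exact competence formulas are assumptions for illustration, not the paper's definitions.

```python
from typing import Dict, List


def self_evaluated_competence(dev_loss: float, init_dev_loss: float) -> float:
    """Proxy for how well a language has been learned so far: relative
    improvement of its dev loss over the initial dev loss, clipped to [0, 1].
    (Assumed formula; the paper defines its own competence measure.)"""
    if init_dev_loss <= 0:
        return 1.0
    return max(0.0, min(1.0, (init_dev_loss - dev_loss) / init_dev_loss))


def hrls_evaluated_competence(hrl_scores: List[float]) -> float:
    """Readiness signal for an LRL: here, the mean Self-evaluated Competence
    of the HRLs it is grouped with (an assumption for this sketch)."""
    return sum(hrl_scores) / max(len(hrl_scores), 1)


def maybe_add_languages(active: List[str], waiting: List[str], hrls: List[str],
                        competence: Dict[str, float],
                        threshold: float = 0.7) -> None:
    """Curriculum step: move a waiting LRL into the training set once the
    active HRLs look competent enough."""
    hrl_scores = [competence[l] for l in active if l in hrls]
    if waiting and hrls_evaluated_competence(hrl_scores) >= threshold:
        active.append(waiting.pop(0))


def sampling_weights(active: List[str], competence: Dict[str, float],
                     temperature: float = 1.0) -> Dict[str, float]:
    """Competence-aware dynamic balancing: languages that are learned less
    well (lower competence) get proportionally more probability mass."""
    raw = {l: (1.0 - competence.get(l, 0.0) + 1e-6) ** (1.0 / temperature)
           for l in active}
    z = sum(raw.values())
    return {l: w / z for l, w in raw.items()}
```

In a training loop one would recompute the competences after each evaluation step, call maybe_add_languages, and then draw each batch's language according to sampling_weights.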
Related papers
- Code-Switching Curriculum Learning for Multilingual Transfer in LLMs [43.85646680303273]
Large language models (LLMs) exhibit near human-level performance in various tasks, but their performance drops drastically outside a handful of high-resource languages.
Inspired by the human process of second language acquisition, we propose code-switching curriculum learning (CSCL) to enhance cross-lingual transfer for LLMs.
CSCL mimics the stages of human language learning by progressively training models with a curriculum consisting of 1) token-level code-switching, 2) sentence-level code-switching, and 3) monolingual corpora.
arXiv Detail & Related papers (2024-11-04T06:31:26Z)
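The CSCL entry above trains through three data stages in order. The sketch below shows one way such a staged schedule could be expressed; the three dataset arguments and the fixed steps-per-stage budget are assumptions for illustration, not the paper's setup.

```python
from typing import Any, Iterable, Iterator


def cscl_schedule(token_cs: Iterable[Any], sentence_cs: Iterable[Any],
                  monolingual: Iterable[Any],
                  steps_per_stage: int) -> Iterator[Any]:
    """Yield training examples stage by stage: token-level code-switching,
    then sentence-level code-switching, then monolingual data."""
    for stage in (token_cs, sentence_cs, monolingual):
        data = iter(stage)
        for _ in range(steps_per_stage):
            try:
                yield next(data)
            except StopIteration:
                break  # advance to the next stage early if data runs out
```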
- Optimizing the Training Schedule of Multilingual NMT using Reinforcement Learning [0.3277163122167433]
We propose two algorithms that use reinforcement learning to optimize the training schedule of multilingual NMT.
On an 8-to-1 translation dataset with LRLs and HRLs, our second method improves BLEU and COMET scores with respect to both random selection of monolingual batches and shuffled multilingual batches.
arXiv Detail & Related papers (2024-10-08T15:20:13Z)
- Multilingual Pretraining and Instruction Tuning Improve Cross-Lingual Knowledge Alignment, But Only Shallowly [53.04368883943773]
Multilingual pretraining and multilingual instruction tuning are two approaches commonly used to improve the multilingual capability of LLMs.
We propose CLiKA to assess the cross-lingual knowledge alignment of LLMs at the Performance, Consistency and Conductivity levels.
Results show that while both multilingual pretraining and instruction tuning are beneficial for cross-lingual knowledge alignment, the training strategy needs to be carefully designed.
arXiv Detail & Related papers (2024-04-06T15:25:06Z)
- Analyzing and Adapting Large Language Models for Few-Shot Multilingual NLU: Are We There Yet? [82.02076369811402]
Supervised fine-tuning (SFT), supervised instruction tuning (SIT) and in-context learning (ICL) are three alternative, de facto standard approaches to few-shot learning.
We present an extensive and systematic comparison of the three approaches, testing them on 6 high- and low-resource languages, three different NLU tasks, and a myriad of language and domain setups.
Our observations show that supervised instruction tuning has the best trade-off between performance and resource requirements.
arXiv Detail & Related papers (2024-03-04T10:48:13Z)
- Enhancing Multilingual Capabilities of Large Language Models through Self-Distillation from Resource-Rich Languages [60.162717568496355]
Although large language models (LLMs) are pre-trained on multilingual corpora, their performance in most languages still lags behind that of a few resource-rich languages.
arXiv Detail & Related papers (2024-02-19T15:07:32Z)
- High-resource Language-specific Training for Multilingual Neural Machine Translation [109.31892935605192]
We propose the multilingual translation model with the high-resource language-specific training (HLT-MT) to alleviate the negative interference.
Specifically, we first train the multilingual model only with the high-resource pairs and select the language-specific modules at the top of the decoder.
HLT-MT is further trained on all available corpora to transfer knowledge from high-resource languages to low-resource languages.
arXiv Detail & Related papers (2022-07-11T14:33:13Z)
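The HLT-MT entry above describes a two-phase schedule. Here is a schematic sketch of that idea, with every callable passed in as a placeholder; the paper's actual language-specific module selection is more involved than shown.

```python
from typing import Callable, Iterable, List


def train_hlt_mt(model,
                 hrl_corpora: Iterable,
                 all_corpora: Iterable,
                 train_one_epoch: Callable,
                 add_language_specific_decoder_top: Callable,
                 hrls: List[str],
                 phase1_epochs: int = 5,
                 phase2_epochs: int = 5):
    """Phase 1: train a shared model on high-resource pairs only, then attach
    language-specific modules at the top of the decoder for each HRL.
    Phase 2: continue on all corpora so knowledge learned from the HRLs can
    transfer to the low-resource languages."""
    for _ in range(phase1_epochs):
        train_one_epoch(model, hrl_corpora)
    for lang in hrls:
        add_language_specific_decoder_top(model, lang)
    for _ in range(phase2_epochs):
        train_one_epoch(model, all_corpora)
    return model
```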
- Multilingual Transfer Learning for QA Using Translation as Data Augmentation [13.434957024596898]
We explore strategies that improve cross-lingual transfer by bringing the multilingual embeddings closer in the semantic space.
We propose two novel strategies, language adversarial training and language arbitration framework, which significantly improve the (zero-resource) cross-lingual transfer performance.
Empirically, we show that the proposed models outperform the previous zero-shot baseline on the recently introduced multilingual MLQA and TyDiQA datasets.
arXiv Detail & Related papers (2020-12-10T20:29:34Z)
- Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z)
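The LBMRC entry above amalgamates several language-branch teachers into one student. The PyTorch sketch below shows a generic multi-teacher distillation loss in that spirit; the averaged-teacher formulation is an assumption, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F


def multi_teacher_distill_loss(student_logits: torch.Tensor,
                               teacher_logits: list,
                               temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between the student and the average of the softened
    language-branch teacher distributions (one teacher per language branch)."""
    soft_teachers = torch.stack(
        [F.softmax(t / temperature, dim=-1) for t in teacher_logits]
    ).mean(dim=0)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # batchmean gives the conventional KD scaling; T^2 keeps gradient
    # magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_teachers,
                    reduction="batchmean") * temperature ** 2
```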
- Balancing Training for Multilingual Neural Machine Translation [130.54253367251738]
Multilingual machine translation (MT) models can translate to/from multiple languages.
Standard practice is to up-sample less resourced languages to increase representation.
We propose a method that instead automatically learns how to weight training data through a data scorer.
arXiv Detail & Related papers (2020-04-14T18:23:28Z)
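The entry above replaces heuristic up-sampling with a learned data scorer. The class below sketches that idea with a simple reward-driven logit update; the update rule is a stand-in for illustration, not the paper's actual optimization.

```python
import math
from typing import Dict, Iterable


class DataScorer:
    """One logit per language, turned into sampling probabilities via softmax."""

    def __init__(self, languages: Iterable[str]):
        self.logits: Dict[str, float] = {lang: 0.0 for lang in languages}

    def probabilities(self) -> Dict[str, float]:
        z = sum(math.exp(v) for v in self.logits.values())
        return {lang: math.exp(v) / z for lang, v in self.logits.items()}

    def update(self, rewards: Dict[str, float], lr: float = 0.1) -> None:
        """Shift probability mass toward languages whose batches improved
        dev performance the most (reward could be the drop in dev loss)."""
        for lang, r in rewards.items():
            self.logits[lang] += lr * r
```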