Dynamic Curriculum Learning for Low-Resource Neural Machine Translation
- URL: http://arxiv.org/abs/2011.14608v1
- Date: Mon, 30 Nov 2020 08:13:41 GMT
- Title: Dynamic Curriculum Learning for Low-Resource Neural Machine Translation
- Authors: Chen Xu, Bojie Hu, Yufan Jiang, Kai Feng, Zeyang Wang, Shen Huang, Qi
Ju, Tong Xiao, Jingbo Zhu
- Abstract summary: We investigate the effective use of training data for low-resource NMT.
In particular, we propose a dynamic curriculum learning (DCL) method to reorder training samples during training.
This eases training by highlighting easy samples that the current model has enough competence to learn.
- Score: 27.993407441922507
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large amounts of data have made neural machine translation (NMT) a
big success in recent years, but training these models on small-scale corpora
is still a challenge. In this case, the way the data is used becomes more
important. Here, we investigate the effective use of training data for
low-resource NMT. In particular, we propose a dynamic curriculum learning
(DCL) method to reorder training samples during training. Unlike previous
work, we do not use a static scoring function for reordering. Instead, the
order of training samples is determined dynamically in two ways: loss decline
and model competence. This eases training by highlighting easy samples that
the current model has enough competence to learn. We test our DCL method in a
Transformer-based system. Experimental results show that DCL outperforms
several strong baselines on three low-resource machine translation benchmarks
and on different-sized portions of WMT'16 En-De.
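Below is a minimal sketch of the reordering idea described in the abstract: sample "difficulty" is ranked by how much each sample's loss has declined, and a growing competence value restricts sampling to the easiest part of the data. The square-root competence schedule, the rank normalisation, and all function names are illustrative assumptions, not the authors' released implementation.
```python
import math
import random

def competence(step, total_steps, c0=0.1):
    # Square-root competence schedule: fraction of the easiest data the model
    # may sample from at this step (assumed schedule, not the paper's exact choice).
    return min(1.0, math.sqrt(c0 ** 2 + (1.0 - c0 ** 2) * step / total_steps))

def difficulty_from_loss_decline(prev_losses, curr_losses):
    # Samples whose loss declined the most are treated as the easiest for the
    # current model; scores are rank-normalised to [0, 1] (0 = easiest).
    decline = [p - c for p, c in zip(prev_losses, curr_losses)]
    order = sorted(range(len(decline)), key=lambda i: decline[i], reverse=True)
    difficulty = [0.0] * len(decline)
    for rank, idx in enumerate(order):
        difficulty[idx] = rank / max(1, len(order) - 1)
    return difficulty

def select_batch(difficulty, step, total_steps, batch_size):
    # Draw a batch only from samples the current competence allows.
    c = competence(step, total_steps)
    eligible = [i for i, d in enumerate(difficulty) if d <= c]
    return random.sample(eligible, min(batch_size, len(eligible)))

# Toy usage: fake per-sample losses stand in for a real NMT model's losses.
random.seed(0)
n, total_steps = 1000, 50
prev = [random.uniform(2.0, 6.0) for _ in range(n)]
curr = [l - random.uniform(0.0, 1.0) for l in prev]
diff = difficulty_from_loss_decline(prev, curr)
for step in range(total_steps):
    batch = select_batch(diff, step, total_steps, batch_size=32)
    # ... forward/backward on `batch`, then periodically refresh the losses ...
```
Because the loss-decline scores are refreshed as training proceeds, the ordering changes over time, which is what makes the curriculum dynamic rather than a fixed, precomputed order.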
Related papers
- Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - Towards Foundation Models for Scientific Machine Learning:
Characterizing Scaling and Transfer Behavior [32.74388989649232]
We study how pre-training could be used for scientific machine learning (SciML) applications.
We find that fine-tuning these models yields larger performance gains as model size increases.
arXiv Detail & Related papers (2023-06-01T00:32:59Z) - Conditional Online Learning for Keyword Spotting [0.0]
This work investigates a simple but effective online continual learning method that updates a keyword spotter on-device via SGD as new data becomes available.
Experiments demonstrate that, compared to a naive online learning implementation, conditional model updates based on the model's performance on a small hold-out set drawn from the training distribution mitigate catastrophic forgetting (a minimal sketch of this conditional-update idea appears after the related-papers list below).
arXiv Detail & Related papers (2023-05-19T15:46:31Z) - EfficientTrain: Exploring Generalized Curriculum Learning for Training
Visual Backbones [80.662250618795]
This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers).
As an off-the-shelf method, it reduces the wall-time training cost of a wide variety of popular models by >1.5x on ImageNet-1K/22K without sacrificing accuracy.
arXiv Detail & Related papers (2022-11-17T17:38:55Z) - Effective Vision Transformer Training: A Data-Centric Perspective [24.02488085447691]
Vision Transformers (ViTs) have shown promising performance compared with Convolutional Neural Networks (CNNs).
In this paper, we define several metrics, including Dynamic Data Proportion (DDP) and Knowledge Assimilation Rate (KAR).
We propose a novel data-centric ViT training framework to dynamically measure the "difficulty" of training samples and generate "effective" samples for models at different training stages.
arXiv Detail & Related papers (2022-09-29T17:59:46Z) - Improving Neural Machine Translation by Denoising Training [95.96569884410137]
We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.
We update the model parameters with source- and target-side denoising tasks at the early stage and then tune the model normally.
Experiments show DoT consistently improves the neural machine translation performance across 12 bilingual and 16 multilingual directions.
arXiv Detail & Related papers (2022-01-19T00:11:38Z) - Improving Neural Machine Translation by Bidirectional Training [85.64797317290349]
We present a simple and effective pretraining strategy -- bidirectional training (BiT) for neural machine translation.
Specifically, we bidirectionally update the model parameters at the early stage and then tune the model normally.
Experimental results show that BiT significantly improves state-of-the-art neural machine translation performance across 15 translation tasks on 8 language pairs.
arXiv Detail & Related papers (2021-09-16T07:58:33Z) - Data Rejuvenation: Exploiting Inactive Training Examples for Neural
Machine Translation [86.40610684026262]
In this work, we explore how to identify inactive training examples that contribute less to model performance.
We introduce data rejuvenation to improve the training of NMT models on large-scale datasets by exploiting inactive examples.
Experimental results on WMT14 English-German and English-French datasets show that the proposed data rejuvenation consistently and significantly improves performance for several strong NMT models.
arXiv Detail & Related papers (2020-10-06T08:57:31Z) - Reinforced Curriculum Learning on Pre-trained Neural Machine Translation
Models [20.976165305749777]
We learn a curriculum for improving a pre-trained NMT model by re-selecting influential data samples from the original training set.
We propose a data selection framework based on Deterministic Actor-Critic, in which a critic network predicts the expected change of model performance.
arXiv Detail & Related papers (2020-04-13T03:40:44Z) - Understanding Learning Dynamics for Neural Machine Translation [53.23463279153577]
We propose to understand the learning dynamics of NMT by using Loss Change Allocation (LCA) (Lan et al., 2019).
As LCA requires calculating the gradient on an entire dataset for each update, we instead present an approximation that makes it practical in the NMT scenario.
Our simulated experiments show that this approximate calculation is efficient and empirically delivers consistent results.
arXiv Detail & Related papers (2020-04-05T13:32:58Z)
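For the last entry above (Understanding Learning Dynamics for Neural Machine Translation), here is a minimal sketch of first-order loss change allocation on a toy quadratic loss, assuming the per-parameter contribution g_i * delta_theta_i; substituting a mini-batch gradient for the full-dataset gradient is the kind of shortcut the summary's approximation alludes to. The names and toy setup are assumptions for illustration.
```python
import numpy as np

def loss_change_allocation(grad, theta_before, theta_after):
    """First-order LCA: per-parameter contribution to the loss change,
    approximately g_i * (theta_after_i - theta_before_i). `grad` would
    ideally be the gradient over the entire dataset; a mini-batch estimate
    is one possible approximation."""
    return grad * (theta_after - theta_before)

# Toy example on the quadratic loss L(theta) = 0.5 * ||theta||^2.
rng = np.random.default_rng(0)
theta = rng.normal(size=5)
grad = theta.copy()                  # dL/dtheta for this toy loss
theta_new = theta - 0.1 * grad       # one SGD step
contrib = loss_change_allocation(grad, theta, theta_new)
print(contrib)          # negative entries mark parameters that reduced the loss
print(contrib.sum())    # first-order estimate of the total loss change
```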
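And for the Conditional Online Learning for Keyword Spotting entry above, a minimal sketch of gating an online update on hold-out performance. The accept-or-revert rule, the `train_step`/`evaluate` hooks, and the toy threshold model are illustrative assumptions; the summary only states that updates are conditioned on performance on a small hold-out set drawn from the training distribution.
```python
import copy
import random

def conditional_update(model, new_batch, holdout, train_step, evaluate, tol=0.0):
    """Apply one online update on new_batch, but keep it only if accuracy on a
    small hold-out set does not drop by more than tol; otherwise revert to the
    previous model (assumed rule, for illustration only)."""
    before = evaluate(model, holdout)
    candidate = copy.deepcopy(model)
    train_step(candidate, new_batch)          # e.g. one SGD step on new data
    after = evaluate(candidate, holdout)
    return candidate if after >= before - tol else model

# Toy stand-ins: the "model" is a 1-D decision threshold over inputs in [0, 1].
def train_step(model, batch):
    # Nudge the threshold toward the mean of the new positive examples.
    pos = [x for x, y in batch if y == 1]
    if pos:
        model["t"] += 0.5 * (sum(pos) / len(pos) - model["t"])

def evaluate(model, data):
    return sum((x >= model["t"]) == (y == 1) for x, y in data) / len(data)

random.seed(0)
true_t = 0.6
holdout = [(x, int(x >= true_t)) for x in (random.random() for _ in range(100))]
model = {"t": 0.5}
for _ in range(20):
    batch = [(x, int(x >= true_t)) for x in (random.random() for _ in range(8))]
    model = conditional_update(model, batch, holdout, train_step, evaluate)
print(model["t"], evaluate(model, holdout))
```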