Reinforced Curriculum Learning on Pre-trained Neural Machine Translation Models
- URL: http://arxiv.org/abs/2004.05757v1
- Date: Mon, 13 Apr 2020 03:40:44 GMT
- Title: Reinforced Curriculum Learning on Pre-trained Neural Machine Translation Models
- Authors: Mingjun Zhao, Haijiang Wu, Di Niu and Xiaoli Wang
- Abstract summary: We learn a curriculum for improving a pre-trained NMT model by re-selecting influential data samples from the original training set.
We propose a data selection framework based on Deterministic Actor-Critic, in which a critic network predicts the expected change of model performance.
- Score: 20.976165305749777
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The competitive performance of neural machine translation (NMT) critically
relies on large amounts of training data. However, acquiring high-quality
translation pairs requires expert knowledge and is costly. Therefore, how to
best utilize a given dataset of samples with diverse quality and
characteristics becomes an important yet understudied question in NMT.
Curriculum learning methods have been introduced to NMT to optimize a model's
performance by prescribing the data input order, based on heuristics such as
the assessment of noise and difficulty levels. However, existing methods
require training from scratch, while in practice most NMT models are
pre-trained on big data already. Moreover, as heuristics, they do not
generalize well. In this paper, we aim to learn a curriculum for improving a
pre-trained NMT model by re-selecting influential data samples from the
original training set and formulate this task as a reinforcement learning
problem. Specifically, we propose a data selection framework based on
Deterministic Actor-Critic, in which a critic network predicts the expected
change of model performance due to a certain sample, while an actor network
learns to select the best sample out of a random batch of samples presented to
it. Experiments on several translation datasets show that our method can
further improve the performance of NMT when original batch training reaches its
ceiling, without using additional new training data, and significantly
outperforms several strong baseline methods.
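The abstract describes the actor-critic interaction only at a high level. Below is a minimal, illustrative sketch of one way such a data re-selection loop could be wired up; the per-sample feature representation, the reward defined as the change in a validation metric, and the softmax surrogate used to give the actor a gradient path are all assumptions made for this sketch, not the authors' implementation.

```python
# Hedged sketch: actor-critic re-selection of training samples for a pre-trained
# NMT model. Feature size, the reward definition (change in validation score),
# and the update rules below are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

FEAT_DIM = 8  # hypothetical per-sample features (e.g. length, loss, domain score)


class Critic(nn.Module):
    """Predicts the expected change in model performance if a sample is used."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, feats):                 # feats: (num_candidates, FEAT_DIM)
        return self.net(feats).squeeze(-1)    # (num_candidates,) predicted deltas


class Actor(nn.Module):
    """Scores a random batch of candidates; the top-scoring sample is selected."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, feats):
        return self.net(feats).squeeze(-1)    # (num_candidates,) selection scores


def curriculum_step(actor, critic, actor_opt, critic_opt,
                    candidate_feats, train_on_sample, eval_model):
    """One selection round: pick a sample, fine-tune the NMT model on it, and use
    the observed performance change as the critic's regression target."""
    with torch.no_grad():
        chosen = actor(candidate_feats).argmax().item()   # deterministic selection

    before = eval_model()                      # e.g. validation BLEU of the NMT model
    train_on_sample(chosen)                    # one fine-tuning update on that sample
    reward = eval_model() - before             # observed change in performance

    # Critic update: regress toward the observed reward for the chosen sample.
    critic_opt.zero_grad()
    pred = critic(candidate_feats[chosen:chosen + 1]).squeeze(0)
    critic_loss = (pred - torch.as_tensor(float(reward))) ** 2
    critic_loss.backward()
    critic_opt.step()

    # Actor update: raise the critic-estimated value of what the actor selects.
    # The softmax over scores is a differentiable surrogate for the hard argmax
    # above (an assumption of this sketch, not the paper's exact objective).
    actor_opt.zero_grad()
    weights = torch.softmax(actor(candidate_feats), dim=0)
    actor_loss = -(weights * critic(candidate_feats).detach()).sum()
    actor_loss.backward()
    actor_opt.step()
    return reward
```

In the paper's setting, the pre-trained NMT model would sit behind `train_on_sample` and `eval_model` (both hypothetical callables here), with the critic amortizing the expensive "train, then re-evaluate" signal so that the actor can rank candidate samples cheaply.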
Related papers
- Unsupervised Pre-training with Language-Vision Prompts for Low-Data Instance Segmentation [105.23631749213729]
We propose a novel method for unsupervised pre-training in low-data regimes.
Inspired by the recently successful prompting technique, we introduce a new method, Unsupervised Pre-training with Language-Vision Prompts.
We show that our method can converge faster and perform better than CNN-based models in low-data regimes.
arXiv Detail & Related papers (2024-05-22T06:48:43Z) - Order Matters in the Presence of Dataset Imbalance for Multilingual Learning [53.74649778447903]
We present a simple yet effective method of pre-training on high-resource tasks, followed by fine-tuning on a mixture of high- and low-resource tasks.
We show its improvements in neural machine translation (NMT) and multilingual language modeling.
arXiv Detail & Related papers (2023-12-11T05:46:57Z) - Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z) - Self-Influence Guided Data Reweighting for Language Model Pre-training [46.57714637505164]
Language Models (LMs) pre-trained with self-supervision on large text corpora have become the default starting point for developing models for various NLP tasks.
All data samples in the corpus are treated with equal importance during LM pre-training.
Because data samples vary in relevance and quality, assigning equal importance to all of them may not be the optimal choice.
We propose PRESENCE, a method that jointly performs sample reweighting and pre-training, using self-influence (SI) scores as an indicator of sample importance.
arXiv Detail & Related papers (2023-11-02T01:00:46Z) - Data Selection Curriculum for Neural Machine Translation [31.55953464971441]
We introduce a two-stage curriculum training framework for NMT models.
We fine-tune a base NMT model on subsets of data, selected by both deterministic scoring using pre-trained methods and online scoring.
We show that our curriculum strategies consistently yield better translation quality (up to +2.2 BLEU) and faster convergence.
arXiv Detail & Related papers (2022-03-25T19:08:30Z) - Improving Neural Machine Translation by Denoising Training [95.96569884410137]
We present Denoising Training (DoT), a simple and effective pre-training strategy for neural machine translation.
We update the model parameters with source- and target-side denoising tasks at the early stage and then tune the model normally.
Experiments show DoT consistently improves the neural machine translation performance across 12 bilingual and 16 multilingual directions.
arXiv Detail & Related papers (2022-01-19T00:11:38Z) - Reconstructing Training Data from Diverse ML Models by Ensemble Inversion [8.414622657659168]
Model Inversion (MI), in which an adversary abuses access to a trained Machine Learning (ML) model, has attracted increasing research attention.
We propose an ensemble inversion technique that estimates the distribution of original training data by training a generator constrained by an ensemble of trained models.
We achieve high-quality results without access to any dataset and show how using an auxiliary dataset similar to the presumed training data further improves the results.
arXiv Detail & Related papers (2021-11-05T18:59:01Z) - Dynamic Curriculum Learning for Low-Resource Neural Machine Translation [27.993407441922507]
We investigate the effective use of training data for low-resource NMT.
In particular, we propose a dynamic curriculum learning (DCL) method to reorder training samples during training.
This eases training by highlighting easy samples that the current model has enough competence to learn (a minimal competence-schedule sketch appears after this list).
arXiv Detail & Related papers (2020-11-30T08:13:41Z) - Data Rejuvenation: Exploiting Inactive Training Examples for Neural Machine Translation [86.40610684026262]
In this work, we identify inactive training examples, which contribute little to model performance.
We introduce data rejuvenation to improve the training of NMT models on large-scale datasets by exploiting inactive examples.
Experimental results on WMT14 English-German and English-French datasets show that the proposed data rejuvenation consistently and significantly improves performance for several strong NMT models.
arXiv Detail & Related papers (2020-10-06T08:57:31Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To reduce the cost of training on this enlarged dataset, we further propose a dataset distillation strategy that compresses it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
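As a concrete illustration of the curriculum reordering idea in the Dynamic Curriculum Learning entry above, here is a minimal competence-style schedule in which only the easiest fraction of difficulty-sorted samples is exposed early in training. The length-based difficulty proxy and the linear schedule are simplifications assumed for this sketch; the cited DCL paper reorders samples during training based on the current model's competence rather than a static proxy.

```python
# Hedged sketch: competence-based exposure of training samples, easy to hard.
# Difficulty = target-side length and a linear competence schedule are
# illustrative choices for this sketch, not the DCL method of the cited paper.
import random

def difficulty(sample):
    """Proxy difficulty score; here simply the target-side length."""
    return len(sample["tgt"].split())

def competence(step, total_steps, c0=0.1):
    """Fraction of the difficulty-sorted data the model may see at this step."""
    return min(1.0, c0 + (1.0 - c0) * step / total_steps)

def curriculum_batches(dataset, total_steps, batch_size=32, seed=0):
    """Yield batches drawn only from the easiest competence(step) share of data."""
    rng = random.Random(seed)
    ordered = sorted(dataset, key=difficulty)          # easy -> hard
    for step in range(total_steps):
        cutoff = max(batch_size, int(competence(step, total_steps) * len(ordered)))
        pool = ordered[:cutoff]                        # currently "learnable" samples
        yield rng.sample(pool, min(batch_size, len(pool)))

# Usage with a toy parallel corpus (hypothetical data, for illustration only):
toy = [{"src": f"s{i}", "tgt": "w " * (i % 20)} for i in range(200)]
for step, batch in enumerate(curriculum_batches(toy, total_steps=5)):
    print(step, [difficulty(x) for x in batch[:4]])
```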