Data Rejuvenation: Exploiting Inactive Training Examples for Neural
Machine Translation
- URL: http://arxiv.org/abs/2010.02552v1
- Date: Tue, 6 Oct 2020 08:57:31 GMT
- Authors: Wenxiang Jiao, Xing Wang, Shilin He, Irwin King, Michael R. Lyu,
Zhaopeng Tu
- Abstract summary: In this work, we identify the inactive training examples that contribute less to model performance.
We introduce data rejuvenation to improve the training of NMT models on large-scale datasets by exploiting inactive examples.
Experimental results on WMT14 English-German and English-French datasets show that the proposed data rejuvenation consistently and significantly improves performance for several strong NMT models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale training datasets lie at the core of the recent success of neural
machine translation (NMT) models. However, the complex patterns and potential
noise in large-scale data make training NMT models difficult. In this
work, we identify the inactive training examples that contribute
less to model performance, and show that the existence of inactive examples
depends on the data distribution. We further introduce data rejuvenation to
improve the training of NMT models on large-scale datasets by exploiting
inactive examples. The proposed framework consists of three phases. First, we
train an identification model on the original training data, and use it to
distinguish inactive examples from active examples by their sentence-level
output probabilities. Then, we train a rejuvenation model on the active
examples, which is used to re-label the inactive examples with
forward-translation. Finally, the rejuvenated examples and the active examples
are combined to train the final NMT model. Experimental results on WMT14
English-German and English-French datasets show that the proposed data
rejuvenation consistently and significantly improves performance for several
strong NMT models. Extensive analyses reveal that our approach stabilizes and
accelerates the training process of NMT models, resulting in final models with
better generalization capability.
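The three-phase framework above can be sketched in a few lines. This is a minimal, hypothetical illustration: the `rejuvenate` function, the log-probability threshold, and the toy model callables are illustrative stand-ins, not the paper's actual implementation.

```python
import math

def sentence_log_prob(model, src, tgt):
    # Phase 1 scoring: average per-token log-probability of the target
    # given the source under the identification model. Here `model` is
    # any callable (src, token) -> probability.
    probs = [model(src, tok) for tok in tgt]
    return sum(math.log(p) for p in probs) / len(probs)

def rejuvenate(train_data, identification_model, rejuvenation_model, threshold):
    # Phase 1: split the data into active and inactive examples by
    # sentence-level output probability under the identification model.
    active, inactive = [], []
    for src, tgt in train_data:
        if sentence_log_prob(identification_model, src, tgt) >= threshold:
            active.append((src, tgt))
        else:
            inactive.append((src, tgt))
    # Phase 2: re-label inactive examples via forward-translation with a
    # rejuvenation model (trained on the active examples only).
    rejuvenated = [(src, rejuvenation_model(src)) for src, _ in inactive]
    # Phase 3: the final NMT model is trained on the combined set.
    return active + rejuvenated
```

In the paper's setting the identification and rejuvenation models are full NMT systems; here any scoring and translation callables suffice to exercise the control flow.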
Related papers
- Training Data Attribution for Diffusion Models [1.1733780065300188]
We propose a novel solution that reveals how training data influence the output of diffusion models through the use of ensembles.
In our approach individual models in an encoded ensemble are trained on carefully engineered splits of the overall training data to permit the identification of influential training examples.
The resulting model ensembles enable efficient ablation of training data influence, allowing us to assess the impact of training data on model outputs.
arXiv Detail & Related papers (2023-06-03T18:36:12Z) - Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a "versatile" model, i.e., Unified Model Learning for NMT (UMLNMT), that works with data from different tasks.
UMLNMT yields substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z) - TRAK: Attributing Model Behavior at Scale [79.56020040993947]
We present TRAK (Tracing with the Randomly-projected After Kernel), a data attribution method that is both effective and computationally tractable for large-scale, differentiable models.
arXiv Detail & Related papers (2023-03-24T17:56:22Z) - Beyond Transfer Learning: Co-finetuning for Action Localisation [64.07196901012153]
We propose co-finetuning -- simultaneously training a single model on multiple "upstream" and "downstream" tasks.
We demonstrate that co-finetuning outperforms traditional transfer learning when using the same total amount of data.
We also show how we can easily extend our approach to multiple "upstream" datasets to further improve performance.
arXiv Detail & Related papers (2022-07-08T10:25:47Z) - METRO: Efficient Denoising Pretraining of Large Scale Autoencoding
Language Models with Model Generated Signals [151.3601429216877]
We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model.
We propose a recipe, namely "Model generated dEnoising TRaining Objective" (METRO)
The resultant models, METRO-LM, consisting of up to 5.4 billion parameters, achieve new state-of-the-art on the GLUE, SuperGLUE, and SQuAD benchmarks.
arXiv Detail & Related papers (2022-04-13T21:39:15Z) - Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z) - Dynamic Curriculum Learning for Low-Resource Neural Machine Translation [27.993407441922507]
We investigate the effective use of training data for low-resource NMT.
In particular, we propose a dynamic curriculum learning (DCL) method to reorder training samples in training.
This eases training by highlighting easy samples that the current model has enough competence to learn.
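The reordering step described above can be illustrated with a small sketch. This is a hypothetical example: difficulty is approximated here by normalized sentence length, and the competence schedule is left to the caller; the paper's actual difficulty and competence measures may differ.

```python
def curriculum_batch(samples, difficulty, competence):
    """Return the samples the model is currently competent to learn,
    easiest first. `difficulty` maps a sample to [0, 1]; `competence`
    is a value in [0, 1] that grows over the course of training."""
    # Keep only samples the current model has enough competence for.
    eligible = [s for s in samples if difficulty(s) <= competence]
    # Present easier samples first within the eligible set.
    return sorted(eligible, key=difficulty)
```

As training progresses and `competence` increases, harder samples enter the eligible set, dynamically reordering what the model sees.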
arXiv Detail & Related papers (2020-11-30T08:13:41Z) - Reinforced Curriculum Learning on Pre-trained Neural Machine Translation
Models [20.976165305749777]
We learn a curriculum for improving a pre-trained NMT model by re-selecting influential data samples from the original training set.
We propose a data selection framework based on Deterministic Actor-Critic, in which a critic network predicts the expected change of model performance.
arXiv Detail & Related papers (2020-04-13T03:40:44Z) - Forecasting Industrial Aging Processes with Machine Learning Methods [0.0]
We evaluate a wider range of data-driven models, comparing some traditional stateless models to more complex recurrent neural networks.
Our results show that recurrent models produce near-perfect predictions when trained on larger datasets.
arXiv Detail & Related papers (2020-02-05T13:06:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.