Improving Neural Machine Translation by Denoising Training
- URL: http://arxiv.org/abs/2201.07365v2
- Date: Thu, 20 Jan 2022 03:55:52 GMT
- Title: Improving Neural Machine Translation by Denoising Training
- Authors: Liang Ding, Keqin Peng and Dacheng Tao
- Abstract summary: We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.
We update the model parameters with source- and target-side denoising tasks at the early stage and then tune the model normally.
Experiments show DoT consistently improves the neural machine translation performance across 12 bilingual and 16 multilingual directions.
- Score: 95.96569884410137
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We present a simple and effective pretraining strategy, Denoising
Training (DoT), for neural machine translation. Specifically, we update the
model parameters with source- and target-side denoising tasks at the early
stage and then tune the model normally. Notably, our approach adds no
parameters or training steps and requires only the parallel data.
Experiments show that DoT consistently improves the neural machine translation
performance across 12 bilingual and 16 multilingual directions (data size
ranges from 80K to 20M). In addition, we show that DoT can complement existing
data manipulation strategies, i.e. curriculum learning, knowledge distillation,
data diversification, bidirectional training, and back-translation.
Encouragingly, we find that DoT outperforms the costly pretrained model mBART in
high-resource settings. Analyses show that DoT is a novel in-domain cross-lingual
pretraining strategy and could offer further improvements with task-relevant
self-supervisions.
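The abstract describes DoT only at a high level. Below is a minimal sketch of such a two-stage schedule, assuming a BART-style token-masking noise function and an abstract train_step callable that performs one model update; the function names, the alternation between source- and target-side denoising, and the denoise_steps split are illustrative assumptions, not the authors' implementation.

```python
import random

MASK = "<mask>"

def add_noise(tokens, mask_prob=0.15, seed=None):
    """Randomly replace tokens with a mask symbol (BART-style token masking).
    Illustrative only; DoT's exact noising scheme may differ."""
    rng = random.Random(seed)
    return [MASK if rng.random() < mask_prob else t for t in tokens]

def dot_schedule(parallel_data, train_step, total_steps, denoise_steps):
    """Two-stage schedule: denoising tasks early, normal translation afterwards.

    parallel_data -- list of (src_tokens, tgt_tokens) pairs
    train_step    -- callable(input_tokens, output_tokens) that updates the model
    denoise_steps -- assumed hyper-parameter: early steps spent on denoising
    """
    for step in range(total_steps):
        src, tgt = parallel_data[step % len(parallel_data)]
        if step < denoise_steps:
            # Early stage: reconstruct the clean sentence from its noised version,
            # alternating between source-side and target-side denoising.
            if step % 2 == 0:
                train_step(add_noise(src), src)
            else:
                train_step(add_noise(tgt), tgt)
        else:
            # Later stage: ordinary source -> target translation training.
            train_step(src, tgt)
```

For example, dot_schedule(pairs, train_step, total_steps=100_000, denoise_steps=10_000) would spend the first 10% of updates on denoising before switching to ordinary translation training; the split itself is a tunable choice.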
Related papers
- Instruction Tuned Models are Quick Learners [20.771930945083994]
In this work, we demonstrate the sample efficiency of instruction tuned models over various tasks.
In the single-task learning (STL) setting, instruction-tuned models given only 25% of the downstream training data surpass SOTA performance on the downstream tasks.
In the multi-task learning (MTL) setting, an instruction-tuned model trained on only 6% of the downstream training data achieves SOTA, while using 100% of the training data yields a 3.69-point improvement.
arXiv Detail & Related papers (2023-05-17T22:30:01Z)
- On the Pareto Front of Multilingual Neural Machine Translation [123.94355117635293]
We study how the performance of a given translation direction changes with its sampling ratio in Multilingual Neural Machine Translation (MNMT).
We propose the Double Power Law to predict the unique performance trade-off front in MNMT.
In our experiments, it achieves better performance than temperature searching and gradient manipulation methods with only 1/5 to 1/2 of the total training budget.
arXiv Detail & Related papers (2023-04-06T16:49:19Z)
- Efficient Speech Translation with Pre-trained Models [13.107314023500349]
We investigate efficient strategies to build cascaded and end-to-end speech translation systems based on pre-trained models.
While end-to-end models show superior translation performance to cascaded ones, their application is limited by the need for additional end-to-end training data.
arXiv Detail & Related papers (2022-11-09T15:07:06Z)
- Improving Neural Machine Translation by Bidirectional Training [85.64797317290349]
We present a simple and effective pretraining strategy -- bidirectional training (BiT) for neural machine translation.
Specifically, we bidirectionally update the model parameters at the early stage and then tune the model normally.
Experimental results show that BiT pushes the SOTA neural machine translation performance across 15 translation tasks on 8 language pairs significantly higher.
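As a rough illustration of the bidirectional early stage, the sketch below builds source-to-target plus target-to-source training pairs from the same parallel corpus; the helper name make_bidirectional and the plain concatenation are assumptions, and the paper may additionally tag, shuffle, or subsample the data.

```python
def make_bidirectional(parallel_data):
    """Return the original source->target pairs plus their reversed
    target->source counterparts (a sketch of BiT's early-stage data)."""
    forward = [(src, tgt) for src, tgt in parallel_data]
    backward = [(tgt, src) for src, tgt in parallel_data]
    return forward + backward

# Early stage: train on make_bidirectional(data);
# later stage: continue tuning on the original source->target pairs only.
```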
arXiv Detail & Related papers (2021-09-16T07:58:33Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back-translation (sketched below), knowledge distillation, multi-feature reranking, and transductive finetuning.
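Of these strategies, back-translation is the most mechanical to sketch: target-language monolingual text is translated into the source language by a reverse model to create synthetic parallel pairs. The sketch assumes a tgt2src_translate callable and is a generic illustration of back-translation, not the USYD-JD pipeline.

```python
def back_translate(monolingual_tgt, tgt2src_translate):
    """Turn target-language monolingual sentences into synthetic parallel pairs
    using a reverse (target->source) translation model.
    tgt2src_translate -- assumed callable mapping a target sentence to a source sentence.
    """
    return [(tgt2src_translate(t), t) for t in monolingual_tgt]

# The synthetic (source, target) pairs are then mixed with the genuine parallel
# data to train the forward source->target model.
```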
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- Multilingual Speech Translation with Efficient Finetuning of Pretrained Models [82.22294901727933]
Minimalistic LNA (LayerNorm and Attention) finetuning can achieve zero-shot cross-lingual and cross-modality transfer ability.
Our approach demonstrates strong zero-shot performance in a many-to-many multilingual model.
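A hedged sketch of what LNA-style finetuning could look like in PyTorch is shown below: all parameters are frozen except those whose names suggest LayerNorm or attention modules. The name-matching heuristic and the choice of which attention blocks to unfreeze are assumptions; the paper's exact parameter selection (for example, restricting to decoder cross-attention) may differ.

```python
import torch.nn as nn

def apply_lna_finetuning(model: nn.Module):
    """Freeze everything except parameters that look like LayerNorm or attention.
    A rough sketch of LNA finetuning based on parameter-name matching."""
    for name, param in model.named_parameters():
        lowered = name.lower()
        trainable = ("layer_norm" in lowered
                     or "layernorm" in lowered
                     or "attn" in lowered
                     or "attention" in lowered)
        param.requires_grad = trainable
    # Return the trainable subset, e.g. to hand to an optimizer.
    return [p for p in model.parameters() if p.requires_grad]
```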
arXiv Detail & Related papers (2020-10-24T08:15:08Z)
- Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning [74.25168207651376]
Fine-tuning pre-trained language models to downstream cross-lingual tasks has shown promising results.
We leverage continual learning to preserve the cross-lingual ability of the pre-trained model when we fine-tune it to downstream tasks.
Our methods achieve better performance than other fine-tuning baselines on the zero-shot cross-lingual part-of-speech tagging and named entity recognition tasks.
arXiv Detail & Related papers (2020-04-29T14:07:18Z)
- Reinforced Curriculum Learning on Pre-trained Neural Machine Translation Models [20.976165305749777]
We learn a curriculum for improving a pre-trained NMT model by re-selecting influential data samples from the original training set.
We propose a data selection framework based on Deterministic Actor-Critic, in which a critic network predicts the expected change of model performance.
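As a toy illustration of the critic component, the sketch below maps a per-sample feature vector to a predicted change in model performance and keeps the top-k samples; the feature representation, network size, and top-k selection rule are illustrative assumptions rather than the paper's actor-critic training procedure.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Toy critic: maps a training sample's feature vector to the predicted
    change in model performance (e.g. dev BLEU) if that sample is selected."""
    def __init__(self, feature_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (num_samples, feature_dim) -> scores: (num_samples,)
        return self.net(features).squeeze(-1)

def select_samples(critic: Critic, features: torch.Tensor, k: int) -> torch.Tensor:
    """Re-select the k samples with the highest predicted performance gain."""
    with torch.no_grad():
        scores = critic(features)
    return torch.topk(scores, k).indices
```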
arXiv Detail & Related papers (2020-04-13T03:40:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.