Pronoun-Targeted Fine-tuning for NMT with Hybrid Losses
- URL: http://arxiv.org/abs/2010.07638v1
- Date: Thu, 15 Oct 2020 10:11:40 GMT
- Title: Pronoun-Targeted Fine-tuning for NMT with Hybrid Losses
- Authors: Prathyusha Jwalapuram, Shafiq Joty, Youlin Shen
- Abstract summary: We introduce a class of conditional generative-discriminative hybrid losses that we use to fine-tune a trained machine translation model.
We improve the model performance of both a sentence-level and a contextual model without using any additional data.
Our sentence-level model shows a 0.5 BLEU improvement on both the WMT14 and the IWSLT13 De-En testsets.
Our contextual model achieves the best results, improving from 31.81 to 32 BLEU on the WMT14 De-En testset, and from 32.10 to 33.13 on the IWSLT13 De-En testset.
- Score: 6.596002578395152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Popular Neural Machine Translation model training uses strategies like
backtranslation to improve BLEU scores, requiring large amounts of additional
data and training. We introduce a class of conditional
generative-discriminative hybrid losses that we use to fine-tune a trained
machine translation model. Through a combination of targeted fine-tuning
objectives and intuitive re-use of the training data the model has failed to
adequately learn from, we improve the model performance of both a
sentence-level and a contextual model without using any additional data. We
target the improvement of pronoun translations through our fine-tuning and
evaluate our models on a pronoun benchmark testset. Our sentence-level model
shows a 0.5 BLEU improvement on both the WMT14 and the IWSLT13 De-En testsets,
while our contextual model achieves the best results, improving from 31.81 to
32 BLEU on the WMT14 De-En testset, and from 32.10 to 33.13 on the IWSLT13 De-En
testset, with corresponding improvements in pronoun translation. We further
show the generalizability of our method by reproducing the improvements on two
additional language pairs, Fr-En and Cs-En. Code available at
<https://github.com/ntunlp/pronoun-finetuning>.
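As a rough illustration of how such a conditional generative-discriminative hybrid objective can be wired up for fine-tuning, the sketch below adds a discriminative term, computed at pronoun positions of the target, on top of the usual cross-entropy term. The model interface, the `disc_head` classifier, the pronoun mask and labels, and the weight `lambda_disc` are illustrative assumptions rather than the paper's exact formulation; see the linked repository for the authors' implementation.

```python
# Minimal sketch of a conditional generative-discriminative hybrid loss.
# Model interface, pronoun mask/labels, disc_head and lambda_disc are
# illustrative assumptions, not the paper's exact setup.
import torch.nn.functional as F

def hybrid_loss(model, disc_head, batch, lambda_disc=0.5, pad_id=0):
    # Assumed model output: .logits (B, T, V) and .decoder_hidden (B, T, H).
    out = model(src=batch["src"], tgt_in=batch["tgt_in"])

    # Generative term: standard token-level cross-entropy against the reference.
    ce = F.cross_entropy(
        out.logits.view(-1, out.logits.size(-1)),
        batch["tgt_out"].view(-1),
        ignore_index=pad_id,
    )

    # Discriminative term: a small classifier over decoder states at pronoun
    # positions, trained to separate correct from incorrect pronoun choices.
    mask = batch["pronoun_mask"].bool()                # (B, T), True at pronouns
    disc_logits = disc_head(out.decoder_hidden[mask])  # (N_pronouns, 2)
    disc = F.cross_entropy(disc_logits, batch["pronoun_label"][mask])

    return ce + lambda_disc * disc
```

Since the paper fine-tunes on training examples the converged model has failed to learn adequately, `batch` would in practice be drawn from that filtered subset of the original training data rather than from any additional data.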
Related papers
- Machine Translation for Ge'ez Language [0.0]
Machine translation for low-resource languages such as Ge'ez faces challenges such as out-of-vocabulary words, domain mismatches, and lack of labeled training data.
We develop a multilingual neural machine translation (MNMT) model based on language relatedness.
We also experiment with using GPT-3.5, a state-of-the-art LLM, for few-shot translation with fuzzy matches.
arXiv Detail & Related papers (2023-11-24T14:55:23Z)
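For the few-shot fuzzy-match translation mentioned in the Ge'ez entry above, one simple realization is to retrieve the most similar source sentences from a translation memory and prepend them as in-context examples. The sketch below uses difflib for the fuzzy matching; the memory format and prompt template are assumptions, not the paper's setup.

```python
# Sketch of fuzzy-match few-shot prompting for low-resource translation.
# The translation-memory format and prompt template are assumptions.
import difflib

def build_fewshot_prompt(source, memory, k=3, cutoff=0.5):
    """memory: dict mapping known source sentences to reference translations."""
    # Retrieve the k most similar source sentences (fuzzy matches).
    matches = difflib.get_close_matches(source, list(memory), n=k, cutoff=cutoff)
    examples = "\n".join(f"Source: {s}\nTranslation: {memory[s]}" for s in matches)
    return f"{examples}\nSource: {source}\nTranslation:"
```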
- A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models [27.777372498182864]
We propose a novel fine-tuning approach for Generative Large Language Models (LLMs).
Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data, followed by fine-tuning on a small set of high-quality parallel data.
Based on LLaMA-2 as our underlying model, our results show that the model can achieve an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance.
arXiv Detail & Related papers (2023-09-20T22:53:15Z)
- Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC [51.34222224728979]
This paper introduces a series of innovative techniques to enhance the translation quality of Non-Autoregressive Translation (NAT) models.
We propose fine-tuning Pretrained Multilingual Language Models (PMLMs) with the CTC loss to train NAT models effectively.
Our model exhibits a remarkable speed improvement of 16.35 times compared to the autoregressive model.
arXiv Detail & Related papers (2023-06-10T05:24:29Z)
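As a rough illustration of the CTC-based NAT recipe summarized in the entry above (a pretrained encoder fine-tuned with a CTC objective over an upsampled source), here is a minimal sketch; the encoder interface, upsampling factor, and projection head are assumptions, not the paper's configuration.

```python
# Sketch of fine-tuning a pretrained encoder for NAT with a CTC objective.
# Encoder interface, upsampling factor and projection head are assumptions.
import torch.nn as nn

class CTCNat(nn.Module):
    def __init__(self, encoder, hidden, vocab_size, upsample=2, blank_id=0):
        super().__init__()
        self.encoder, self.upsample = encoder, upsample
        self.proj = nn.Linear(hidden, vocab_size)
        self.ctc = nn.CTCLoss(blank=blank_id, zero_infinity=True)

    def forward(self, src_ids, src_lens, tgt_ids, tgt_lens):
        h = self.encoder(src_ids)                      # (B, S, H), assumed interface
        # Upsample encoder states so the CTC path can cover the target length.
        h = h.repeat_interleave(self.upsample, dim=1)  # (B, S * upsample, H)
        log_probs = self.proj(h).log_softmax(-1)
        # nn.CTCLoss expects (T, B, V) log-probs; targets must not contain blank_id.
        return self.ctc(log_probs.transpose(0, 1), tgt_ids,
                        src_lens * self.upsample, tgt_lens)
```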
- Improving Neural Machine Translation by Denoising Training [95.96569884410137]
We present a simple and effective pretraining strategy, Denoising Training (DoT), for neural machine translation.
We update the model parameters with source- and target-side denoising tasks at the early stage and then tune the model normally.
Experiments show DoT consistently improves the neural machine translation performance across 12 bilingual and 16 multilingual directions.
arXiv Detail & Related papers (2022-01-19T00:11:38Z)
- Reconsidering the Past: Optimizing Hidden States in Language Models [35.7524942657169]
We present Hidden-State Optimization (HSO), a gradient-based method for improving the performance of transformer language models.
HSO computes the gradient of the log-probability the language model assigns to an evaluation text, but uses it to update the cached hidden states rather than the model parameters.
arXiv Detail & Related papers (2021-12-16T06:14:37Z)
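The HSO entry above is concrete enough for a small sketch: compute the negative log-likelihood of a continuation, backpropagate into the cached key/value states rather than the weights, and take a gradient step on the cache. This assumes a HuggingFace-style causal LM that accepts a legacy tuple-of-tuples cache; the single update and the step size are assumptions, not the authors' exact procedure.

```python
# Sketch of Hidden-State Optimization: update the cached hidden states,
# not the model parameters. Assumes a legacy tuple-style key/value cache.
import torch

def hso_step(lm, prefix_ids, cont_ids, step_size=1e-2):
    with torch.no_grad():
        past = lm(prefix_ids, use_cache=True).past_key_values
    # Make the cache differentiable; the model weights stay frozen.
    past = tuple(tuple(t.detach().requires_grad_(True) for t in layer)
                 for layer in past)

    # Log-probability the model assigns to the continuation given the cache.
    out = lm(cont_ids, past_key_values=past, use_cache=False)
    logp = out.logits[:, :-1].log_softmax(-1)
    nll = -logp.gather(-1, cont_ids[:, 1:, None]).mean()

    # One gradient step on the cached keys/values, leaving parameters untouched.
    grads = torch.autograd.grad(nll, [t for layer in past for t in layer])
    it = iter(grads)
    return tuple(tuple(t - step_size * next(it) for t in layer) for layer in past)
```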
- Improving Neural Machine Translation by Bidirectional Training [85.64797317290349]
We present a simple and effective pretraining strategy, bidirectional training (BiT), for neural machine translation.
Specifically, we bidirectionally update the model parameters at the early stage and then tune the model normally.
Experimental results show that BiT significantly improves state-of-the-art neural machine translation performance across 15 translation tasks on 8 language pairs.
arXiv Detail & Related papers (2021-09-16T07:58:33Z)
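At the data level, the BiT recipe above can be read as: train on both src->tgt and tgt->src pairs in the early stage, then continue on the normal forward data. A minimal sketch of building such a bidirectional dataset follows; the direction tags are an assumption rather than the paper's exact scheme.

```python
# Sketch of building bidirectional data for the early BiT training stage.
# The direction tags are an assumption, not the paper's exact scheme.
def make_bidirectional(pairs):
    """pairs: list of (source_sentence, target_sentence) strings."""
    bidir = []
    for src, tgt in pairs:
        bidir.append((f"<2tgt> {src}", tgt))  # forward direction
        bidir.append((f"<2src> {tgt}", src))  # reversed direction
    return bidir

# Early stage: train on make_bidirectional(parallel_data);
# later stage: continue training on the forward pairs only.
```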
- BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation [38.017030073108735]
We show that a tailored and suitable bilingual pre-trained language model (dubbed BiBERT) achieves state-of-the-art translation performance.
Our best models achieve BLEU scores of 30.45 for En->De and 38.61 for De->En on the IWSLT'14 dataset, and 31.26 for En->De and 34.94 for De->En on the WMT'14 dataset.
arXiv Detail & Related papers (2021-09-09T23:43:41Z)
- The USYD-JD Speech Translation System for IWSLT 2021 [85.64797317290349]
This paper describes the University of Sydney & JD's joint submission to the IWSLT 2021 low-resource speech translation task.
We trained our models with the officially provided ASR and MT datasets.
To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning.
arXiv Detail & Related papers (2021-07-24T09:53:34Z)
- Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation [98.11249019844281]
Knowledge distillation (KD) is commonly used to construct synthetic data for training non-autoregressive translation (NAT) models.
We propose reverse KD to rejuvenate more alignments for low-frequency target words.
Results demonstrate that the proposed approach can significantly and universally improve translation quality.
arXiv Detail & Related papers (2021-06-02T02:41:40Z)
- Understanding and Improving Lexical Choice in Non-Autoregressive Translation [98.11249019844281]
We propose to expose the raw data to NAT models to restore the useful information of low-frequency words.
Our approach pushes the SOTA NAT performance on the WMT14 English-German and WMT16 Romanian-English datasets up to 27.8 and 33.8 BLEU points, respectively.
arXiv Detail & Related papers (2020-12-29T03:18:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.