Integrating Pre-trained Language Model into Neural Machine Translation
- URL: http://arxiv.org/abs/2310.19680v4
- Date: Sat, 13 Jan 2024 15:39:10 GMT
- Title: Integrating Pre-trained Language Model into Neural Machine Translation
- Authors: Soon-Jae Hwang, Chang-Sung Jeong
- Abstract summary: The shortage of high-quality bilingual parallel data poses a major challenge to improving NMT performance.
Recent studies have explored the use of contextual information from pre-trained language models (PLMs) to address this problem.
This study proposes the PLM-integrated NMT (PiNMT) model to overcome the identified problems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural Machine Translation (NMT) has become a significant technology in
natural language processing through extensive research and development.
However, the shortage of high-quality bilingual parallel data still
poses a major challenge to improving NMT performance. Recent studies have
explored the use of contextual information from pre-trained language models
(PLMs) to address this problem. Yet, the incompatibility between PLMs and NMT
models remains unresolved. This study proposes the PLM-integrated NMT
(PiNMT) model to overcome these problems. The PiNMT model consists of
three critical components: PLM Multi Layer Converter, Embedding Fusion, and
Cosine Alignment, each playing a vital role in providing effective PLM
information to NMT. Furthermore, two training strategies, Separate Learning
Rates and Dual Step Training, are also introduced in this paper. By
implementing the proposed PiNMT model and training strategies, we achieve
state-of-the-art performance on the IWSLT'14 En$\leftrightarrow$De dataset.
This study's outcomes are noteworthy as they demonstrate a novel approach for
efficiently integrating PLMs with NMT to overcome incompatibility and enhance
performance.
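Of the components and strategies named in the abstract, Separate Learning Rates and Cosine Alignment are the easiest to illustrate. The sketch below is a minimal, hypothetical PyTorch rendering, not the paper's actual architecture or loss: optimizer parameter groups give the PLM-side and NMT-side modules different learning rates, and a cosine-similarity term stands in for Cosine Alignment. All module names and dimensions are assumptions.

```python
import torch
import torch.nn.functional as F

# Hypothetical modules standing in for the paper's components.
plm_converter = torch.nn.Linear(768, 512)   # stands in for the PLM Multi Layer Converter
nmt_model = torch.nn.Linear(512, 512)       # stands in for the NMT encoder-decoder

# Separate Learning Rates: one optimizer, two parameter groups with different lrs.
optimizer = torch.optim.Adam([
    {"params": plm_converter.parameters(), "lr": 1e-5},  # slow: PLM side
    {"params": nmt_model.parameters(),     "lr": 5e-4},  # fast: NMT side
])

def cosine_alignment_loss(plm_features, nmt_features):
    """One plausible reading of 'Cosine Alignment': pull the two
    representations toward the same direction in embedding space."""
    return (1.0 - F.cosine_similarity(plm_features, nmt_features, dim=-1)).mean()

# One illustrative training step on dummy data.
plm_hidden = torch.randn(8, 768)             # pretend PLM hidden states
plm_feats = plm_converter(plm_hidden)
nmt_feats = nmt_model(plm_feats.detach())    # pretend NMT-side features
loss = cosine_alignment_loss(plm_feats, nmt_feats)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```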
Related papers
- TasTe: Teaching Large Language Models to Translate through Self-Reflection [82.83958470745381]
Large language models (LLMs) have exhibited remarkable performance in various natural language processing tasks.
We propose the TasTe framework, which stands for translating through self-reflection.
The evaluation results in four language directions on the WMT22 benchmark reveal the effectiveness of our approach compared to existing methods.
arXiv Detail & Related papers (2024-06-12T17:21:21Z)
- Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding [73.32763904267186]
Large Language Models (LLMs) present the potential for achieving superior translation quality.
We propose Cooperative Decoding (CoDec), which treats NMT systems as a pretranslation model and MT-oriented LLMs as a supplemental solution.
arXiv Detail & Related papers (2023-11-06T03:41:57Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of the BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
- Unified Model Learning for Various Neural Machine Translation [63.320005222549646]
Existing neural machine translation (NMT) studies mainly focus on developing dataset-specific models.
We propose a "versatile" model, i.e., Unified Model Learning for NMT (UMLNMT), which works with data from different tasks.
Our UMLNMT results in substantial improvements over dataset-specific models with significantly reduced model deployment costs.
arXiv Detail & Related papers (2023-05-04T12:21:52Z)
- Towards Reliable Neural Machine Translation with Consistency-Aware Meta-Learning [24.64700139151659]
Current neural machine translation (NMT) systems suffer from a lack of reliability.
We present a consistency-aware meta-learning (CAML) framework, derived from the model-agnostic meta-learning (MAML) algorithm, to address this.
We conduct experiments on the NIST Chinese-to-English task, three WMT translation tasks, and the TED M2O task.
arXiv Detail & Related papers (2023-03-20T09:41:28Z)
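As a rough point of reference for the meta-learning recipe this entry builds on, here is a minimal first-order MAML-style update in PyTorch. It is a sketch under stated assumptions, not CAML itself: the consistency-aware term that distinguishes CAML is not described in the summary and is therefore omitted, and loss_fn, support_batch, query_batch, and meta_optimizer are placeholders supplied by the caller.

```python
import copy
import torch

def first_order_maml_step(model, loss_fn, support_batch, query_batch,
                          meta_optimizer, inner_lr=1e-3):
    """One meta-update in the first-order MAML style (a sketch, not CAML)."""
    # Inner loop: adapt a temporary copy of the model on the support batch.
    adapted = copy.deepcopy(model)
    inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    inner_opt.zero_grad()
    loss_fn(adapted, support_batch).backward()
    inner_opt.step()

    # Outer loop: evaluate the adapted copy on the query batch and apply
    # its gradient to the original parameters (first-order approximation).
    query_loss = loss_fn(adapted, query_batch)
    grads = torch.autograd.grad(query_loss, adapted.parameters())
    meta_optimizer.zero_grad()
    for param, grad in zip(model.parameters(), grads):
        param.grad = grad.detach().clone()
    meta_optimizer.step()
    return query_loss.item()
```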
- Exploiting Language Relatedness in Machine Translation Through Domain Adaptation Techniques [3.257358540764261]
We present a novel approach that uses a scaled similarity score of sentences, especially for related languages, based on a 5-gram KenLM language model.
Our approach yields gains of 2 BLEU points with the multi-domain approach, 3 BLEU points with fine-tuning for NMT, and 2 BLEU points with the iterative back-translation approach.
arXiv Detail & Related papers (2023-03-03T09:07:30Z)
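A hedged sketch of the scoring idea, not the paper's actual formula: candidate sentences are scored with a 5-gram KenLM model trained on in-domain text, and their perplexities are min-max scaled so that higher values mean closer to the domain. The model path, the scaling scheme, and the 0.5 selection threshold are assumptions.

```python
import kenlm  # Python bindings for the KenLM n-gram language model

# Assumed path to a 5-gram model trained on in-domain text.
lm = kenlm.Model("indomain.5gram.arpa")

def scaled_scores(sentences):
    """Score sentences by LM perplexity and min-max scale to [0, 1],
    where 1.0 means closest to the in-domain data (lowest perplexity)."""
    ppl = [lm.perplexity(s) for s in sentences]
    lo, hi = min(ppl), max(ppl)
    if hi == lo:                      # degenerate case: all sentences tie
        return [1.0] * len(sentences)
    return [(hi - p) / (hi - lo) for p in ppl]

# Toy usage: keep only sentences above a (made-up) similarity threshold.
corpus = ["a toy sentence", "another toy sentence"]
selected = [s for s, w in zip(corpus, scaled_scores(corpus)) if w >= 0.5]
```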
- Active Learning for Neural Machine Translation [0.0]
We incorporated a technique known as Active Learning with the NMT toolkit Joey NMT to reach sufficient accuracy and robust predictions for low-resource language translation.
This work uses transformer-based NMT systems: a baseline model (BM), a fully trained model (FTM), an active learning least-confidence-based model (ALLCM), and an active learning margin-sampling-based model (ALMSM) for translating English to Hindi.
arXiv Detail & Related papers (2022-12-30T17:04:01Z)
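The two acquisition criteria named in this entry are standard, so a generic sketch is possible. The following is not tied to Joey NMT and assumes a probs array holding, for each unlabeled sentence, the model's probabilities over its candidate hypotheses.

```python
import numpy as np

def least_confidence(probs):
    """Least-confidence acquisition: prefer sentences whose best hypothesis
    has the lowest model probability. probs: (n_sentences, n_hypotheses)."""
    return 1.0 - probs.max(axis=1)

def margin_sampling(probs):
    """Margin acquisition: prefer sentences where the top two hypotheses
    have nearly equal probability (small margin = high uncertainty)."""
    top2 = np.sort(probs, axis=1)[:, -2:]
    return -(top2[:, 1] - top2[:, 0])   # negate so larger = more uncertain

# Toy usage: pick the k most uncertain sentences for annotation.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.40, 0.35, 0.25],
                  [0.60, 0.30, 0.10]])
k = 2
chosen = np.argsort(least_confidence(probs))[-k:]   # or margin_sampling(probs)
```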
- Alternated Training with Synthetic and Authentic Data for Neural Machine Translation [49.35605028467887]
We propose alternated training with synthetic and authentic data for neural machine translation (NMT).
Compared with previous work, we introduce authentic data as guidance to prevent the training of NMT models from being disturbed by noisy synthetic data.
Experiments on Chinese-English and German-English translation tasks show that our approach improves the performance over several strong baselines.
arXiv Detail & Related papers (2021-06-16T07:13:16Z)
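A minimal sketch of the alternation idea, assuming train_on_batch, synthetic_loader, and authentic_loader are supplied by the caller; the paper's actual schedule and guidance mechanism are not reproduced here.

```python
def alternated_training(model, train_on_batch, synthetic_loader,
                        authentic_loader, num_rounds=4):
    """Alternate between synthetic (e.g. back-translated) and authentic
    parallel data; all arguments are hypothetical placeholders."""
    for _ in range(num_rounds):
        for batch in synthetic_loader:      # phase 1: noisy synthetic bitext
            train_on_batch(model, batch)
        for batch in authentic_loader:      # phase 2: authentic bitext as guidance
            train_on_batch(model, batch)
    return model
```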
- Self-supervised and Supervised Joint Training for Resource-rich Machine Translation [30.502625878505732]
Self-supervised pre-training of text representations has been successfully applied to low-resource Neural Machine Translation (NMT).
We propose a joint training approach, $F$-XEnDec, to combine self-supervised and supervised learning to optimize NMT models.
arXiv Detail & Related papers (2021-06-08T02:35:40Z)
- Multi-task Learning for Multilingual Neural Machine Translation [32.81785430242313]
We propose a multi-task learning framework that jointly trains the model with the translation task on bitext data and two denoising tasks on the monolingual data.
We show that the proposed approach can effectively improve the translation quality for both high-resource and low-resource languages.
arXiv Detail & Related papers (2020-10-06T06:54:12Z)
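A minimal sketch of the joint objective described above; the method names (translation_loss, denoising_loss) and the weighting are hypothetical, not the paper's definitions.

```python
def multitask_loss(model, bitext_batch, mono_src_batch, mono_tgt_batch,
                   w_denoise=1.0):
    """Hypothetical joint objective: supervised translation loss on bitext
    plus two denoising losses on monolingual source/target data."""
    l_translate = model.translation_loss(bitext_batch)    # assumed method
    l_dae_src = model.denoising_loss(mono_src_batch)      # assumed method
    l_dae_tgt = model.denoising_loss(mono_tgt_batch)      # assumed method
    return l_translate + w_denoise * (l_dae_src + l_dae_tgt)
```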
- Learning to Multi-Task Learn for Better Neural Machine Translation [53.06405021125476]
Multi-task learning is an elegant approach to inject linguistic-related biases into neural machine translation models.
We propose a novel framework for learning the training schedule, i.e., learning to multi-task learn, for the biased-MTL setting of interest.
Experiments show the resulting automatically learned training schedulers are competitive with the best, and lead to up to +1.1 BLEU score improvements.
arXiv Detail & Related papers (2020-01-10T03:12:28Z)