ACT-MNMT Auto-Constriction Turning for Multilingual Neural Machine
Translation
- URL: http://arxiv.org/abs/2403.06745v1
- Date: Mon, 11 Mar 2024 14:10:57 GMT
- Title: ACT-MNMT Auto-Constriction Turning for Multilingual Neural Machine
Translation
- Authors: Shaojie Dai, Xin Liu, Ping Luo and Yue Yu
- Abstract summary: This paper introduces an Auto-Constriction Turning mechanism for Multilingual Neural Machine Translation (ACT-MNMT).
- Score: 38.30649186517611
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have achieved promising performance in
multilingual machine translation tasks through zero-/few-shot prompting or
prompt-tuning. However, because multilingual data are mixed during LLM
pre-training, LLM-based translation models face the off-target issue under both
prompt-based methods, covering a series of phenomena, namely instruction
misunderstanding, translation into the wrong language, and over-generation. To
address this issue, this paper introduces an Auto-Constriction Turning mechanism
for Multilingual Neural Machine Translation (ACT-MNMT), which is a novel
supervised fine-tuning mechanism orthogonal to traditional prompt-based methods.
In this method, ACT-MNMT automatically constructs a constrained template on the
target side by adding trigger tokens ahead of the ground truth. Furthermore,
trigger tokens can be arranged and combined freely to represent different task
semantics, and they can be iteratively updated to maximize the label likelihood.
Experiments are performed on WMT test sets with multiple metrics, and the
results demonstrate that ACT-MNMT achieves substantially improved performance
across multiple translation directions and reduces the off-target phenomena in
translation.
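As a rough illustration of the mechanism described in the abstract, the sketch
below (not the authors' code) prepends hypothetical trigger tokens to the
target-side ground truth and runs one ordinary supervised fine-tuning step so
that the likelihood of the constrained template is maximized. The vocabulary
size, trigger-token IDs, helper names, and the tiny stand-in decoder are all
assumptions made for illustration only.

```python
# Minimal sketch of the auto-constriction idea: build a constrained target by
# prepending trigger tokens ahead of the ground truth, then fine-tune so the
# model learns to emit the task marker before the translation.
# TRIGGER_TEMPLATE, build_constrained_target and TinyDecoder are hypothetical.
import torch
import torch.nn as nn

VOCAB_SIZE = 32000
PAD_ID = 0
# Hypothetical reserved IDs for trigger tokens (e.g. <to_de>, <translate>);
# they can be arranged and combined freely to encode different task semantics.
TRIGGER_TEMPLATE = [31990, 31991]

def build_constrained_target(ground_truth_ids, trigger_ids=TRIGGER_TEMPLATE):
    """Prepend trigger tokens ahead of the ground-truth token IDs."""
    return trigger_ids + ground_truth_ids

class TinyDecoder(nn.Module):
    """Stand-in for an LLM decoder: embeds tokens and predicts the next one."""
    def __init__(self, vocab_size=VOCAB_SIZE, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, ids):
        hidden, _ = self.rnn(self.embed(ids))
        return self.out(hidden)

model = TinyDecoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD_ID)

# Toy ground-truth translation (token IDs); real data would come from WMT.
ground_truth = [812, 77, 4051, 9, 2]
labels = torch.tensor([build_constrained_target(ground_truth)])

# One supervised fine-tuning step with teacher forcing: maximizing the label
# likelihood of the constrained target also updates the trigger embeddings.
optimizer.zero_grad()
logits = model(labels[:, :-1])
loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), labels[:, 1:].reshape(-1))
loss.backward()
optimizer.step()
```

In the paper's setting, the same target-side template construction would be
applied to an LLM-based translation model and real WMT training pairs rather
than the toy decoder used here.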
Related papers
- LANDeRMT: Detecting and Routing Language-Aware Neurons for Selectively Finetuning LLMs to Machine Translation [43.26446958873554]
Recent advancements in large language models (LLMs) have shown promising results in multilingual translation even with limited bilingual supervision.
LANDeRMT is a framework that selectively finetunes LLMs for machine translation with diverse translation training data.
arXiv Detail & Related papers (2024-09-29T02:39:42Z) - Towards Zero-Shot Multimodal Machine Translation [64.9141931372384]
We propose a method to bypass the need for fully supervised data to train multimodal machine translation systems.
Our method, called ZeroMMT, adapts a strong text-only machine translation (MT) model by training it on a mixture of two objectives.
To prove that our method generalizes to languages with no fully supervised training data available, we extend the CoMMuTE evaluation dataset to three new languages: Arabic, Russian and Chinese.
arXiv Detail & Related papers (2024-07-18T15:20:31Z) - G-SPEED: General SParse Efficient Editing MoDel [25.48360227520061]
General SParse Efficient Editing MoDel (G-SPEED).
arXiv Detail & Related papers (2023-10-16T15:01:18Z) - Revamping Multilingual Agreement Bidirectionally via Switched
Back-translation for Multilingual Neural Machine Translation [107.83158521848372]
Multilingual agreement (MA) has shown its importance for multilingual neural machine translation (MNMT).
We present Bidirectional Multilingual Agreement via Switched Back-translation (BMA-SBT).
It is a novel and universal multilingual agreement framework for fine-tuning pre-trained MNMT models.
arXiv Detail & Related papers (2022-09-28T09:14:58Z) - Anticipation-free Training for Simultaneous Translation [70.85761141178597]
Simultaneous translation (SimulMT) speeds up the translation process by starting to translate before the source sentence is completely available.
Existing methods increase latency or introduce adaptive read-write policies for SimulMT models to handle local reordering and improve translation quality.
We propose a new framework that decomposes the translation process into the monotonic translation step and the reordering step.
arXiv Detail & Related papers (2022-01-30T16:29:37Z) - Improving Multilingual Translation by Representation and Gradient
Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both the representation level and the gradient level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z) - Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z)