Multilingual Non-Autoregressive Machine Translation without Knowledge Distillation
- URL: http://arxiv.org/abs/2502.04537v1
- Date: Thu, 06 Feb 2025 22:16:28 GMT
- Title: Multilingual Non-Autoregressive Machine Translation without Knowledge Distillation
- Authors: Chenyang Huang, Fei Huang, Zaixiang Zheng, Osmar R. Zaïane, Hao Zhou, Lili Mou
- Abstract summary: We propose an approach to non-autoregressive multilingual machine translation.
Our system leverages the recent advance of the directed acyclic Transformer.
We also propose a pivot back-translation approach to improve the generalization to unseen translation directions.
- Score: 55.525158411296474
- Abstract: Multilingual neural machine translation (MNMT) aims at using one single model for multiple translation directions. Recent work applies non-autoregressive Transformers to improve the efficiency of MNMT, but requires expensive knowledge distillation (KD) processes. To address this, we propose an M-DAT approach to non-autoregressive multilingual machine translation. Our system leverages the recent advance of the directed acyclic Transformer (DAT), which does not require KD. We further propose a pivot back-translation (PivotBT) approach to improve the generalization to unseen translation directions. Experiments show that our M-DAT achieves state-of-the-art performance in non-autoregressive MNMT.
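The abstract introduces pivot back-translation (PivotBT) for unseen translation directions without giving procedural detail here. The sketch below is only a rough illustration of the idea under stated assumptions: English as the pivot language and a generic `translate` helper, neither of which is taken from the authors' implementation.

```python
# Rough sketch of pivot back-translation (PivotBT) for an unseen direction
# src_lang -> tgt_lang; the pivot ("en") and the translate() helper are
# illustrative assumptions, not the M-DAT authors' code.

def translate(model, text, src_lang, tgt_lang):
    # Placeholder for a multilingual translation call (assumed API).
    return model.generate(text, src_lang=src_lang, tgt_lang=tgt_lang)

def pivot_back_translate(model, tgt_sentence, src_lang, tgt_lang, pivot="en"):
    # Translate the target sentence into the pivot language first ...
    pivot_sentence = translate(model, tgt_sentence, src_lang=tgt_lang, tgt_lang=pivot)
    # ... then from the pivot into the source language, yielding a synthetic
    # source paired with the original target for the unseen direction.
    synthetic_src = translate(model, pivot_sentence, src_lang=pivot, tgt_lang=src_lang)
    return synthetic_src, tgt_sentence
```

Such synthetic pairs could then be mixed into training for directions that lack parallel data.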
Related papers
- PMMT: Preference Alignment in Multilingual Machine Translation via LLM Distillation [4.667901787486126]
A new method is proposed to generate large-scale multilingual parallel corpora with specific translation preferences.
Experiments indicate that the proposed method leads by a large margin in translation tasks that require alignment with human preferences.
arXiv Detail & Related papers (2024-10-15T08:54:27Z)
- Towards Zero-Shot Multimodal Machine Translation [64.9141931372384]
We propose a method to bypass the need for fully supervised data to train multimodal machine translation systems.
Our method, called ZeroMMT, adapts a strong text-only machine translation (MT) model by training it on a mixture of two objectives.
To prove that our method generalizes to languages with no fully supervised training data available, we extend the CoMMuTE evaluation dataset to three new languages: Arabic, Russian and Chinese.
arXiv Detail & Related papers (2024-07-18T15:20:31Z)
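The ZeroMMT entry above adapts a text-only MT model with a mixture of two objectives. The exact objectives are described in that paper; purely as an illustration of combining two losses during adaptation, a weighted mixture can look like this:

```python
import torch

def mixture_loss(objective_a: torch.Tensor, objective_b: torch.Tensor,
                 lam: float = 0.5) -> torch.Tensor:
    # Toy weighted mixture of two scalar training objectives; the actual
    # objectives and weighting in ZeroMMT are not reproduced here.
    return lam * objective_a + (1.0 - lam) * objective_b
```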
- Rethinking Human-like Translation Strategy: Integrating Drift-Diffusion Model with Large Language Models for Machine Translation [15.333148705267012]
We propose Thinker with the Drift-Diffusion Model to emulate human translators' dynamic decision-making under constrained resources.
We conduct experiments under the high-resource, low-resource, and commonsense translation settings using the WMT22 and CommonMT datasets.
We also perform additional analysis and evaluation on commonsense translation to illustrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2024-02-16T14:00:56Z)
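The Thinker entry above draws on the drift-diffusion model from cognitive science. As background only (not that paper's algorithm), a single drift-diffusion decision can be simulated by accumulating noisy evidence until a threshold is crossed:

```python
import random

def drift_diffusion(drift=0.3, noise=1.0, threshold=2.0, dt=0.01, max_steps=100000):
    """Simulate one drift-diffusion decision: accumulate noisy evidence until
    it crosses +/- threshold; return the decision sign and the decision time."""
    evidence, steps = 0.0, 0
    while abs(evidence) < threshold and steps < max_steps:
        evidence += drift * dt + noise * random.gauss(0.0, dt ** 0.5)
        steps += 1
    return (1 if evidence > 0 else -1), steps * dt
```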
- Decouple Non-parametric Knowledge Distillation For End-to-end Speech Translation [5.973321003365441]
We propose Decoupled Non-parametric Knowledge Distillation (DNKD) from the data perspective to improve data efficiency.
Our method follows the knowledge distillation paradigm; however, instead of obtaining the teacher distribution from a sophisticated MT model, we construct it from a non-parametric datastore via k-Nearest-Neighbor retrieval.
Experiments on the MuST-C corpus show that the proposed method achieves consistent improvements over a strong baseline without requiring any transcription.
arXiv Detail & Related papers (2023-04-20T13:20:03Z)
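The DNKD entry above builds the teacher distribution from a datastore rather than from a separate MT model. The sketch below shows one way a k-nearest-neighbor teacher distribution can be constructed; the datastore layout, distance metric, and temperature are illustrative assumptions rather than that paper's exact formulation.

```python
import numpy as np

def knn_teacher_distribution(query, keys, values, vocab_size, k=8, temperature=10.0):
    """Build a soft teacher distribution over the vocabulary by retrieving the
    k nearest datastore entries to a decoder state.
    keys: (N, d) stored hidden states; values: (N,) their target token ids."""
    dists = np.linalg.norm(keys - query, axis=1)      # distance to every entry
    nearest = np.argsort(dists)[:k]                   # indices of the k nearest
    weights = np.exp(-dists[nearest] / temperature)   # closer entries weigh more
    weights /= weights.sum()
    teacher = np.zeros(vocab_size)
    for w, tok in zip(weights, values[nearest]):      # scatter onto the vocabulary
        teacher[int(tok)] += w
    return teacher
```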
- On the Pareto Front of Multilingual Neural Machine Translation [123.94355117635293]
We study how the performance of a given translation direction changes with its sampling ratio in multilingual neural machine translation (MNMT).
We propose the Double Power Law to predict the unique performance trade-off front in MNMT.
In our experiments, it achieves better performance than temperature searching and gradient manipulation methods with only 1/5 to 1/2 of the total training budget.
arXiv Detail & Related papers (2023-04-06T16:49:19Z)
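The Pareto-front entry above predicts a direction's performance from its sampling ratio with a Double Power Law. Its exact parameterization is given in that paper; the snippet below only illustrates the general shape of a two-term power law with made-up coefficients.

```python
import numpy as np

def double_power_law(p, c=30.0, a1=2.0, k1=0.3, a2=1.0, k2=0.7):
    """Illustrative two-term power law mapping a sampling ratio p in (0, 1]
    to a translation score; all coefficients are placeholders."""
    return c - a1 * np.power(p, -k1) - a2 * np.power(p, -k2)

scores = double_power_law(np.linspace(0.05, 1.0, 5))  # evaluate on a grid of ratios
```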
- Non-Parametric Online Learning from Human Feedback for Neural Machine Translation [54.96594148572804]
We study the problem of online learning with human feedback in human-in-the-loop machine translation.
Previous methods require online model updating or additional translation memory networks to achieve high-quality performance.
We propose a novel non-parametric online learning method without changing the model structure.
arXiv Detail & Related papers (2021-09-23T04:26:15Z)
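The online-learning entry above avoids both parameter updates and extra memory networks. As a heavily simplified, purely illustrative sketch of the non-parametric idea, human corrections can be kept in an external store and reused at inference while the model itself stays fixed (the actual method operates at a much finer granularity):

```python
# Illustrative only: sentence-level correction memory; the paper's method
# is finer-grained and does not work this way verbatim.
feedback_memory = {}

def record_feedback(source, corrected_translation):
    feedback_memory[source] = corrected_translation

def translate_with_memory(model, source):
    # Reuse a stored human correction when the same source reappears;
    # otherwise fall back to the unchanged model.
    if source in feedback_memory:
        return feedback_memory[source]
    return model.translate(source)  # assumed model interface
```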
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach that regularizes NMT models at both the representation level and the gradient level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
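The regularization entry above operates at both the representation level and the gradient level. As one possible illustration of a representation-level term (an assumption, not that paper's exact regularizer), encoder states of parallel sentences can be pushed toward each other:

```python
import torch
import torch.nn.functional as F

def representation_regularizer(enc_src: torch.Tensor, enc_tgt: torch.Tensor) -> torch.Tensor:
    """Penalize cosine dissimilarity between mean-pooled encoder outputs of a
    parallel sentence pair, each of shape (batch, seq, dim)."""
    src_vec = enc_src.mean(dim=1)
    tgt_vec = enc_tgt.mean(dim=1)
    return (1.0 - F.cosine_similarity(src_vec, tgt_vec, dim=-1)).mean()
```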
- Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation [88.78138830698173]
We focus on sequence-level knowledge distillation (SeqKD) from external text-based NMT models.
We train a bilingual E2E-ST model to predict paraphrased transcriptions as an auxiliary task with a single decoder.
arXiv Detail & Related papers (2021-04-13T19:00:51Z)
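The last entry relies on sequence-level knowledge distillation (SeqKD) from external text-based NMT teachers. A minimal sketch of the standard SeqKD data step is shown below; the `teacher_nmt.translate` interface is an assumption.

```python
def build_seqkd_targets(teacher_nmt, transcripts):
    """Sequence-level KD: the text-based NMT teacher translates each source
    transcript, and its outputs become the student's training targets."""
    return [teacher_nmt.translate(t) for t in transcripts]
```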
This list is automatically generated from the titles and abstracts of the papers on this site.