XDLM: Cross-lingual Diffusion Language Model for Machine Translation
- URL: http://arxiv.org/abs/2307.13560v2
- Date: Mon, 31 Jul 2023 01:29:31 GMT
- Title: XDLM: Cross-lingual Diffusion Language Model for Machine Translation
- Authors: Linyao Chen, Aosong Feng, Boming Yang, Zihui Li
- Abstract summary: We propose a novel Cross-lingual diffusion model for machine translation, consisting of pretraining and fine-tuning stages.
We evaluate the resulting model on several machine translation benchmarks, where it outperforms both diffusion and Transformer baselines.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Recently, diffusion models have excelled in image generation tasks and have
also been applied to natural language processing (NLP) for controllable text
generation. However, the application of diffusion models in a cross-lingual
setting is less explored. Additionally, while pretraining with diffusion
models has been studied within a single language, the potential of
cross-lingual pretraining remains understudied. To address these gaps, we
propose XDLM, a novel Cross-lingual diffusion model for machine translation,
consisting of pretraining and fine-tuning stages. In the pretraining stage, we
propose TLDM, a new training objective for mastering the mapping between
different languages; in the fine-tuning stage, we build up the translation
system based on the pretrained model. We evaluate the resulting model on several
machine translation benchmarks, where it outperforms both diffusion and Transformer
baselines.
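The abstract does not spell out the TLDM objective, so the following is only a minimal sketch of one plausible reading, assuming an absorbing-state (masking) discrete diffusion applied to a concatenated source/target pair in the spirit of translation language modeling. The model interface, mask id, vocabulary size, and step count are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch, NOT the authors' code: a TLM-style diffusion pretraining step.
# Assumptions: absorbing-state (masking) noise, a Transformer `model(tokens, t)`
# returning per-token logits, and placeholder ids/sizes.
import torch
import torch.nn.functional as F

MASK_ID = 4      # placeholder [MASK] id
VOCAB = 32000    # placeholder vocabulary size
T = 1000         # number of diffusion steps

def corrupt(tokens: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Mask each token independently with probability t / T."""
    p = (t.float() / T).unsqueeze(-1)                       # (batch, 1)
    masked = torch.rand(tokens.shape, device=tokens.device) < p
    return torch.where(masked, torch.full_like(tokens, MASK_ID), tokens)

def tldm_step(model, src: torch.Tensor, tgt: torch.Tensor) -> torch.Tensor:
    """One pretraining step on a concatenated (source, target) sentence pair."""
    pair = torch.cat([src, tgt], dim=1)                     # (batch, len_src + len_tgt)
    t = torch.randint(1, T + 1, (pair.size(0),), device=pair.device)
    noisy = corrupt(pair, t)
    logits = model(noisy, t)                                # (batch, seq_len, VOCAB)
    return F.cross_entropy(logits.reshape(-1, VOCAB), pair.reshape(-1))
```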
Related papers
- Revisiting Machine Translation for Cross-lingual Classification [91.43729067874503]
Most research in the area focuses on multilingual models rather than on the machine translation component.
We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine-translated text, translate-test can do substantially better than previously assumed.
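As a rough illustration of the translate-test setup discussed above, here is a minimal sketch; `mt_translate` and `en_classifier` are hypothetical stand-ins, not components from the paper.

```python
# Minimal sketch of translate-test (illustrative, not the paper's code):
# translate each non-English test input into English, then apply an
# English-trained classifier. Per the summary above, the train/test mismatch
# can be reduced by also fine-tuning the classifier on machine-translated text.
from typing import Callable, List

def translate_test(
    texts: List[str],
    mt_translate: Callable[[str], str],   # hypothetical MT system: source -> English
    en_classifier: Callable[[str], int],  # hypothetical classifier trained on English
) -> List[int]:
    return [en_classifier(mt_translate(t)) for t in texts]
```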
arXiv Detail & Related papers (2023-05-23T16:56:10Z)
- DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models [81.84866217721361]
DiffusionBERT is a new generative masked language model based on discrete diffusion models.
We propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step.
Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text.
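The summary above describes a new forward-process noise schedule; the sketch below only shows where such a schedule plugs into absorbing-state text diffusion, using a plain linear schedule as a placeholder rather than the schedule proposed in the paper.

```python
# Illustrative sketch only: where a noise schedule enters absorbing-state text
# diffusion. `alpha_bar` gives the keep-probability after t steps; DiffusionBERT
# proposes a better schedule than the linear placeholder used here.
import torch

def alpha_bar(t: torch.Tensor, T: int) -> torch.Tensor:
    """Probability that a token is still unmasked after t forward steps (linear placeholder)."""
    return 1.0 - t.float() / T

def forward_noise(tokens: torch.Tensor, t: torch.Tensor, T: int, mask_id: int) -> torch.Tensor:
    """Keep each token with probability alpha_bar(t); otherwise replace it with [MASK]."""
    keep = torch.rand(tokens.shape, device=tokens.device) < alpha_bar(t, T).unsqueeze(-1)
    return torch.where(keep, tokens, torch.full_like(tokens, mask_id))
```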
arXiv Detail & Related papers (2022-11-28T03:25:49Z)
- MSP: Multi-Stage Prompting for Making Pre-trained Language Models Better Translators [10.557167523009392]
We present Multi-Stage Prompting, a simple and lightweight approach for better adapting pre-trained language models to translation tasks.
To make pre-trained language models better translators, we divide the translation process with pre-trained language models into three separate stages.
During each stage, we independently apply different continuous prompts, allowing pre-trained language models to better adapt to translation tasks.
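A minimal sketch of the continuous-prompting mechanics implied above, with one learned prompt per stage prepended to a frozen model's input embeddings; the stage count, prompt length, and hidden size are assumptions, and the paper's specific stage definitions are not reproduced here.

```python
# Minimal sketch (not the authors' code): stage-specific continuous prompts
# prepended to the input embeddings of a frozen pre-trained LM.
import torch
import torch.nn as nn

class StagePrompts(nn.Module):
    def __init__(self, n_stages: int = 3, prompt_len: int = 20, d_model: int = 1024):
        super().__init__()
        # One trainable continuous prompt of shape (prompt_len, d_model) per stage.
        self.prompts = nn.Parameter(torch.randn(n_stages, prompt_len, d_model) * 0.02)

    def forward(self, token_embeds: torch.Tensor, stage: int) -> torch.Tensor:
        """Prepend the prompt for `stage` to a batch of token embeddings."""
        batch = token_embeds.size(0)
        prompt = self.prompts[stage].unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, token_embeds], dim=1)
```

In this kind of setup, only the prompt parameters would be updated during adaptation while the pretrained language model stays frozen.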
arXiv Detail & Related papers (2021-10-13T10:06:21Z)
- Cross-lingual Transferring of Pre-trained Contextualized Language Models [73.97131976850424]
We propose a novel cross-lingual model transferring framework for PrLMs: TreLM.
To handle the symbol order and sequence length differences between languages, we propose an intermediate "TRILayer" structure.
We show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency.
arXiv Detail & Related papers (2021-07-27T06:51:13Z)
- Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation [127.81351683335143]
Cross-lingual pretraining requires models to align the lexical- and high-level representations of the two languages.
Previous research has shown that weak cross-lingual transfer occurs because these representations are not sufficiently aligned.
In this paper, we enhance the bilingual masked language model pretraining with lexical-level information by using type-level cross-lingual subword embeddings.
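One way to picture the lexical-level enhancement described above is to seed the masked LM's embedding table with pre-aligned, type-level cross-lingual subword vectors before pretraining; the sketch below is an assumption-level illustration, not the paper's procedure.

```python
# Illustrative sketch: seed a bilingual masked LM's embedding table with
# pre-aligned type-level cross-lingual subword vectors (the vector source and
# sizes are placeholders, not the paper's setup).
import torch
import torch.nn as nn

def init_from_aligned_vectors(embedding: nn.Embedding, vectors: dict, vocab: dict) -> None:
    """Copy an aligned vector into each embedding row for which one is available."""
    with torch.no_grad():
        for subword, idx in vocab.items():
            vec = vectors.get(subword)
            if vec is not None:
                embedding.weight[idx] = torch.as_tensor(vec, dtype=embedding.weight.dtype)
```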
arXiv Detail & Related papers (2021-03-18T21:17:58Z)
- ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora [21.78571365050787]
ERNIE-M is a new training method that encourages the model to align the representation of multiple languages with monolingual corpora.
We generate pseudo-parallel sentence pairs on a monolingual corpus to enable the learning of semantic alignment between different languages.
Experimental results show that ERNIE-M outperforms existing cross-lingual models and delivers new state-of-the-art results on various cross-lingual downstream tasks.
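As a rough sketch of the pseudo-parallel idea above (not ERNIE-M's actual training loop), each monolingual sentence is paired with a model-generated translation so that alignment objectives have parallel-like data to work with; `generate_translation` is a hypothetical stand-in.

```python
# Rough sketch only: build (original, pseudo-translation) pairs from monolingual
# text. `generate_translation` is a hypothetical stand-in for the model's own
# generation step; ERNIE-M's actual objectives are not reproduced here.
from typing import Callable, Iterable, List, Tuple

def build_pseudo_parallel(
    monolingual: Iterable[str],
    generate_translation: Callable[[str, str], str],  # (sentence, target_lang) -> sentence
    target_lang: str = "en",
) -> List[Tuple[str, str]]:
    return [(s, generate_translation(s, target_lang)) for s in monolingual]
```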
arXiv Detail & Related papers (2020-12-31T15:52:27Z)
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
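The summary above mentions mutual-information-based decomposition; the sketch below shows only the general shape such a method might take (two projection heads plus a small critic providing a Donsker-Varadhan-style MI estimate between the two parts), with all sizes and the training scheme left as assumptions rather than the paper's method.

```python
# Generic sketch, NOT the paper's code: split a pretrained representation into
# "domain-invariant" and "domain-specific" parts and estimate the mutual
# information between them with a small critic (Donsker-Varadhan bound).
# Minimizing this estimate w.r.t. the projections encourages disentanglement.
import torch
import torch.nn as nn

class Decomposer(nn.Module):
    def __init__(self, d: int = 768):
        super().__init__()
        self.invariant = nn.Linear(d, d)   # domain-invariant projection
        self.specific = nn.Linear(d, d)    # domain-specific projection
        self.critic = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, 1))

    def mi_estimate(self, h: torch.Tensor) -> torch.Tensor:
        """Donsker-Varadhan estimate of MI between the two projected parts of h."""
        zi, zs = self.invariant(h), self.specific(h)
        joint = self.critic(torch.cat([zi, zs], dim=-1)).mean()
        shuffled = zs[torch.randperm(zs.size(0))]
        marginal = torch.logsumexp(self.critic(torch.cat([zi, shuffled], dim=-1)), dim=0) \
                   - torch.log(torch.tensor(float(zs.size(0))))
        return (joint - marginal).squeeze()
```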
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
- Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
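The summary names Dynamic Blocking but gives no detail; the sketch below illustrates one copy-blocking decoding constraint in that spirit (block the source successor of the token just generated) and should not be read as the paper's exact algorithm.

```python
# Illustrative decoding constraint in the spirit of the idea above (not
# necessarily the paper's exact Dynamic Blocking rule): if the last generated
# token also appears in the source, forbid the source token that immediately
# follows it at the current step, discouraging verbatim copying.
import torch

def block_source_successors(logits: torch.Tensor, source_ids: list, prev_token: int) -> torch.Tensor:
    """Return logits with successors of `prev_token` (as seen in the source) set to -inf."""
    logits = logits.clone()
    for i, tok in enumerate(source_ids[:-1]):
        if tok == prev_token:
            logits[source_ids[i + 1]] = float("-inf")
    return logits
```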
arXiv Detail & Related papers (2020-10-24T11:55:28Z)