Extrapolating Multilingual Understanding Models as Multilingual
Generators
- URL: http://arxiv.org/abs/2305.13140v1
- Date: Mon, 22 May 2023 15:33:21 GMT
- Title: Extrapolating Multilingual Understanding Models as Multilingual
Generators
- Authors: Bohong Wu, Fei Yuan, Hai Zhao, Lei Li, Jingjing Xu
- Abstract summary: This paper explores methods to empower multilingual understanding models with generation abilities, obtaining a unified model.
We propose a Semantic-Guided Alignment-then-Denoising (SGA) approach to adapt an encoder into a multilingual generator with a small number of new parameters.
- Score: 82.1355802012414
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual understanding models (i.e., encoder-based models),
pre-trained via masked language modeling, have achieved promising results on
many language understanding tasks (e.g., mBERT). However, these
non-autoregressive (NAR) models still struggle to generate high-quality texts
compared with autoregressive (AR) models. Considering that encoder-based
models have the advantage of efficient generation and self-correction
abilities, this paper explores methods to empower multilingual understanding
models with generation abilities to obtain a unified model. Specifically, we
start from a multilingual encoder (XLM-R) and propose a Semantic-Guided
Alignment-then-Denoising (SGA) approach to adapt the encoder into a
multilingual generator with a small number of new parameters. Experiments show
that the proposed approach is an effective adaptation method, outperforming
widely used initialization-based methods with gains of 9.4 BLEU on machine
translation, 8.1 ROUGE-L on question generation, and 5.5 METEOR on story
generation with XLM-R-large. On the other hand, we observe that XLM-R is still
inferior to mBART in supervised settings despite better results in zero-shot
settings, indicating that more exploration is required to make understanding
models strong generators.
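To make the adaptation idea concrete, the following is a minimal sketch, assuming HuggingFace Transformers and PyTorch; it is illustrative and not the authors' released SGA code. It shows how a masked-language-model encoder such as XLM-R can be driven as a non-autoregressive generator: a fully masked target span is appended to the source and iteratively denoised, re-masking the least confident predictions each round. In SGA proper, a small number of new parameters additionally guide the alignment step; only the denoising-style decoding loop is sketched here.

```python
# Minimal sketch (not the authors' SGA implementation): drive a masked-LM
# encoder such as XLM-R as a non-autoregressive generator via iterative
# mask-predict denoising. Model choice and schedule are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-large").eval()

def nar_generate(source: str, target_len: int = 16, iterations: int = 4) -> str:
    mask_id = tokenizer.mask_token_id
    src_ids = tokenizer(source, return_tensors="pt")["input_ids"][0]
    # Append a fully masked target span after the source tokens.
    ids = torch.cat([src_ids, torch.full((target_len,), mask_id)]).unsqueeze(0)
    tgt = slice(len(src_ids), len(src_ids) + target_len)

    for step in range(iterations):
        with torch.no_grad():
            logits = model(input_ids=ids).logits[0, tgt]
        confidence, prediction = logits.softmax(-1).max(-1)
        ids[0, tgt] = prediction
        # Re-mask the least confident positions; keep more tokens each round.
        n_remask = int(target_len * (1 - (step + 1) / iterations))
        if n_remask:
            worst = confidence.argsort()[:n_remask]
            target_view = ids[0, tgt]
            target_view[worst] = mask_id
    return tokenizer.decode(ids[0, tgt], skip_special_tokens=True)
```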
Related papers
- Enhancing Code Translation in Language Models with Few-Shot Learning via Retrieval-Augmented Generation [1.9726019592585404]
This paper introduces a novel approach that enhances code translation through Few-Shot Learning.
By leveraging a repository of existing code translations, we dynamically retrieve the most relevant examples to guide the model in translating new code segments.
Our method, based on Retrieval-Augmented Generation, substantially improves translation quality.
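As an illustration of how such retrieval can plug into prompting, here is a hypothetical sketch, not the paper's implementation; the sentence-transformers embedder and the Java-to-Python direction are assumptions. The new snippet is embedded, the most similar stored translation pairs are fetched, and they are prepended as in-context examples.

```python
# Hypothetical sketch of retrieval-augmented few-shot prompting for code
# translation (not the paper's implementation). The embedding model and the
# Java -> Python direction are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def build_prompt(new_code: str, repository: list, k: int = 3) -> str:
    """repository: list of {'source': ..., 'target': ...} translation pairs."""
    corpus_emb = embedder.encode([ex["source"] for ex in repository],
                                 convert_to_tensor=True)
    query_emb = embedder.encode(new_code, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=k)[0]
    shots = "\n\n".join(
        f"Java:\n{repository[h['corpus_id']]['source']}\n"
        f"Python:\n{repository[h['corpus_id']]['target']}"
        for h in hits
    )
    return f"{shots}\n\nJava:\n{new_code}\nPython:\n"
```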
arXiv Detail & Related papers (2024-07-29T00:41:48Z)
- On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
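For context on the mechanism being compared, below is a minimal from-scratch sketch of soft prompt tuning, an illustrative reconstruction of the general technique rather than the paper's configuration: only a small matrix of virtual-token embeddings is trained while the multilingual backbone stays frozen.

```python
# Minimal sketch of soft prompt tuning: prepend trainable virtual-token
# embeddings to a frozen model's input embeddings. Illustrative only;
# not the configuration used in the paper.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, frozen_lm, num_virtual_tokens: int = 20):
        super().__init__()
        self.lm = frozen_lm
        for p in self.lm.parameters():          # freeze the backbone
            p.requires_grad = False
        hidden = self.lm.get_input_embeddings().embedding_dim
        self.prompt = nn.Parameter(torch.randn(num_virtual_tokens, hidden) * 0.02)

    def forward(self, input_ids, attention_mask):
        tok_emb = self.lm.get_input_embeddings()(input_ids)
        batch = input_ids.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        inputs_embeds = torch.cat([prompt, tok_emb], dim=1)
        prompt_mask = torch.ones(batch, self.prompt.size(0),
                                 device=attention_mask.device,
                                 dtype=attention_mask.dtype)
        mask = torch.cat([prompt_mask, attention_mask], dim=1)
        return self.lm(inputs_embeds=inputs_embeds, attention_mask=mask)
```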
arXiv Detail & Related papers (2023-11-14T00:43:33Z)
- FiLM: Fill-in Language Models for Any-Order Generation [71.42044325886194]
Fill-in Language Model (FiLM) is a new language modeling approach that allows for flexible generation at any position without adhering to a specific generation order.
During inference, FiLM can seamlessly insert missing phrases, sentences, or paragraphs.
FiLM outperforms existing infilling methods that rely on left-to-right language models trained on rearranged text segments.
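A schematic of any-order infilling is shown below, using a generic masked LM purely for illustration; this is not FiLM's training or decoding procedure. The idea sketched is to repeatedly fill the single masked position where the model is currently most confident.

```python
# Schematic of any-order infilling with a generic masked LM (not FiLM's
# training or decoding procedure): repeatedly fill the single masked
# position where the model is currently most confident.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base").eval()

def fill_in(text_with_masks: str) -> str:
    ids = tokenizer(text_with_masks, return_tensors="pt")["input_ids"]
    mask_id = tokenizer.mask_token_id
    while (ids == mask_id).any():
        with torch.no_grad():
            logits = model(input_ids=ids).logits[0]
        conf, pred = logits.softmax(-1).max(-1)
        positions = (ids[0] == mask_id).nonzero(as_tuple=True)[0]
        best = positions[conf[positions].argmax()]   # most confident gap
        ids[0, best] = pred[best]
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(fill_in("The <mask> sat on the <mask>."))
```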
arXiv Detail & Related papers (2023-10-15T19:37:39Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
- MEGA: Multilingual Evaluation of Generative AI [23.109803506475174]
Generative AI models have shown impressive performance on many Natural Language Processing tasks.
Most studies on generative LLMs have been restricted to English.
It is unclear how capable these models are at understanding and generating text in other languages.
arXiv Detail & Related papers (2023-03-22T13:03:10Z)
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator [114.8954615026781]
We propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator.
GanLM is trained with two pre-training objectives: replaced token detection and replaced token denoising.
Experiments on language generation benchmarks show that GanLM, with its strong language understanding capability, outperforms various strong pre-trained language models.
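The two objectives can be pictured with the following schematic; it is illustrative only, and the generator, backbone, and prediction heads are placeholders rather than GanLM's actual architecture. A small generator corrupts masked positions, and the main model both detects and denoises the replacements.

```python
# Schematic of the two pre-training signals described above (illustrative,
# not GanLM's architecture): a small generator fills masked tokens, then the
# main model (a) detects which tokens were replaced and (b) denoises them.
import torch
import torch.nn.functional as F

def gan_style_objectives(generator, main_model, detect_head, denoise_head,
                         input_ids, mask_positions, original_ids):
    # 1. Generator proposes replacements at the masked positions.
    with torch.no_grad():
        sampled = generator(input_ids).logits.argmax(-1)
    corrupted = input_ids.clone()
    corrupted[mask_positions] = sampled[mask_positions]

    hidden = main_model(corrupted).last_hidden_state

    # 2. Replaced token detection: binary label per position.
    is_replaced = (corrupted != original_ids).float()
    detection_loss = F.binary_cross_entropy_with_logits(
        detect_head(hidden).squeeze(-1), is_replaced)

    # 3. Replaced token denoising: recover the original tokens.
    denoising_loss = F.cross_entropy(
        denoise_head(hidden).flatten(0, 1), original_ids.flatten())

    return detection_loss + denoising_loss
```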
arXiv Detail & Related papers (2022-12-20T12:51:11Z)
- mGPT: Few-Shot Learners Go Multilingual [1.4354798873010843]
This paper introduces two autoregressive GPT-like models with 1.3 billion and 13 billion parameters trained on 60 languages.
We reproduce the GPT-3 architecture using GPT-2 sources and the sparse attention mechanism.
The resulting models show performance on par with the recently released XGLM models by Facebook.
arXiv Detail & Related papers (2022-04-15T13:02:33Z)
- Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation [81.7786241489002]
Massively multilingual models for neural machine translation (NMT) are theoretically attractive, but often underperform bilingual models and deliver poor zero-shot translations.
We argue that multilingual NMT requires stronger modeling capacity to support language pairs with varying typological characteristics.
We propose random online backtranslation to enforce the translation of unseen training language pairs.
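A rough sketch of the idea follows; it is pseudocode-style, and model.translate and model.train_on are hypothetical helpers, not the paper's code. For each training pair, the target sentence is back-translated on the fly into a randomly chosen language, creating synthetic pairs that cover otherwise unseen translation directions.

```python
# Schematic of random online backtranslation (illustrative sketch, not the
# paper's code). model.translate and model.train_on are hypothetical helpers.
import random

def robt_step(model, batch, languages):
    synthetic = []
    for example in batch:                      # example: {"src", "tgt", "tgt_lang"}
        pivot = random.choice(languages)
        # On-the-fly translation of the target into a randomly chosen language.
        pseudo_src = model.translate(example["tgt"], tgt_lang=pivot)
        synthetic.append({"src": pseudo_src,
                          "src_lang": pivot,
                          "tgt": example["tgt"],
                          "tgt_lang": example["tgt_lang"]})
    model.train_on(batch + synthetic)          # original + synthetic pairs
```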
arXiv Detail & Related papers (2020-04-24T17:21:32Z)