Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
- URL: http://arxiv.org/abs/2004.04092v4
- Date: Sun, 11 Oct 2020 23:33:10 GMT
- Title: Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
- Authors: Chunyuan Li, Xiang Gao, Yuan Li, Baolin Peng, Xiujun Li, Yizhe Zhang,
Jianfeng Gao
- Abstract summary: The Variational Autoencoder (VAE) can be both a powerful generative model and an effective representation learning framework for natural language.
In this paper, we propose the first large-scale language VAE model, Optimus.
- Score: 109.79957125584252
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When trained effectively, the Variational Autoencoder (VAE) can be both a
powerful generative model and an effective representation learning framework
for natural language. In this paper, we propose the first large-scale language
VAE model, Optimus. A universal latent embedding space for sentences is first
pre-trained on a large text corpus, and then fine-tuned for various language
generation and understanding tasks. Compared with GPT-2, Optimus enables guided
language generation from an abstract level using the latent vectors. Compared
with BERT, Optimus can generalize better on low-resource language understanding
tasks due to the smooth latent space structure. Extensive experimental results
on a wide range of language tasks demonstrate the effectiveness of Optimus. It
achieves a new state of the art on VAE language modeling benchmarks. We hope that
our first pre-trained large VAE language model and its results can help the
NLP community renew its interest in deep generative models in the era of
large-scale pre-training, and make these principled methods more practical.
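As a rough illustration of the pipeline the abstract describes (a BERT-style encoder mapping a sentence to a latent vector and a GPT-2-style decoder generating from it), here is a minimal sketch of a sentence VAE in PyTorch, assuming the Hugging Face `transformers` checkpoints `bert-base-cased` and `gpt2`. The class name `SentenceVAE`, the latent dimension, and the latent-injection scheme (adding a projected latent to the decoder's input embeddings) are simplifying assumptions for illustration, not the exact Optimus architecture.

```python
import torch
import torch.nn as nn
from transformers import BertModel, GPT2LMHeadModel


class SentenceVAE(nn.Module):
    """Hypothetical Optimus-style sentence VAE: BERT encoder, GPT-2 decoder."""

    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-cased")
        self.decoder = GPT2LMHeadModel.from_pretrained("gpt2")
        enc_dim = self.encoder.config.hidden_size     # 768 for bert-base
        dec_dim = self.decoder.config.n_embd          # 768 for gpt2
        self.to_mu = nn.Linear(enc_dim, latent_dim)       # posterior mean
        self.to_logvar = nn.Linear(enc_dim, latent_dim)   # posterior log-variance
        self.latent_to_emb = nn.Linear(latent_dim, dec_dim)

    def encode(self, input_ids, attention_mask):
        # Summarize the sentence with BERT's [CLS] hidden state.
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]
        return self.to_mu(h), self.to_logvar(h)

    @staticmethod
    def reparameterize(mu, logvar):
        # z = mu + sigma * eps: the standard VAE reparameterization trick.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

    def decode(self, z, dec_input_ids):
        # Simplified latent injection (an assumption, not the paper's exact
        # scheme): add the projected latent to every decoder token embedding.
        # Note: dec_input_ids are GPT-2 token IDs of the same sentence, since
        # the encoder and decoder use different tokenizers.
        emb = self.decoder.transformer.wte(dec_input_ids)
        emb = emb + self.latent_to_emb(z).unsqueeze(1)
        return self.decoder(inputs_embeds=emb, labels=dec_input_ids).loss

    def forward(self, input_ids, attention_mask, dec_input_ids, beta=1.0):
        mu, logvar = self.encode(input_ids, attention_mask)
        z = self.reparameterize(mu, logvar)
        reconstruction = self.decode(z, dec_input_ids)
        # KL(q(z|x) || N(0, I)); beta weights it against reconstruction.
        kl = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1))
        return reconstruction + beta * kl
```

The guided generation mentioned in the abstract can then be sketched by manipulating latents directly, for example interpolating z = (1 - t) * z_a + t * z_b between two encoded sentences and decoding the result.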
Related papers
- BatGPT: A Bidirectional Autoregessive Talker from Generative Pre-trained Transformer [77.28871523946418]
BatGPT is a large-scale language model designed and trained jointly by Wuhan University and Shanghai Jiao Tong University.
It is capable of generating highly natural and fluent text in response to various types of input, including text prompts, images, and audio.
arXiv Detail & Related papers (2023-07-01T15:10:01Z)
- Bag of Tricks for Effective Language Model Pretraining and Downstream Adaptation: A Case Study on GLUE [93.98660272309974]
This report briefly describes our submission Vega v1 on the General Language Understanding Evaluation leaderboard.
GLUE is a collection of nine natural language understanding tasks, including question answering, linguistic acceptability, sentiment analysis, text similarity, paraphrase detection, and natural language inference.
With our optimized pretraining and fine-tuning strategies, our 1.3-billion-parameter model sets a new state of the art on 4 of the 9 tasks, achieving the best average score of 91.3.
arXiv Detail & Related papers (2023-02-18T09:26:35Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings.
We demonstrate that this framework enables effective generalization across different environments.
For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
arXiv Detail & Related papers (2022-02-03T18:55:52Z)
- TunBERT: Pretrained Contextualized Text Representation for Tunisian Dialect [0.0]
We investigate the feasibility of training monolingual Transformer-based language models for underrepresented languages.
We show that using noisy web-crawled data instead of structured data is more suitable for such a non-standardized language.
Our best-performing TunBERT model matches or improves on the state of the art in all three downstream tasks.
arXiv Detail & Related papers (2021-11-25T15:49:50Z)
- OptAGAN: Entropy-based finetuning on text VAE-GAN [1.941730292017383]
Optimus, a variational autoencoder (VAE), was recently released.
It combines two pre-trained models, BERT and GPT-2.
It has been shown to produce novel, yet very human-looking text.
arXiv Detail & Related papers (2021-09-01T08:23:19Z)
- Differentiable Prompt Makes Pre-trained Language Models Better Few-shot Learners [23.150999852147283]
This study proposes a novel, pluggable, and efficient approach named DifferentiAble pRompT (DART).
It can convert small language models into better few-shot learners without any prompt engineering.
A comprehensive evaluation on standard NLP tasks demonstrates that the proposed approach achieves better few-shot performance.
arXiv Detail & Related papers (2021-08-30T12:29:25Z)
- Pre-training Universal Language Representation [46.51685959045527]
This work introduces universal language representation learning, i.e., embedding linguistic units of different levels, or text spans of quite diverse lengths, in a uniform vector space.
We empirically verify that a well-designed pre-training scheme can effectively yield universal language representations.
arXiv Detail & Related papers (2021-05-30T09:29:01Z)
- Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across diverse settings, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z)