Latent Diffusion for Language Generation
- URL: http://arxiv.org/abs/2212.09462v2
- Date: Tue, 7 Nov 2023 15:35:45 GMT
- Title: Latent Diffusion for Language Generation
- Authors: Justin Lovelace, Varsha Kishore, Chao Wan, Eliot Shekhtman, and Kilian Q. Weinberger
- Abstract summary: Recent attempts to adapt diffusion to language have presented diffusion as an alternative to existing language models.
We demonstrate that encoder-decoder language models can be utilized to efficiently learn high-quality language autoencoders.
We validate the effectiveness of our approach for unconditional, class-conditional, and sequence-to-sequence language generation.
- Score: 26.620353485679892
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have achieved great success in modeling continuous data
modalities such as images, audio, and video, but have seen limited use in
discrete domains such as language. Recent attempts to adapt diffusion to
language have presented diffusion as an alternative to existing pretrained
language models. We view diffusion and existing language models as
complementary. We demonstrate that encoder-decoder language models can be
utilized to efficiently learn high-quality language autoencoders. We then
demonstrate that continuous diffusion models can be learned in the latent space
of the language autoencoder, enabling us to sample continuous latent
representations that can be decoded into natural language with the pretrained
decoder. We validate the effectiveness of our approach for unconditional,
class-conditional, and sequence-to-sequence language generation. We demonstrate
across multiple diverse data sets that our latent language diffusion models are
significantly more effective than previous diffusion language models.
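To make the two-stage recipe in the abstract concrete, here is a minimal, self-contained PyTorch sketch. A toy GRU encoder and linear decoder stand in for the pretrained encoder-decoder language model (stage-1 autoencoder training is omitted), and the denoiser is a small MLP with a standard DDPM-style noise schedule and epsilon-prediction loss; these choices are illustrative assumptions, not the paper's actual architecture or hyperparameters.

```python
# Minimal sketch of latent diffusion for language generation (illustrative only).
# Stage 1: an encoder-decoder autoencoder maps token sequences to continuous
# latents (here a toy GRU/linear pair stands in for a pretrained LM, and its
# training is omitted). Stage 2: a continuous diffusion model is trained on
# those latents. Sampling denoises Gaussian noise into a latent and decodes it.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, SEQ_LEN, LATENT = 1000, 16, 64      # toy sizes, not the paper's
T = 100                                    # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)      # standard DDPM-style schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

class ToyAutoencoder(nn.Module):
    """Stands in for the frozen, pretrained encoder-decoder language model."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, LATENT)
        self.enc = nn.GRU(LATENT, LATENT, batch_first=True)
        self.dec = nn.Linear(LATENT, SEQ_LEN * VOCAB)

    def encode(self, tokens):              # (B, SEQ_LEN) -> (B, LATENT)
        _, h = self.enc(self.embed(tokens))
        return h.squeeze(0)

    def decode_logits(self, z):            # (B, LATENT) -> (B, SEQ_LEN, VOCAB)
        return self.dec(z).view(-1, SEQ_LEN, VOCAB)

class Denoiser(nn.Module):
    """Predicts the noise added to a latent at step t (epsilon-prediction)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT + 1, 256), nn.SiLU(),
                                 nn.Linear(256, LATENT))

    def forward(self, z_t, t):
        t_feat = (t.float() / T).unsqueeze(-1)   # crude timestep conditioning
        return self.net(torch.cat([z_t, t_feat], dim=-1))

ae, eps_model = ToyAutoencoder(), Denoiser()
opt = torch.optim.Adam(eps_model.parameters(), lr=1e-3)

# One stage-2 training step on the latents of a (random) token batch.
tokens = torch.randint(0, VOCAB, (8, SEQ_LEN))
with torch.no_grad():
    z0 = ae.encode(tokens)                 # autoencoder stays frozen
t = torch.randint(0, T, (z0.size(0),))
noise = torch.randn_like(z0)
a = alphas_bar[t].unsqueeze(-1)
z_t = a.sqrt() * z0 + (1 - a).sqrt() * noise     # forward (noising) process
loss = F.mse_loss(eps_model(z_t, t), noise)
opt.zero_grad()
loss.backward()
opt.step()

# Sampling: ancestral DDPM updates from pure noise, then decode the latent.
@torch.no_grad()
def sample(n=4):
    z = torch.randn(n, LATENT)
    for step in reversed(range(T)):
        t = torch.full((n,), step)
        eps = eps_model(z, t)
        a_bar, beta = alphas_bar[step], betas[step]
        z = (z - beta / (1 - a_bar).sqrt() * eps) / (1 - beta).sqrt()
        if step > 0:
            z = z + beta.sqrt() * torch.randn_like(z)
    return ae.decode_logits(z).argmax(-1)  # (n, SEQ_LEN) token ids

print(sample().shape)                      # torch.Size([4, 16])
```

The point the sketch illustrates is that the diffusion model never touches discrete tokens: it is trained and sampled entirely in the continuous latent space, and the frozen decoder maps sampled latents back to token sequences.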
Related papers
- Simple and Effective Masked Diffusion Language Models [48.68198363304619]
We show that simple masked discrete diffusion is more performant than previously thought.
We apply an effective training recipe that improves the performance of masked diffusion models.
Our objective has a simple form -- it is a mixture of classical masked language modeling losses (see the sketch after this list).
arXiv Detail & Related papers (2024-06-11T17:51:40Z)
- Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning [56.03057119008865]
We show that scaling diffusion language models can effectively make them strong language learners.
We build competent diffusion language models at scale by first acquiring knowledge from massive data.
Experiments show that scaling diffusion language models consistently improves performance across downstream language tasks.
arXiv Detail & Related papers (2023-08-23T16:01:12Z)
- A Cheaper and Better Diffusion Language Model with Soft-Masked Noise [62.719656543880596]
Masked-Diffuse LM is a novel diffusion model for language modeling, inspired by linguistic features in languages.
Specifically, we design a linguistic-informed forward process which adds corruptions to the text through strategically soft-masking to better noise the textual data.
We demonstrate that our Masked-Diffuse LM can achieve better generation quality than the state-of-the-art diffusion models with better efficiency.
arXiv Detail & Related papers (2023-04-10T17:58:42Z)
- DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models [81.84866217721361]
DiffusionBERT is a new generative masked language model based on discrete diffusion models.
We propose a new noise schedule for the forward diffusion process that controls the degree of noise added at each step.
Experiments on unconditional text generation demonstrate that DiffusionBERT achieves significant improvement over existing diffusion models for text.
arXiv Detail & Related papers (2022-11-28T03:25:49Z)
- Self-conditioned Embedding Diffusion for Text Generation [28.342735885752493]
Self-conditioned Embedding Diffusion is a continuous diffusion mechanism that operates on token embeddings.
We show that our text diffusion models generate samples comparable with those produced by standard autoregressive language models.
arXiv Detail & Related papers (2022-11-08T13:30:27Z)
- Language Models are General-Purpose Interfaces [109.45478241369655]
We propose to use language models as a general-purpose interface to various foundation models.
A collection of pretrained encoders perceives diverse modalities (such as vision and language).
We propose a semi-causal language modeling objective to jointly pretrain the interface and the modular encoders.
arXiv Detail & Related papers (2022-06-13T17:34:22Z)
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
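As referenced in the first related-paper entry above, here is a minimal sketch of one way to read a "mixture of classical masked language modeling losses": sample a masking rate per sequence, mask that fraction of tokens, and average a rate-weighted cross-entropy over the masked positions. The toy predictor, the uniform sampling of the rate, and the 1/t weight are illustrative assumptions, not the cited paper's exact objective.

```python
# Hedged sketch of a "mixture of masked language modeling losses": sample a
# masking rate t per sequence, mask that fraction of tokens, and compute a
# rate-weighted cross-entropy on the original tokens. The toy predictor, the
# uniform sampling of t, and the 1/t weight are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MASK_ID = 1000, 0                       # id 0 reserved for the mask token
predictor = nn.Sequential(nn.Embedding(VOCAB, 64), nn.Linear(64, VOCAB))

def masked_diffusion_loss(tokens):             # tokens: (B, L) int64, no MASK_ID
    B = tokens.size(0)
    t = torch.rand(B, 1)                       # masking rate per example
    mask = torch.rand(tokens.shape) < t        # positions replaced by the mask token
    x_t = torch.where(mask, torch.full_like(tokens, MASK_ID), tokens)
    logits = predictor(x_t)                    # (B, L, VOCAB)
    ce = F.cross_entropy(logits.transpose(1, 2), tokens, reduction="none")
    per_seq = (ce * mask).sum(1) / mask.sum(1).clamp_min(1)
    w = 1.0 / t.squeeze(1).clamp_min(1e-3)     # assumed rate-dependent weight
    return (w * per_seq).mean()

tokens = torch.randint(1, VOCAB, (4, 16))
print(masked_diffusion_loss(tokens))           # scalar training loss
```

Averaging this loss over many sampled masking rates is what makes the objective a "mixture" of masked language modeling losses at different corruption levels.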