Related papers: Continuous Diffusion Model for Language Modeling

Continuous Diffusion Model for Language Modeling

URL: http://arxiv.org/abs/2502.11564v1
Date: Mon, 17 Feb 2025 08:54:29 GMT
Title: Continuous Diffusion Model for Language Modeling
Authors: Jaehyeong Jo, Sung Ju Hwang
Abstract summary: Existing continuous diffusion models for discrete data have limited performance compared to discrete approaches. We propose a continuous diffusion model for language modeling that incorporates the geometry of the underlying categorical distribution.
Score: 57.396578974401734
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models have emerged as a promising alternative to autoregressive models in modeling discrete categorical data. Yet diffusion models that work directly on the discrete data space do not fully exploit the power of iterative refinement, as the signals are lost during the transition between discrete states. Existing continuous diffusion models for discrete data have limited performance compared to discrete approaches, and the unclear link between them restricts the development of diffusion models for discrete data. In this work, we propose a continuous diffusion model for language modeling that incorporates the geometry of the underlying categorical distribution. We establish a connection between discrete diffusion and continuous flow on the statistical manifold, and, building on this analogy, we introduce a simple design for the diffusion process that generalizes previous discrete diffusion models. We further propose a simulation-free training framework based on radial symmetry and a simple technique to address the high dimensionality of the manifold. Comprehensive experiments on language modeling benchmarks and other modalities show that our method outperforms existing discrete diffusion models and approaches the performance of autoregressive models. Code is available at https://github.com/harryjo97/RDLM.
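The abstract's "geometry of the underlying categorical distribution" can be made concrete with a standard fact about the statistical manifold that is not specific to this paper: the square-root map p ↦ √p sends the probability simplex onto the positive orthant of the unit sphere, where the Fisher-Rao metric equals four times the round metric, so Fisher-Rao distances are twice great-circle distances. The sketch below assumes NumPy, uses hypothetical helper names, and is not the authors' RDLM implementation; it only illustrates this embedding and a geodesic between a one-hot token and the uniform distribution.

```python
import numpy as np

def to_sphere(p):
    """Map a categorical distribution on the simplex to the positive orthant of the unit sphere."""
    p = np.asarray(p, dtype=float)
    return np.sqrt(p / p.sum())

def to_simplex(u):
    """Inverse map: square the sphere coordinates (and renormalize) to recover probabilities."""
    u = np.asarray(u, dtype=float)
    return u**2 / np.sum(u**2)

def fisher_rao_distance(p, q):
    """Fisher-Rao distance: twice the great-circle angle between sqrt(p) and sqrt(q)."""
    bc = np.clip(np.dot(to_sphere(p), to_sphere(q)), -1.0, 1.0)  # Bhattacharyya coefficient
    return 2.0 * np.arccos(bc)

def geodesic(p, q, t):
    """Distribution at fraction t along the sphere geodesic (slerp) from p to q."""
    u, v = to_sphere(p), to_sphere(q)
    theta = np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))
    if theta < 1e-12:  # p and q coincide
        return to_simplex(u)
    w = (np.sin((1.0 - t) * theta) * u + np.sin(t * theta) * v) / np.sin(theta)
    return to_simplex(w)

# Hypothetical example: token 1 of a 4-word vocabulary moving toward the uniform distribution.
one_hot = [0.0, 1.0, 0.0, 0.0]
uniform = [0.25] * 4
print(fisher_rao_distance(one_hot, uniform))  # 2 * arccos(0.5) = 2*pi/3
print(geodesic(one_hot, uniform, 0.5))
```

Interpolating along the great circle, rather than linearly on the simplex, keeps every intermediate point on the manifold and travels at constant Fisher-Rao speed; the abstract does not state whether RDLM uses exactly this path.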

Related papers

Diffusion models for multivariate subsurface generation and efficient probabilistic inversion [0.0]
Diffusion models offer stable training and state-of-the-art performance for deep generative modeling tasks. We introduce a likelihood approximation accounting for the noise contamination that is inherent in diffusion modeling. Our tests show significantly improved statistical robustness and enhanced sampling of the posterior probability density function.
arXiv Detail & Related papers (2025-07-21T17:10:16Z)
Graph Representation Learning with Diffusion Generative Models [0.0]
We train a discrete diffusion model within an autoencoder framework to learn meaningful embeddings for graph data. Our approach demonstrates the potential of discrete diffusion models to be used for graph representation learning.
arXiv Detail & Related papers (2025-01-22T07:12:10Z)
Energy-Based Diffusion Language Models for Text Generation [126.23425882687195]
Energy-based Diffusion Language Model (EDLM) is an energy-based model operating at the full sequence level for each diffusion step. Our framework offers a 1.3× sampling speedup over existing diffusion models.
arXiv Detail & Related papers (2024-10-28T17:25:56Z)
Distillation of Discrete Diffusion through Dimensional Correlations [21.078500510691747]
Mixture models are capable of treating dimensional correlations while remaining scalable. The proposed loss functions enable the mixture models to distill many-step conventional models into just a few steps by learning the dimensional correlations. Our experimental results show the effectiveness of the proposed method in distilling pretrained discrete diffusion models across image and language domains.
arXiv Detail & Related papers (2024-10-11T10:53:03Z)
Discrete Copula Diffusion [44.96934660818884]
We identify a fundamental limitation that prevents discrete diffusion models from achieving strong performance with fewer steps. We introduce a general approach to supplement the missing dependency information by incorporating another deep generative model, termed the copula model. Our method does not require fine-tuning either the diffusion model or the copula model, yet it enables high-quality sample generation with significantly fewer denoising steps.
arXiv Detail & Related papers (2024-10-02T18:51:38Z)
Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset. We develop constrained diffusion models by imposing diffusion constraints based on desired distributions. We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off among objective and constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z)
Convergence Analysis of Discrete Diffusion Model: Exact Implementation through Uniformization [17.535229185525353]
We introduce an algorithm leveraging the uniformization of continuous-time Markov chains, implementing transitions at random time points. Our results align with state-of-the-art achievements for diffusion models in $\mathbb{R}^d$ and further underscore the advantages of discrete diffusion models in comparison to the $\mathbb{R}^d$ setting.
arXiv Detail & Related papers (2024-02-12T22:26:52Z)
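Uniformization, named in the summary above, is a classical way to simulate a continuous-time Markov chain without time-discretization error. A minimal sketch, assuming a time-homogeneous generator Q (the paper treats the time-dependent rates of discrete diffusion, which this sketch does not cover): pick a rate λ at least as large as every exit rate, draw the number of candidate jumps from Poisson(λT), place them at uniformly random times, and at each one move according to P = I + Q/λ. The function name `uniformized_path` is hypothetical and not taken from the paper.

```python
import numpy as np

def uniformized_path(Q, x0, T, rng):
    """Simulate a time-homogeneous CTMC with generator Q from state x0 over [0, T]
    by uniformization. Returns the final state and the sorted candidate jump times."""
    Q = np.asarray(Q, dtype=float)
    n_states = Q.shape[0]
    lam = float(np.max(-np.diag(Q)))  # uniformization rate: >= every exit rate
    if lam <= 0.0:
        return x0, np.empty(0)  # zero generator: the chain never moves
    # Transition matrix of the uniformized discrete-time chain (rows sum to 1).
    P = np.eye(n_states) + Q / lam
    P = np.clip(P, 0.0, None)  # guard against tiny negative round-off
    P /= P.sum(axis=1, keepdims=True)
    n_jumps = rng.poisson(lam * T)  # number of Poisson clock rings in [0, T]
    times = np.sort(rng.uniform(0.0, T, size=n_jumps))  # ring times given the count
    x = x0
    for _ in range(n_jumps):
        x = int(rng.choice(n_states, p=P[x]))  # may be a self-transition
    return x, times

# Two-state example: rate 1.0 for 0 -> 1 and rate 0.5 for 1 -> 0.
rng = np.random.default_rng(0)
Q = [[-1.0, 1.0], [0.5, -0.5]]
final_state, jump_times = uniformized_path(Q, 0, 1.0, rng)
```

Because the candidate jump times come from a Poisson clock rather than a fixed grid, the sampled path has exactly the law of the original chain; the price is that about λT candidate steps are drawn, some of which leave the state unchanged.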
Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining. We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
Infinite-Dimensional Diffusion Models [4.342241136871849]
We formulate diffusion-based generative models in infinite dimensions and apply them to the generative modeling of functions. We show that our formulations are well posed in the infinite-dimensional setting and provide dimension-independent distance bounds from the sample to the target measure. We also develop guidelines for the design of infinite-dimensional diffusion models.
arXiv Detail & Related papers (2023-02-20T18:00:38Z)
Unifying Diffusion Models' Latent Space, with Applications to CycleDiffusion and Guidance [95.12230117950232]
We show that a common latent space emerges from two diffusion models trained independently on related domains. Applying CycleDiffusion to text-to-image diffusion models, we show that large-scale text-to-image diffusion models can be used as zero-shot image-to-image editors.
arXiv Detail & Related papers (2022-10-11T15:53:52Z)
Diffusion Models in Vision: A Survey [73.10116197883303]
A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage. Diffusion models are widely appreciated for the quality and diversity of the generated samples, despite their known computational burdens.
arXiv Detail & Related papers (2022-09-10T22:00:30Z)
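For the forward diffusion stage described in the survey summary above, the widely used Gaussian (DDPM-style) formulation has a closed form: with α_t = 1 - β_t and ᾱ_t the cumulative product of the α_s up to step t, one can sample x_t ~ N(√ᾱ_t x_0, (1 - ᾱ_t) I) directly instead of simulating every intermediate step. A minimal NumPy sketch, assuming a linear β schedule; the schedule values and the function names `linear_alpha_bar` and `q_sample` are illustrative assumptions, not taken from the survey.

```python
import numpy as np

def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar[t]) * x0, (1 - alpha_bar[t]) * I).

    Returns the noisy sample and the Gaussian noise used, which a
    noise-prediction network in the reverse stage would be trained to recover.
    """
    x0 = np.asarray(x0, dtype=float)
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise

# Usage: noise a toy "image" of ones at an intermediate and a final step.
rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()
x0 = np.ones(8)
x_mid, _ = q_sample(x0, 499, alpha_bar, rng)
x_end, _ = q_sample(x0, 999, alpha_bar, rng)
```

Near the last step the signal coefficient √ᾱ_t is close to zero, so x_T is approximately standard normal; the reverse stage learns to undo these steps starting from that noise.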

This list is automatically generated from the titles and abstracts of the papers on this site.