Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation
- URL: http://arxiv.org/abs/2505.18853v1
- Date: Sat, 24 May 2025 20:02:14 GMT
- Title: Smoothie: Smoothing Diffusion on Token Embeddings for Text Generation
- Authors: Alexander Shabalin, Viacheslav Meshchaninov, Dmitry Vetrov
- Abstract summary: Smoothing Diffusion on Token Embeddings (Smoothie) is a novel diffusion method that combines the strengths of both approaches by progressively smoothing token embeddings based on semantic similarity. Experimental results on several sequence-to-sequence generation tasks demonstrate that Smoothie outperforms existing diffusion-based models in generation quality. Our proposed diffusion space yields better performance than both the standard embedding space and the categorical simplex.
- Score: 45.560812800359685
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have achieved state-of-the-art performance in generating images, audio, and video, but their adaptation to text remains challenging due to its discrete nature. Prior approaches either apply Gaussian diffusion in continuous latent spaces, which inherits semantic structure but struggles with token decoding, or operate in categorical simplex space, which respects discreteness but disregards semantic relations between tokens. In this paper, we propose Smoothing Diffusion on Token Embeddings (Smoothie), a novel diffusion method that combines the strengths of both approaches by progressively smoothing token embeddings based on semantic similarity. This technique enables gradual information removal while maintaining a natural decoding process. Experimental results on several sequence-to-sequence generation tasks demonstrate that Smoothie outperforms existing diffusion-based models in generation quality. Furthermore, ablation studies show that our proposed diffusion space yields better performance than both the standard embedding space and the categorical simplex. Our code is available at https://github.com/ashaba1in/smoothie.
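The idea of "progressively smoothing token embeddings based on semantic similarity" can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that); it assumes, purely for illustration, that a token is represented by a softmax over negative squared Euclidean distances to every vocabulary embedding, with a bandwidth `sigma` playing the role of the noise level. The function names, the toy vocabulary, and the omission of any added Gaussian noise are all assumptions.

```python
import numpy as np

def smoothed_token_distribution(embeddings, token_id, sigma):
    """Softmax over negative squared distances from one token to all vocabulary entries.

    Hypothetical helper: a larger sigma spreads probability mass to more distant tokens.
    """
    diffs = embeddings - embeddings[token_id]
    sq_dist = np.sum(diffs ** 2, axis=1)
    logits = -sq_dist / (2.0 * sigma ** 2)
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability entries."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

# Toy vocabulary (assumed): tokens 0 and 1 are close neighbours, as are tokens 2 and 3.
embeddings = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 3.0]])

sigmas = (0.05, 0.5, 5.0)
distributions = [smoothed_token_distribution(embeddings, 0, s) for s in sigmas]
entropies = [entropy(p) for p in distributions]

for s, p, h in zip(sigmas, distributions, entropies):
    # The arg-max still points at the original token, so decoding stays trivial.
    print(f"sigma={s}: probs={np.round(p, 3)}, decoded={int(p.argmax())}, entropy={h:.3f}")
```

In this sketch, information about the token's identity fades gradually as `sigma` grows (the entropy rises), nearby tokens receive mass before distant ones, and the arg-max still recovers the original token, which mirrors the "gradual information removal while maintaining a natural decoding process" described in the abstract.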
Related papers
- Continuously Augmented Discrete Diffusion model for Categorical Generative Modeling [87.34677262370924]
Standard discrete diffusion models treat all unobserved states identically by mapping them to an absorbing [MASK] token. This creates an 'information void' where semantic information that could be inferred from unmasked tokens is lost between denoising steps. We introduce Continuously Augmented Discrete Diffusion, a framework that augments the discrete state space with a paired diffusion in a continuous latent space.
arXiv Detail & Related papers (2025-10-01T18:00:56Z)
- Unifying Continuous and Discrete Text Diffusion with Non-simultaneous Diffusion Processes [9.29387855908007]
NeoDiff is a novel diffusion model that integrates the strengths of both discrete and continuous approaches. Our approach unifies the theories of discrete and continuous diffusion models, offering a more principled and effective framework for text generation.
arXiv Detail & Related papers (2025-05-28T09:28:52Z)
- Generalized Interpolating Discrete Diffusion [65.74168524007484]
Masked diffusion is a popular choice due to its simplicity and effectiveness. We derive the theoretical backbone of a family of general interpolating discrete diffusion processes. Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise.
arXiv Detail & Related papers (2025-03-06T14:30:55Z)
- G2D2: Gradient-Guided Discrete Diffusion for Inverse Problem Solving [83.56510119503267]
This paper presents a novel method for addressing linear inverse problems by leveraging generative models based on discrete diffusion as priors. We employ a star-shaped noise process to mitigate the drawbacks of traditional discrete diffusion models with absorbing states.
arXiv Detail & Related papers (2024-10-09T06:18:25Z)
- Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences.
We aim to optimize downstream reward functions while preserving the naturalness of these design spaces.
Our algorithm integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future.
arXiv Detail & Related papers (2024-08-15T16:47:59Z)
- Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models [82.8261101680427]
Smooth latent spaces ensure that a perturbation on an input latent corresponds to a steady change in the output image.
This property proves beneficial in downstream tasks, including image interpolation, inversion, and editing.
We propose Smooth Diffusion, a new category of diffusion models that can be simultaneously high-performing and smooth.
arXiv Detail & Related papers (2023-12-07T16:26:23Z)
- TESS: Text-to-Text Self-Conditioned Simplex Diffusion [56.881170312435444]
Text-to-text Self-conditioned Simplex Diffusion employs a new form of self-conditioning, and applies the diffusion process on the logit simplex space rather than the learned embedding space.
We demonstrate that TESS outperforms state-of-the-art non-autoregressive models, requires fewer diffusion steps with minimal drop in performance, and is competitive with pretrained autoregressive sequence-to-sequence models.
arXiv Detail & Related papers (2023-05-15T06:33:45Z)
- A Cheaper and Better Diffusion Language Model with Soft-Masked Noise [62.719656543880596]
Masked-Diffuse LM is a novel diffusion model for language modeling, inspired by linguistic features in languages.
Specifically, we design a linguistically informed forward process that corrupts the text through strategic soft-masking to better noise the textual data.
We demonstrate that our Masked-Diffuse LM can achieve better generation quality than the state-of-the-art diffusion models with better efficiency.
arXiv Detail & Related papers (2023-04-10T17:58:42Z)
- SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers [50.90457644954857]
In this work, we apply diffusion models to approach sequence-to-sequence text generation.
We propose SeqDiffuSeq, a text diffusion model for sequence-to-sequence generation.
Experimental results illustrate the good performance on sequence-to-sequence generation in terms of text quality and inference time.
arXiv Detail & Related papers (2022-12-20T15:16:24Z)
- Empowering Diffusion Models on the Embedding Space for Text Generation [38.664533078347304]
We study the optimization challenges encountered with both the embedding space and the denoising model.
The data distribution is learnable for embeddings, which may lead to the collapse of the embedding space and unstable training.
Based on the above analysis, we propose Difformer, an embedding diffusion model based on Transformer.
arXiv Detail & Related papers (2022-12-19T12:44:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.