IDLM: Inverse-distilled Diffusion Language Models
- URL: http://arxiv.org/abs/2602.19066v1
- Date: Sun, 22 Feb 2026 06:47:04 GMT
- Title: IDLM: Inverse-distilled Diffusion Language Models
- Authors: David Li, Nikita Gushchin, Dmitry Abulkhanov, Eric Moulines, Ivan Oseledets, Maxim Panov, Alexander Korotin
- Abstract summary: We extend Inverse Distillation, a technique originally developed to accelerate continuous diffusion models, to the discrete setting. From a theoretical perspective, the inverse distillation objective lacks uniqueness guarantees, which may lead to suboptimal solutions. We show that our method, Inverse-distilled Diffusion Language Models (IDLM), reduces the number of inference steps by 4x-64x.
- Score: 70.5793829229702
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion Language Models (DLMs) have recently achieved strong results in text generation. However, their multi-step sampling leads to slow inference, limiting practical use. To address this, we extend Inverse Distillation, a technique originally developed to accelerate continuous diffusion models, to the discrete setting. Nonetheless, this extension introduces both theoretical and practical challenges. From a theoretical perspective, the inverse distillation objective lacks uniqueness guarantees, which may lead to suboptimal solutions. From a practical standpoint, backpropagation in the discrete space is non-trivial and often unstable. To overcome these challenges, we first provide a theoretical result demonstrating that our inverse formulation admits a unique solution, thereby ensuring valid optimization. We then introduce gradient-stable relaxations to support effective training. As a result, experiments on multiple DLMs show that our method, Inverse-distilled Diffusion Language Models (IDLM), reduces the number of inference steps by 4x-64x, while preserving the teacher model's entropy and generative perplexity.
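The abstract names two ingredients, uniqueness of the inverse objective and gradient-stable relaxations for discrete backpropagation, but does not spell out the relaxation. A common choice for back-propagating through discrete token sampling is the Gumbel-softmax relaxation with a straight-through estimator; the sketch below illustrates that generic trick under assumed names (`student_logits` and `teacher_score` are hypothetical), not IDLM's actual implementation.

```python
import torch
import torch.nn.functional as F

def straight_through_tokens(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Sample one-hot tokens while keeping the sampling step differentiable.

    Forward pass: hard one-hot samples. Backward pass: gradients flow
    through the soft Gumbel-softmax relaxation instead.
    """
    # Differentiable soft sample over the vocabulary.
    y_soft = F.gumbel_softmax(logits, tau=tau, hard=False)
    # Hard one-hot version of the same sample.
    index = y_soft.argmax(dim=-1, keepdim=True)
    y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
    # Straight-through estimator: y_hard in the forward pass,
    # d(y_soft)/d(logits) in the backward pass.
    return (y_hard - y_soft).detach() + y_soft

# Hypothetical usage inside a distillation step: relax the student's token
# samples so a teacher-based loss stays differentiable w.r.t. the student.
# tokens = straight_through_tokens(student_logits)  # (B, L, V) one-hot
# loss = teacher_score(tokens)                      # differentiable
```

The temperature `tau` trades bias against variance: a high `tau` gives smooth but biased gradients, while a low `tau` gives near-discrete samples with noisier gradients.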
Related papers
- Constrained Diffusion with Trust Sampling [11.354281911272864]
We rethink training-free loss-guided diffusion from an optimization perspective.
Trust sampling effectively balances following the unconditional diffusion model and adhering to the loss guidance.
We demonstrate the efficacy of our method through extensive experiments on complex tasks in the drastically different domains of image and 3D motion generation; a generic loss-guided step is sketched below.
arXiv Detail & Related papers (2024-11-17T01:34:57Z)
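The Trust Sampling entry above describes the balance between the unconditional model and the loss guidance only in words. Below is a minimal sketch of a generic loss-guided reverse step with a crude trust-region clip; `denoiser`, `guidance_loss`, and `trust_radius` are illustrative assumptions, not the paper's exact algorithm.

```python
import torch

def loss_guided_step(x_t, t, denoiser, guidance_loss,
                     step_scale=1.0, trust_radius=0.1):
    """One generic loss-guided reverse-diffusion step (illustrative sketch).

    `denoiser(x_t, t)` returns the unconditional model's estimate of x_{t-1};
    `guidance_loss(x)` is a differentiable loss the trajectory should reduce.
    """
    x_t = x_t.detach().requires_grad_(True)
    # Follow the unconditional diffusion model.
    x_prev = denoiser(x_t, t)
    # Gradient of the guidance loss through the denoising step.
    grad = torch.autograd.grad(guidance_loss(x_prev), x_t)[0]
    update = -step_scale * grad
    # Crude trust region: cap the guidance update relative to the sample
    # norm, so guidance never overrides the unconditional model entirely.
    max_norm = trust_radius * x_prev.norm()
    if update.norm() > max_norm:
        update = update * (max_norm / update.norm())
    return (x_prev + update).detach()
```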
- G2D2: Gradient-Guided Discrete Diffusion for Inverse Problem Solving [83.56510119503267]
This paper presents a novel method for addressing linear inverse problems by leveraging generative models based on discrete diffusion as priors. We employ a star-shaped noise process to mitigate the drawbacks of traditional discrete diffusion models with absorbing states.
arXiv Detail & Related papers (2024-10-09T06:18:25Z)
- Unleashing the Power of One-Step Diffusion based Image Super-Resolution via a Large-Scale Diffusion Discriminator [81.81748032199813]
Diffusion models have demonstrated excellent performance for real-world image super-resolution (Real-ISR). We propose a new One-Step Diffusion model with a larger-scale Discriminator for SR. Our discriminator is able to distill noisy features from any time step of diffusion models in the latent space.
arXiv Detail & Related papers (2024-10-05T16:41:36Z)
- Repulsive Latent Score Distillation for Solving Inverse Problems [31.255943277671893]
Score Distillation Sampling (SDS) has been pivotal for leveraging pre-trained diffusion models in downstream tasks such as inverse problems.
We introduce a novel variational framework for posterior sampling to address mode collapse and latent space inversion.
We extend this framework with an augmented variational distribution that disentangles the latent and data spaces; the basic SDS update this line of work builds on is sketched below.
arXiv Detail & Related papers (2024-06-24T14:43:02Z)
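For context, the Score Distillation Sampling update that the entry above builds on is compact enough to sketch: noise the current sample, ask the pretrained model to predict the injected noise, and use the residual as a gradient while skipping the model's Jacobian. The sketch below uses a uniform loss weighting and illustrative names; the paper's repulsive and variational extensions are not shown.

```python
import torch

def sds_step(x, eps_model, alphas_cumprod, lr=0.01):
    """One standard Score Distillation Sampling (SDS) update on `x`.

    `eps_model(x_t, t)` is a pretrained noise-prediction diffusion model;
    `alphas_cumprod` is its cumulative noise schedule (a 1-D tensor).
    """
    B = x.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (B,), device=x.device)
    a = alphas_cumprod[t].view(B, *([1] * (x.dim() - 1)))
    eps = torch.randn_like(x)
    # Forward-noise the current sample to timestep t.
    x_t = a.sqrt() * x + (1.0 - a).sqrt() * eps
    with torch.no_grad():
        eps_hat = eps_model(x_t, t)
    # SDS gradient: the noise residual, with the model Jacobian skipped
    # (uniform weighting w(t) = 1 assumed here for simplicity).
    grad = eps_hat - eps
    return x - lr * grad
```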
- Variational Distillation of Diffusion Policies into Mixture of Experts [26.315682445979302]
This work introduces Variational Diffusion Distillation (VDD), a novel method that distills denoising diffusion policies into Mixtures of Experts (MoE).
Diffusion Models are the current state-of-the-art in generative modeling due to their exceptional ability to accurately learn and represent complex, multi-modal distributions.
VDD is the first method that distills pre-trained diffusion models into MoE models, and hence, combines the expressiveness of Diffusion Models with the benefits of Mixture Models.
arXiv Detail & Related papers (2024-06-18T12:15:05Z)
- Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
For an efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in the diffusion model's latent space.
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
arXiv Detail & Related papers (2024-05-09T17:59:40Z)
- SinSR: Diffusion-Based Image Super-Resolution in a Single Step [119.18813219518042]
Super-resolution (SR) methods based on diffusion models exhibit promising results, but their practical application is hindered by the substantial number of required inference steps.
We propose a simple yet effective method for achieving single-step SR generation, named SinSR.
arXiv Detail & Related papers (2023-11-23T16:21:29Z)
- Lipschitz Singularities in Diffusion Models [64.28196620345808]
Diffusion models often exhibit an unbounded Lipschitz constant of the network with respect to the time variable near the zero point. We propose a novel approach, dubbed E-TSDM, which alleviates the Lipschitz singularities of the diffusion model near the zero point. Our work may advance the understanding of the general diffusion process, and also provides insights for the design of diffusion models.
arXiv Detail & Related papers (2023-06-20T03:05:28Z)
- Improving Diffusion Models for Inverse Problems using Manifold Constraints [55.91148172752894]
We show that current solvers throw the sample path off the data manifold, and hence the error accumulates.
To address this, we propose an additional correction term inspired by the manifold constraint.
We show that our method is superior to previous methods both theoretically and empirically; a sketch of the manifold-constraint correction follows below.
arXiv Detail & Related papers (2022-06-02T09:06:10Z)
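The manifold-constraint correction in the entry above is straightforward to sketch: after the ordinary reverse step, take a gradient step on the measurement residual evaluated through the denoised estimate, which pulls the sample back toward the data manifold. `reverse_step`, `x0_predictor`, and `A` below are assumed interfaces, not the paper's code.

```python
import torch

def manifold_corrected_step(x_t, t, reverse_step, x0_predictor, A, y, alpha=1.0):
    """One reverse diffusion step with a manifold-constraint correction
    (a minimal sketch in the spirit of the paper; names are illustrative).

    `reverse_step(x_t, t)` is the ordinary reverse-diffusion update,
    `x0_predictor(x_t, t)` maps the noisy sample to a clean estimate,
    `A` is the linear measurement operator and `y` the observation.
    """
    x_t = x_t.detach().requires_grad_(True)
    # Data-consistency residual measured through the denoised estimate,
    # so the gradient points back toward the data manifold.
    x0_hat = x0_predictor(x_t, t)
    residual = (y - A(x0_hat)).pow(2).sum()
    grad = torch.autograd.grad(residual, x_t)[0]
    # Standard reverse step, then the correction term.
    x_prev = reverse_step(x_t.detach(), t)
    return x_prev - alpha * grad
```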