Related papers: T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization

URL: http://arxiv.org/abs/2602.12262v2
Date: Fri, 13 Feb 2026 04:16:09 GMT
Title: T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization
Authors: Tunyu Zhang, Xinxi Zhang, Ligong Han, Haizhou Shi, Xiaoxiao He, Zhuowei Li, Hao Wang, Kai Xu, Akash Srivastava, Hao Wang, Vladimir Pavlovic, Dimitris N. Metaxas
Abstract summary: Diffusion large language models (DLLMs) have the potential to enable fast text generation by decoding multiple tokens in parallel. We propose a trajectory self-distillation framework that improves few-step decoding by distilling the model's own generative trajectories. Our approach consistently outperforms strong few-step baselines and standard training under tight step budgets.
Score: 45.026481622387244
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Diffusion large language models (DLLMs) have the potential to enable fast text generation by decoding multiple tokens in parallel. However, in practice, their inference efficiency is constrained by the need for many refinement steps, while aggressively reducing the number of steps leads to a substantial degradation in generation quality. To alleviate this, we propose a trajectory self-distillation framework that improves few-step decoding by distilling the model's own generative trajectories. We incorporate Direct Discriminative Optimization (DDO), a reverse-KL objective that promotes mode-seeking distillation and encourages the student to concentrate on high-probability teacher modes. Across benchmarks, our approach consistently outperforms strong few-step baselines and standard training under tight step budgets. Although full-step decoding remains superior, we substantially narrow the gap, establishing a strong foundation towards practical few-step DLLMs. The source code is available at https://github.com/Tyrion58/T3D.
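The abstract's claim that a reverse-KL objective is mode-seeking can be illustrated on a toy categorical distribution. The sketch below is not the paper's DDO objective or training code; the distributions `teacher`, `student_mode`, and `student_cover` are made-up values chosen only to show how forward KL and reverse KL rank the same two students differently.

```python
import math


def kl(p, q):
    """KL(p || q) for two categorical distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical bimodal "teacher" over four outcomes (illustrative values).
teacher = [0.49, 0.01, 0.01, 0.49]

# Two hypothetical students: one commits to a single teacher mode,
# the other spreads its mass evenly over all outcomes.
student_mode = [0.97, 0.01, 0.01, 0.01]
student_cover = [0.25, 0.25, 0.25, 0.25]

# Forward KL(teacher || student) penalizes missing any teacher mass,
# so it favors the student that covers both modes.
forward_mode = kl(teacher, student_mode)
forward_cover = kl(teacher, student_cover)

# Reverse KL(student || teacher) penalizes student mass placed where the
# teacher is unlikely, so it favors the student that sits on one mode.
reverse_mode = kl(student_mode, teacher)
reverse_cover = kl(student_cover, teacher)

print(f"forward KL: mode={forward_mode:.3f} cover={forward_cover:.3f}")
print(f"reverse KL: mode={reverse_mode:.3f} cover={reverse_cover:.3f}")
```

On these toy values, forward KL is lower for the covering student, while reverse KL is lower for the student concentrated on a single high-probability mode. That is the behavior the abstract describes as mode-seeking.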

Related papers

d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation [31.922313594074925]
Diffusion large language models (dLLMs) offer capabilities beyond those of autoregressive (AR) LLMs. Current methods typically focus on only one side of the coin, targeting either efficiency or performance. We propose d3LLM (Pseudo-Distilled Diffusion Large Language Model), striking a balance between accuracy and parallelism.
arXiv Detail & Related papers (2026-01-12T14:25:36Z)
CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models [27.070045950001532]
CD4LM is a framework that decouples training from inference. On GSM8K, CD4LM matches the LLaDA baseline with a 5.18x wall-clock speedup.
arXiv Detail & Related papers (2026-01-05T16:09:22Z)
Continuous Autoregressive Language Models [56.49239051750678]
We introduce Continuous Autoregressive Language Models (CALM). CALM uses a high-fidelity autoencoder to compress a chunk of K tokens into a single continuous vector. We develop a comprehensive likelihood-free framework that enables robust training, evaluation, and controllable sampling.
arXiv Detail & Related papers (2025-10-31T17:58:11Z)
Accelerating Diffusion LLMs via Adaptive Parallel Decoding [60.407727995313074]
We introduce adaptive parallel decoding (APD), a novel method that dynamically adjusts the number of tokens sampled in parallel. APD provides markedly higher throughput with minimal quality degradations on downstream benchmarks.
arXiv Detail & Related papers (2025-05-31T06:10:10Z)
Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding [53.82301522384719]
We propose Dimple, the first Discrete Multimodal Large Language Model (DMLLM). We design a novel training paradigm that combines an initial autoregressive phase with a subsequent diffusion phase. Dimple-7B surpasses LLaVA- in performance by 3.9%, demonstrating that DMLLM can achieve performance comparable to that of autoregressive models.
arXiv Detail & Related papers (2025-05-22T17:55:04Z)
Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies [7.14946066475415]
Speculative decoding (SD) methods offer substantial efficiency gains by generating multiple tokens using a single target forward pass. Existing SD approaches require the drafter and target models to share the same vocabulary, thus limiting the pool of possible drafters. We present three new SD methods that remove this shared-vocabulary constraint. Our algorithms demonstrate significant speedups of up to 2.8x over standard autoregressive decoding.
arXiv Detail & Related papers (2025-01-31T19:13:58Z)
Unleashing the Power of One-Step Diffusion based Image Super-Resolution via a Large-Scale Diffusion Discriminator [81.81748032199813]
Diffusion models have demonstrated excellent performance for real-world image super-resolution (Real-ISR). We propose a new One-Step Diffusion model with a larger-scale Discriminator for SR. Our discriminator is able to distill noisy features from any time step of diffusion models in the latent space.
arXiv Detail & Related papers (2024-10-05T16:41:36Z)
BOOT: Data-free Distillation of Denoising Diffusion Models with Bootstrapping [64.54271680071373]
Diffusion models have demonstrated excellent potential for generating diverse images. Knowledge distillation has been recently proposed as a remedy that can reduce the number of inference steps to one or a few. We present a novel technique called BOOT, that overcomes limitations with an efficient data-free distillation algorithm.
arXiv Detail & Related papers (2023-06-08T20:30:55Z)
Fast Point Cloud Generation with Straight Flows [44.76242251282731]
Point Straight Flow is a model that exhibits impressive performance using one step. We develop a distillation strategy to shorten the straight path into one step without a performance loss. We perform evaluations on multiple 3D tasks and find that our PSF performs comparably to the standard diffusion model.
arXiv Detail & Related papers (2022-12-04T06:10:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.