Guided Transfer Learning for Discrete Diffusion Models
- URL: http://arxiv.org/abs/2512.10877v1
- Date: Thu, 11 Dec 2025 18:05:55 GMT
- Title: Guided Transfer Learning for Discrete Diffusion Models
- Authors: Julian Kleutgens, Claudio Battiloro, Lingkai Kong, Benjamin Grewe, Francesca Dominici, Mauricio Tec
- Abstract summary: We present Guided Transfer Learning for discrete diffusion models (GTL). GTL enables sampling from a target distribution without modifying the pretrained denoiser. We also present an efficient guided sampler that concentrates evaluations on planner-selected positions and top candidate tokens.
- Score: 21.909689920217982
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discrete diffusion models achieve strong performance across language and other discrete domains, providing a powerful alternative to autoregressive models. However, their strong performance relies on large training datasets, which are costly or risky to obtain, especially when adapting to new domains. Transfer learning is the natural way to adapt pretrained discrete diffusion models, but current methods require fine-tuning large diffusion models, which is computationally expensive and often impractical. Building on ratio-based transfer learning for continuous diffusion, we introduce Guided Transfer Learning for discrete diffusion models (GTL), which enables sampling from a target distribution without modifying the pretrained denoiser. The same guidance formulation applies to both discrete-time diffusion and continuous-time score-based discrete diffusion, yielding a unified treatment. Guided discrete diffusion often requires many forward passes of the guidance network, which becomes impractical for large vocabularies and long sequences. To address this, we further present an efficient guided sampler that concentrates evaluations on planner-selected positions and top candidate tokens, lowering sampling time and computation. This makes guided language modeling practical at scale. We evaluate GTL on sequential data, including synthetic Markov chains and language modeling, and provide empirical analyses of its behavior.
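The abstract gives no implementation details, but its two core ideas (adding a guidance network's log density ratio to the frozen pretrained denoiser's logits, and spending the guidance budget only on planner-selected positions and top-k candidate tokens) can be sketched roughly as below. This is a minimal illustrative sketch, not the authors' method: the function names, the entropy-based planner, and the random stand-in networks are all hypothetical assumptions, and the true reverse-transition kernel of the diffusion process is omitted.

```python
# Hypothetical sketch of GTL-style sparse guided sampling. All names and the
# planner heuristic are illustrative assumptions, not the paper's API.
import torch
from torch.distributions import Categorical

VOCAB, SEQ_LEN, TOP_K, N_PLANNED = 50, 16, 8, 4

def denoiser_logits(x_t, t):
    # Stand-in for the frozen pretrained denoiser: logits over x_0 per position.
    return torch.randn(SEQ_LEN, VOCAB)

def guidance_log_ratio(x_t, t, pos, cand):
    # Stand-in for the guidance network estimating log q(x_0)/p(x_0) restricted
    # to candidate tokens `cand` at position `pos`; one forward pass per call.
    return 0.1 * torch.randn(len(cand))

def planner_scores(logits):
    # Toy planner: prefer positions where the denoiser is most uncertain
    # (highest entropy), so the guidance budget goes where it matters most.
    probs = logits.softmax(-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(-1)

def guided_step(x_t, t):
    logits = denoiser_logits(x_t, t)
    planned = planner_scores(logits).topk(N_PLANNED).indices
    guided = logits.clone()
    for pos in planned.tolist():
        cand = logits[pos].topk(TOP_K).indices       # top candidate tokens only
        guided[pos, cand] += guidance_log_ratio(x_t, t, pos, cand)
    # Unplanned positions and non-candidate tokens keep the pretrained logits,
    # so guidance cost is O(N_PLANNED * TOP_K) instead of O(SEQ_LEN * VOCAB).
    return Categorical(logits=guided).sample()

x = torch.randint(VOCAB, (SEQ_LEN,))
for t in range(10, 0, -1):
    x = guided_step(x, t)
print(x)
```

The point of the sparsification is the cost accounting in the final comment of `guided_step`: dense guidance would require a guidance evaluation for every (position, token) pair at every step, while this sketch caps it at N_PLANNED * TOP_K evaluations per step, which is what makes guidance tractable for large vocabularies and long sequences.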
Related papers
- Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model a Latent Reasoner [66.86440230599656]
We argue that diffusion language models do not necessarily need to operate in the discrete space. In particular, we prove that continuous diffusion models have stronger expressivity than discrete diffusions and looped transformers. We propose Coevolutionary Continuous Discrete Diffusion (CCDD), which defines a joint multimodal diffusion process on the union of a continuous representation space and a discrete token space.
arXiv Detail & Related papers (2025-10-03T17:44:41Z) - Continuous Diffusion Model for Language Modeling [64.7425225935854]
Existing continuous diffusion models for discrete data underperform compared to discrete methods. We propose a continuous diffusion model for language modeling that incorporates the geometry of the underlying categorical distribution. Our method outperforms existing discrete diffusion models and approaches the performance of autoregressive models.
arXiv Detail & Related papers (2025-02-17T08:54:29Z) - Constrained Diffusion Models via Dual Training [80.03953599062365]
Diffusion processes are prone to generating samples that reflect biases in a training dataset.
We develop constrained diffusion models by imposing diffusion constraints based on desired distributions.
We show that our constrained diffusion models generate new data from a mixture data distribution that achieves the optimal trade-off between the objective and the constraints.
arXiv Detail & Related papers (2024-08-27T14:25:42Z) - MG-TSD: Multi-Granularity Time Series Diffusion Models with Guided Learning Process [26.661721555671626]
We introduce a novel Multi-Granularity Time Series (MG-TSD) model, which achieves state-of-the-art predictive performance.
Our approach does not rely on additional external data, making it versatile and applicable across various domains.
arXiv Detail & Related papers (2024-03-09T01:15:03Z) - Fast Sampling via Discrete Non-Markov Diffusion Models with Predetermined Transition Time [49.598085130313514]
We propose discrete non-Markov diffusion models (DNDM), which naturally induce a predetermined transition-time set. This enables a training-free sampling algorithm that significantly reduces the number of function evaluations. We also study the transition from finite- to infinite-step sampling, offering new insights into bridging the gap between discrete- and continuous-time processes.
arXiv Detail & Related papers (2023-12-14T18:14:11Z) - Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z) - How Much is Enough? A Study on Diffusion Times in Score-based Generative Models [76.76860707897413]
Current best practice advocates for a large T to ensure that the forward dynamics brings the diffusion sufficiently close to a known and simple noise distribution.
We show how an auxiliary model can be used to bridge the gap between the ideal and the simulated forward dynamics, followed by a standard reverse diffusion process.
arXiv Detail & Related papers (2022-06-10T15:09:46Z)