Simplified and Generalized Masked Diffusion for Discrete Data
- URL: http://arxiv.org/abs/2406.04329v4
- Date: Thu, 16 Jan 2025 08:46:16 GMT
- Title: Simplified and Generalized Masked Diffusion for Discrete Data
- Authors: Jiaxin Shi, Kehang Han, Zhe Wang, Arnaud Doucet, Michalis K. Titsias,
- Abstract summary: Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive models for generative modeling of discrete data.<n>In this work, we aim to provide a simple and general framework that unlocks the full potential of masked diffusion models.
- Score: 47.711583631408715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive models for generative modeling of discrete data. However, existing work in this area has been hindered by unnecessarily complex model formulations and unclear relationships between different perspectives, leading to suboptimal parameterization, training objectives, and ad hoc adjustments to counteract these issues. In this work, we aim to provide a simple and general framework that unlocks the full potential of masked diffusion models. We show that the continuous-time variational objective of masked diffusion models is a simple weighted integral of cross-entropy losses. Our framework also enables training generalized masked diffusion models with state-dependent masking schedules. When evaluated by perplexity, our models trained on OpenWebText surpass prior diffusion language models at GPT-2 scale and demonstrate superior performance on 4 out of 5 zero-shot language modeling tasks. Furthermore, our models vastly outperform previous discrete diffusion models on pixel-level image modeling, achieving 2.75 (CIFAR-10) and 3.40 (ImageNet 64x64) bits per dimension that are better than autoregressive models of similar sizes. Our code is available at https://github.com/google-deepmind/md4.
Related papers
- USP: Unified Self-Supervised Pretraining for Image Generation and Understanding [15.717333276867462]
Unified Self-supervised Pretraining (USP) is a framework that initializes diffusion models via masked latent modeling in a Variational Autoencoder (VAE) latent space.
USP achieves comparable performance in understanding tasks while significantly improving the convergence speed and generation quality of diffusion models.
arXiv Detail & Related papers (2025-03-08T09:01:03Z) - Remasking Discrete Diffusion Models with Inference-Time Scaling [12.593164604625384]
We introduce the remasking diffusion model (ReMDM) sampler, a method that can be applied to pretrained masked diffusion models in a principled way.
Most interestingly, ReMDM endows discrete diffusion with a form of inference-time compute scaling.
arXiv Detail & Related papers (2025-03-01T02:37:51Z) - One-for-More: Continual Diffusion Model for Anomaly Detection [61.12622458367425]
Anomaly detection methods utilize diffusion models to generate or reconstruct normal samples when given arbitrary anomaly images.
Our study found that the diffusion model suffers from severe faithfulness hallucination'' and catastrophic forgetting''
We propose a continual diffusion model that uses gradient projection to achieve stable continual learning.
arXiv Detail & Related papers (2025-02-27T07:47:27Z) - Model Integrity when Unlearning with T2I Diffusion Models [11.321968363411145]
We propose approximate Machine Unlearning algorithms to reduce the generation of specific types of images, characterized by samples from a forget distribution''
We then propose unlearning algorithms that demonstrate superior effectiveness in preserving model integrity compared to existing baselines.
arXiv Detail & Related papers (2024-11-04T13:15:28Z) - Aggregation of Multi Diffusion Models for Enhancing Learned Representations [4.126721111013567]
This paper introduces a novel algorithm, Aggregation of Multi Diffusion Models (AMDM)
AMDM synthesizes features from multiple diffusion models into a specified model, enhancing its learned representations to activate specific features for fine-grained control.
Experimental results demonstrate that AMDM significantly improves fine-grained control without additional training or inference time.
arXiv Detail & Related papers (2024-10-02T06:16:06Z) - Simple and Effective Masked Diffusion Language Models [48.68198363304619]
We show that simple masked discrete diffusion is more performant than previously thought.
Our objective has a simple form -- it is a mixture of classical masked language modeling losses.
On language modeling benchmarks, a range of masked diffusion models trained with modern engineering practices achieves a new state-of-the-art.
arXiv Detail & Related papers (2024-06-11T17:51:40Z) - FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage different and relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z) - Adaptive Training Meets Progressive Scaling: Elevating Efficiency in Diffusion Models [52.1809084559048]
We propose a novel two-stage divide-and-conquer training strategy termed TDC Training.
It groups timesteps based on task similarity and difficulty, assigning highly customized denoising models to each group, thereby enhancing the performance of diffusion models.
While two-stage training avoids the need to train each model separately, the total training cost is even lower than training a single unified denoising model.
arXiv Detail & Related papers (2023-12-20T03:32:58Z) - Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution [67.9215891673174]
We propose score entropy as a novel loss that naturally extends score matching to discrete spaces.
We test our Score Entropy Discrete Diffusion models on standard language modeling tasks.
arXiv Detail & Related papers (2023-10-25T17:59:12Z) - Likelihood-Based Diffusion Language Models [13.916640262862215]
We take the first steps towards closing the likelihood gap between autoregressive and diffusion-based language models.
We pursue this goal through algorithmic improvements, scaling laws, and increased compute.
We release Plaid 1B, a large diffusion language model which outperforms GPT-2 124M in likelihood on benchmark datasets.
arXiv Detail & Related papers (2023-05-30T16:43:31Z) - Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z) - Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE)
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
arXiv Detail & Related papers (2023-04-06T17:59:56Z) - On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained on the pixelspace, our approach is able to generate images visually comparable to that of the original model.
For diffusion models trained on the latent-space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps.
arXiv Detail & Related papers (2022-10-06T18:03:56Z) - Diffusion Models: A Comprehensive Survey of Methods and Applications [10.557289965753437]
Diffusion models are a class of deep generative models that have shown impressive results on various tasks with dense theoretical founding.
Recent studies have shown great enthusiasm on improving the performance of diffusion model.
arXiv Detail & Related papers (2022-09-02T02:59:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.