Analog Bits: Generating Discrete Data using Diffusion Models with
Self-Conditioning
- URL: http://arxiv.org/abs/2208.04202v1
- Date: Mon, 8 Aug 2022 15:08:40 GMT
- Title: Analog Bits: Generating Discrete Data using Diffusion Models with
Self-Conditioning
- Authors: Ting Chen, Ruixiang Zhang, Geoffrey Hinton
- Abstract summary: Bit Diffusion is a generic approach for generating discrete data with continuous diffusion models.
The proposed approach can achieve strong performance in both discrete image generation and image captioning tasks.
For image captioning on MS-COCO dataset, our approach achieves competitive results compared to autoregressive models.
- Score: 90.02873747873444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Bit Diffusion: a simple and generic approach for generating
discrete data with continuous diffusion models. The main idea behind our
approach is to first represent the discrete data as binary bits, and then train
a continuous diffusion model to model these bits as real numbers which we call
analog bits. To generate samples, the model first generates the analog bits,
which are then thresholded to obtain the bits that represent the discrete
variables. We further propose two simple techniques, namely Self-Conditioning
and Asymmetric Time Intervals, which lead to a significant improvement in
sample quality. Despite its simplicity, the proposed approach can achieve
strong performance in both discrete image generation and image captioning
tasks. For discrete image generation, we significantly improve previous
state-of-the-art on both CIFAR-10 (which has 3K discrete 8-bit tokens) and
ImageNet-64x64 (which has 12K discrete 8-bit tokens), outperforming the best
autoregressive model in both sample quality (measured by FID) and efficiency.
For image captioning on MS-COCO dataset, our approach achieves competitive
results compared to autoregressive models.
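The analog-bits round trip described in the abstract is simple enough to sketch directly. Below is a minimal NumPy sketch, assuming 8-bit discrete tokens; the function names and the ±scale convention are illustrative, not taken from the paper's released code.

```python
import numpy as np

def int_to_analog_bits(x, num_bits=8, scale=1.0):
    """Map integer tokens to real-valued analog bits in {-scale, +scale}."""
    bits = ((x[..., None] >> np.arange(num_bits)) & 1).astype(np.float32)
    return (bits * 2.0 - 1.0) * scale  # {0, 1} -> {-scale, +scale}

def analog_bits_to_int(analog):
    """Threshold generated analog bits at 0 and reassemble the integers."""
    bits = (analog > 0.0).astype(np.int64)
    num_bits = bits.shape[-1]
    return (bits << np.arange(num_bits)).sum(axis=-1)

# Round trip: discrete tokens -> analog bits -> (diffusion model) -> tokens.
tokens = np.array([0, 7, 128, 255])
assert np.array_equal(analog_bits_to_int(int_to_analog_bits(tokens)), tokens)
```

Self-Conditioning changes only the training loop: with some probability, the denoising network is first run to produce an estimate of the clean analog bits, and that estimate (under stop-gradient) is fed back to the network as an extra input. A hedged PyTorch sketch follows, where `model`, its `(x_t, t, x0_est)` signature, and the cosine noise schedule are placeholder assumptions rather than the paper's actual API:

```python
import math
import torch

def training_step(model, x0_bits, self_cond_prob=0.5):
    """One denoising training step with Self-Conditioning (sketch)."""
    b = x0_bits.shape[0]
    t = torch.rand(b)  # continuous time in [0, 1]
    shape = (-1,) + (1,) * (x0_bits.dim() - 1)
    alpha = torch.cos(0.5 * math.pi * t).view(shape)
    sigma = torch.sin(0.5 * math.pi * t).view(shape)
    x_t = alpha * x0_bits + sigma * torch.randn_like(x0_bits)  # forward noising
    x0_est = torch.zeros_like(x0_bits)       # default: no self-conditioning
    if torch.rand(()) < self_cond_prob:
        with torch.no_grad():                # stop-gradient on the estimate
            x0_est = model(x_t, t, x0_est)
    x0_pred = model(x_t, t, x0_est)          # condition on the model's own estimate
    return torch.mean((x0_pred - x0_bits) ** 2)  # plain x0 regression loss
```

At sampling time, the same conditioning slot is filled with the estimate from the previous reverse step; Asymmetric Time Intervals, per the abstract, additionally adjusts the time argument fed to the network during the reverse process.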
Related papers
- Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
Diffusion models have dominated the field of large generative image models.
We propose an algorithm for fast constrained sampling in large pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model [101.65105730838346]
We introduce Transfusion, a recipe for training a multi-modal model over discrete and continuous data.
We pretrain multiple Transfusion models up to 7B parameters from scratch on a mixture of text and image data.
Our experiments show that Transfusion scales significantly better than quantizing images and training a language model over discrete image tokens.
arXiv Detail & Related papers (2024-08-20T17:48:20Z)
- Glauber Generative Model: Discrete Diffusion Models via Binary Classification [21.816933208895843]
We introduce the Glauber Generative Model (GGM), a new class of discrete diffusion models.
GGM deploys a Markov chain to denoise a sequence of noisy tokens into a sample from the joint distribution of discrete tokens.
We show that it outperforms existing discrete diffusion models in language generation and image generation.
arXiv Detail & Related papers (2024-05-27T10:42:13Z)
- Scaling and Masking: A New Paradigm of Data Sampling for Image and Video Quality Assessment [24.545341041444797]
Quality assessment of images and videos emphasizes both local details and global semantics, whereas general data sampling methods fail to capture both simultaneously.
In this work, instead of stacking up models, a more elegant data sampling method is explored that compacts both local and global content into a regular input size.
Experiments show that our sampling method significantly improves the performance of current single-branch models and achieves performance competitive with multi-branch models without extra model complexity.
arXiv Detail & Related papers (2024-01-05T03:12:03Z)
- DEff-GAN: Diverse Attribute Transfer for Few-Shot Image Synthesis [0.38073142980733]
We extend the single-image GAN method to model multiple images for sample synthesis.
Our Data-Efficient GAN (DEff-GAN) generates excellent results when similarities and correspondences can be drawn between the input images or classes.
arXiv Detail & Related papers (2023-02-28T12:43:52Z)
- Fast Sampling of Diffusion Models via Operator Learning [74.37531458470086]
We use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models.
Compared to other fast sampling methods that have a sequential nature, we are the first to propose a parallel decoding method.
We show our method achieves state-of-the-art FID of 3.78 for CIFAR-10 and 7.83 for ImageNet-64 in the one-model-evaluation setting.
arXiv Detail & Related papers (2022-11-24T07:30:27Z)
- On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained in pixel space, our approach generates images visually comparable to those of the original model.
For diffusion models trained in latent space (e.g., Stable Diffusion), our approach generates high-fidelity images using as few as 1 to 4 denoising steps.
arXiv Detail & Related papers (2022-10-06T18:03:56Z)
- Lossy Image Compression with Conditional Diffusion Models [25.158390422252097]
This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models.
In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model.
Our approach yields stronger reported FID scores than the GAN-based model, while achieving competitive performance with VAE-based models on several distortion metrics.
arXiv Detail & Related papers (2022-09-14T21:53:27Z)
- Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
arXiv Detail & Related papers (2020-06-22T17:59:07Z)
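As a side note on the LMConv entry above, per-location weight masking can be sketched with the standard unfold (im2col) trick. This is an illustrative reconstruction from the summary, not the paper's released code; the function name and mask layout are assumptions.

```python
import torch
import torch.nn.functional as F

def locally_masked_conv2d(x, weight, masks):
    """x: (B, C_in, H, W); weight: (C_out, C_in, k, k);
    masks: (B, C_in*k*k, H*W), one binary mask per output location."""
    b, _, h, w = x.shape
    c_out, _, k, _ = weight.shape
    patches = F.unfold(x, k, padding=k // 2)  # (B, C_in*k*k, H*W)
    patches = patches * masks                 # mask the weights per location
    out = weight.view(c_out, -1) @ patches    # broadcasts to (B, C_out, H*W)
    return out.view(b, c_out, h, w)

# Example: 3x3 kernels over an 8x8 image with random binary masks.
x = torch.randn(2, 3, 8, 8)
w = torch.randn(16, 3, 3, 3)
m = (torch.rand(2, 3 * 9, 64) > 0.5).float()
y = locally_masked_conv2d(x, w, m)            # shape (2, 16, 8, 8)
```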
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.