Analog Bits: Generating Discrete Data using Diffusion Models with
Self-Conditioning
- URL: http://arxiv.org/abs/2208.04202v1
- Date: Mon, 8 Aug 2022 15:08:40 GMT
- Title: Analog Bits: Generating Discrete Data using Diffusion Models with
Self-Conditioning
- Authors: Ting Chen, Ruixiang Zhang, Geoffrey Hinton
- Abstract summary: Bit Diffusion is a generic approach for generating discrete data with continuous diffusion models.
The proposed approach can achieve strong performance in both discrete image generation and image captioning tasks.
For image captioning on MS-COCO dataset, our approach achieves competitive results compared to autoregressive models.
- Score: 90.02873747873444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Bit Diffusion: a simple and generic approach for generating
discrete data with continuous diffusion models. The main idea behind our
approach is to first represent the discrete data as binary bits, and then train
a continuous diffusion model to model these bits as real numbers which we call
analog bits. To generate samples, the model first generates the analog bits,
which are then thresholded to obtain the bits that represent the discrete
variables. We further propose two simple techniques, namely Self-Conditioning
and Asymmetric Time Intervals, which lead to a significant improvement in
sample quality. Despite its simplicity, the proposed approach can achieve
strong performance in both discrete image generation and image captioning
tasks. For discrete image generation, we significantly improve previous
state-of-the-art on both CIFAR-10 (which has 3K discrete 8-bit tokens) and
ImageNet-64x64 (which has 12K discrete 8-bit tokens), outperforming the best
autoregressive model in both sample quality (measured by FID) and efficiency.
For image captioning on MS-COCO dataset, our approach achieves competitive
results compared to autoregressive models.
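The core encode/decode step of the abstract can be sketched as follows. This is an illustrative NumPy sketch, not the authors' code: the function names and the LSB-first bit ordering are my own assumptions. It shows how discrete values are represented as "analog bits" in {-1, +1} that a continuous diffusion model can treat as real numbers, and how thresholding at zero recovers the discrete values after generation.

```python
import numpy as np

def int_to_analog_bits(x, num_bits=8):
    """Represent non-negative integers as 'analog bits': binary bits mapped to {-1, +1} floats.
    A continuous diffusion model can then be trained on these real-valued tensors."""
    bits = ((x[..., None] >> np.arange(num_bits)) & 1).astype(np.float32)  # LSB-first {0, 1} bits
    return bits * 2.0 - 1.0  # shift/scale {0, 1} -> {-1, +1}

def analog_bits_to_int(analog_bits):
    """Threshold real-valued analog bits at zero and reassemble the integers."""
    hard = (analog_bits > 0).astype(np.int64)
    return (hard << np.arange(hard.shape[-1])).sum(axis=-1)

x = np.array([0, 3, 255])          # e.g., discrete 8-bit image tokens
b = int_to_analog_bits(x)          # shape (3, 8), values in {-1, +1}
# generated analog bits are noisy reals; thresholding is robust while |noise| < 1
assert (analog_bits_to_int(b + 0.4) == x).all()
```

Decoding is exact as long as the model's generated analog bits land on the correct side of zero, which is why the paper's quality-improving techniques (Self-Conditioning, Asymmetric Time Intervals) target the fidelity of the continuous samples.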
Related papers
- Glauber Generative Model: Discrete Diffusion Models via Binary Classification [21.816933208895843]
We introduce the Glauber Generative Model (GGM), a new class of discrete diffusion models.
GGM deploys a Markov chain to denoise a sequence of noisy tokens to a sample from a joint distribution of discrete tokens.
We show that it outperforms existing discrete diffusion models in language generation and image generation.
arXiv Detail & Related papers (2024-05-27T10:42:13Z)
- Scaling and Masking: A New Paradigm of Data Sampling for Image and Video Quality Assessment [24.545341041444797]
Quality assessment of images and videos emphasizes both local details and global semantics, whereas general data sampling methods fail to capture both simultaneously.
In this work, instead of stacking up models, a more elegant data sampling method is explored, which compacts both the local and global content in a regular input size.
Experiments show that our sampling method can improve the performance of current single-branch models significantly, and achieves competitive performance to the multi-branch models without extra model complexity.
arXiv Detail & Related papers (2024-01-05T03:12:03Z)
- Decoupled Diffusion Models: Simultaneous Image to Zero and Zero to Noise [53.04220377034574]
We propose decoupled diffusion models (DDMs) for high-quality (un)conditioned image generation in less than 10 function evaluations.
We mathematically derive 1) the training objectives and 2) the reverse-time sampling formula, based on an analytic transition probability that models the image-to-zero transition.
We experimentally achieve very competitive performance compared with the state of the art in 1) unconditioned image generation, e.g., on CIFAR-10 and CelebA-HQ-256, and 2) image-conditioned downstream tasks such as super-resolution, saliency detection, edge detection, and image inpainting.
arXiv Detail & Related papers (2023-06-23T18:08:00Z)
- Consistency Models [89.68380014789861]
We propose a new family of models that generate high quality samples by directly mapping noise to data.
They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality.
They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training.
arXiv Detail & Related papers (2023-03-02T18:30:16Z)
- DEff-GAN: Diverse Attribute Transfer for Few-Shot Image Synthesis [0.38073142980733]
We extend the single-image GAN method to model multiple images for sample synthesis.
Our Data-Efficient GAN (DEff-GAN) generates excellent results when similarities and correspondences can be drawn between the input images or classes.
arXiv Detail & Related papers (2023-02-28T12:43:52Z)
- Fast Sampling of Diffusion Models via Operator Learning [74.37531458470086]
We use neural operators, an efficient method to solve the probability flow differential equations, to accelerate the sampling process of diffusion models.
Compared to other fast sampling methods that have a sequential nature, we are the first to propose a parallel decoding method.
We show our method achieves state-of-the-art FID of 3.78 for CIFAR-10 and 7.83 for ImageNet-64 in the one-model-evaluation setting.
arXiv Detail & Related papers (2022-11-24T07:30:27Z)
- On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained in pixel space, our approach can generate images visually comparable to those of the original model.
For diffusion models trained in latent space (e.g., Stable Diffusion), our approach can generate high-fidelity images using as few as 1 to 4 denoising steps.
arXiv Detail & Related papers (2022-10-06T18:03:56Z)
- Lossy Image Compression with Conditional Diffusion Models [25.158390422252097]
This paper outlines an end-to-end optimized lossy image compression framework using diffusion generative models.
In contrast to VAE-based neural compression, where the (mean) decoder is a deterministic neural network, our decoder is a conditional diffusion model.
Our approach yields stronger reported FID scores than the GAN-based model, while also yielding competitive performance with VAE-based models in several distortion metrics.
arXiv Detail & Related papers (2022-09-14T21:53:27Z)
- Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
arXiv Detail & Related papers (2020-06-22T17:59:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.