Binary Latent Diffusion
- URL: http://arxiv.org/abs/2304.04820v1
- Date: Mon, 10 Apr 2023 19:03:28 GMT
- Title: Binary Latent Diffusion
- Authors: Ze Wang, Jiang Wang, Zicheng Liu, and Qiang Qiu
- Abstract summary: We show that a binary latent space can be explored for compact yet expressive image representations.
We present both conditional and unconditional image generation experiments with multiple datasets.
The proposed framework can be seamlessly scaled to $1024 \times 1024$ high-resolution image generation without resorting to latent hierarchy or multi-stage refinements.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we show that a binary latent space can be explored for compact
yet expressive image representations. We model the bi-directional mappings
between an image and the corresponding latent binary representation by training
an auto-encoder with a Bernoulli encoding distribution. On the one hand, the
binary latent space provides a compact discrete image representation of which
the distribution can be modeled more efficiently than pixels or continuous
latent representations. On the other hand, we now represent each image patch as
a binary vector instead of an index into a learned codebook as in discrete image
representations with vector quantization. In this way, we obtain binary latent
representations that allow for better image quality and high-resolution image
representations without any multi-stage hierarchy in the latent space. In this
binary latent space, images can now be generated effectively using a binary
latent diffusion model tailored specifically for modeling the prior over the
binary image representations. We present both conditional and unconditional
image generation experiments with multiple datasets, and show that the proposed
method performs comparably to state-of-the-art methods while dramatically
improving the sampling efficiency to as few as 16 steps without using any
test-time acceleration. The proposed framework can also be seamlessly scaled to
$1024 \times 1024$ high-resolution image generation without resorting to latent
hierarchy or multi-stage refinements.
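The pipeline the abstract describes, a Bernoulli-encoder autoencoder producing binary latents plus a diffusion prior defined directly over bits, can be illustrated with a minimal NumPy sketch. The sigmoid encoding and independent bit-flip corruption below are simplified assumptions for illustration, not the paper's actual architecture or noise schedule:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_bernoulli(features):
    """Map continuous features to binary latent codes by sampling the
    Bernoulli distribution parameterized by a sigmoid of the features."""
    probs = 1.0 / (1.0 + np.exp(-features))   # Bernoulli parameters in (0, 1)
    return (rng.random(probs.shape) < probs).astype(np.int8)

def flip_noise(z, flip_prob):
    """Forward corruption for a binary diffusion process: flip each bit
    independently with probability flip_prob (0.5 erases all structure)."""
    flips = rng.random(z.shape) < flip_prob
    return np.where(flips, 1 - z, z)

z0 = encode_bernoulli(rng.normal(size=(4, 16)))  # 4 patches, 16 bits each
zT = flip_noise(z0, 0.5)                         # fully noised binary latent
```

A binary latent diffusion model would then be trained to reverse `flip_noise`, predicting the clean bits from a corrupted latent at each of the (as few as 16) sampling steps.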
Related papers
- bit2bit: 1-bit quanta video reconstruction via self-supervised photon prediction
We propose bit2bit, a new method for reconstructing high-quality image stacks at the original resolution from sparse binary quanta image data.
Inspired by recent work on Poisson denoising, we developed an algorithm that creates a dense image sequence from sparse binary photon data.
We present a novel dataset containing a wide range of real SPAD high-speed videos under various challenging imaging conditions.
arXiv Detail & Related papers (2024-10-30T17:30:35Z) - ImageFolder: Autoregressive Image Generation with Folded Tokens
Increasing token length is a common approach to improve the image reconstruction quality.
There exists a trade-off between reconstruction and generation quality regarding token length.
We propose ImageFolder, a semantic tokenizer that provides spatially aligned image tokens that can be folded during autoregressive modeling.
arXiv Detail & Related papers (2024-10-02T17:06:39Z) - MaskBit: Embedding-free Image Generation via Bit Tokens
We present an empirical and systematic examination of VQGANs, leading to a modernized VQGAN.
A novel embedding-free generation network operating directly on bit tokens achieves a new state-of-the-art FID of 1.52 on the ImageNet 256x256 benchmark, with a compact generator model of mere 305M parameters.
arXiv Detail & Related papers (2024-09-24T16:12:12Z) - Binarized Diffusion Model for Image Super-Resolution
Binarization, an ultra-compression algorithm, offers the potential to effectively accelerate advanced diffusion models (DMs).
Existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
arXiv Detail & Related papers (2024-06-09T10:30:25Z) - Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
Super-resolution (SR) and image generation are important tasks in computer vision and are widely adopted in real-world applications.
Most existing methods, however, generate images only at fixed-scale magnification and suffer from over-smoothing and artifacts.
The most relevant prior work applied Implicit Neural Representation (INR) to the denoising diffusion model to obtain continuous-resolution yet diverse and high-quality SR results.
We propose a novel pipeline that can super-resolve an input image or generate from a random noise a novel image at arbitrary scales.
arXiv Detail & Related papers (2024-03-15T12:45:40Z) - Neuromorphic Synergy for Video Binarization
Bimodal objects serve as a visual form to embed information that can be easily recognized by vision systems.
Neuromorphic cameras offer new capabilities for alleviating motion blur, but it is non-trivial to first de-blur and then binarize the images in a real-time manner.
We propose an event-based binary reconstruction method that leverages the prior knowledge of the bimodal target's properties to perform inference independently in both event space and image space.
We also develop an efficient integration method to propagate this binary image to high frame rate binary video.
arXiv Detail & Related papers (2024-02-20T01:43:51Z) - Spatial Latent Representations in Generative Adversarial Networks for Image Generation
We define a family of spatial latent spaces for StyleGAN2.
We show that our spaces are effective for image manipulation and encode semantic information well.
arXiv Detail & Related papers (2023-03-25T20:01:11Z) - Closed-Loop Transcription via Convolutional Sparse Coding
Autoencoders often use generic deep networks as the encoder or decoder, which are difficult to interpret.
In this work, we make the explicit assumption that the image distribution is generated from a multi-stage convolutional sparse coding (CSC) model.
Our method enjoys several side benefits, including more structured and interpretable representations, more stable convergence, and scalability to large datasets.
arXiv Detail & Related papers (2023-02-18T14:40:07Z) - Hierarchical Text-Conditional Image Generation with CLIP Latents
We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity.
Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style.
arXiv Detail & Related papers (2022-04-13T01:10:33Z) - Generating Images with Sparse Representations
High dimensionality of images presents architecture and sampling-efficiency challenges for likelihood-based generative models.
We present an alternative approach, inspired by common image compression methods like JPEG, and convert images to quantized discrete cosine transform (DCT) blocks.
We propose a Transformer-based autoregressive architecture, which is trained to sequentially predict the conditional distribution of the next element in such sequences.
arXiv Detail & Related papers (2021-03-05T17:56:03Z)
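The JPEG-inspired representation described in the last entry, a blockwise DCT followed by uniform quantization, can be sketched with SciPy; the block size and quantization step below are illustrative choices, not those of the paper:

```python
import numpy as np
from scipy.fft import dctn, idctn

def to_dct_blocks(img, block=8, q_step=16):
    """Split an image into blocks, apply a 2D DCT per block, and quantize
    the coefficients uniformly (JPEG-style)."""
    coeffs = np.zeros(img.shape, dtype=np.int32)
    for i in range(0, img.shape[0], block):
        for j in range(0, img.shape[1], block):
            c = dctn(img[i:i + block, j:j + block], norm="ortho")
            coeffs[i:i + block, j:j + block] = np.round(c / q_step)
    return coeffs

def from_dct_blocks(coeffs, block=8, q_step=16):
    """Dequantize and invert the blockwise DCT to recover a lossy image."""
    img = np.zeros(coeffs.shape)
    for i in range(0, coeffs.shape[0], block):
        for j in range(0, coeffs.shape[1], block):
            img[i:i + block, j:j + block] = idctn(
                coeffs[i:i + block, j:j + block] * q_step, norm="ortho")
    return img

img = np.random.default_rng(0).integers(0, 256, size=(16, 16)).astype(float)
tokens = to_dct_blocks(img)      # discrete tokens a Transformer can model
recon = from_dct_blocks(tokens)  # lossy reconstruction
```

An autoregressive Transformer would then model the sequence of these quantized coefficients rather than raw pixels, which is the core of the approach summarized above.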
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.