Binary Latent Diffusion
- URL: http://arxiv.org/abs/2304.04820v1
- Date: Mon, 10 Apr 2023 19:03:28 GMT
- Title: Binary Latent Diffusion
- Authors: Ze Wang, Jiang Wang, Zicheng Liu, and Qiang Qiu
- Abstract summary: We show that a binary latent space can be explored for compact yet expressive image representations.
We present both conditional and unconditional image generation experiments with multiple datasets.
The proposed framework can be seamlessly scaled to $1024 \times 1024$ high-resolution image generation without resorting to latent hierarchy or multi-stage refinements.
- Score: 36.70550531181131
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we show that a binary latent space can be explored for compact
yet expressive image representations. We model the bi-directional mappings
between an image and the corresponding latent binary representation by training
an auto-encoder with a Bernoulli encoding distribution. On the one hand, the
binary latent space provides a compact discrete image representation of which
the distribution can be modeled more efficiently than pixels or continuous
latent representations. On the other hand, we now represent each image patch as
a binary vector instead of an index of a learned codebook as in discrete image
representations with vector quantization. In this way, we obtain binary latent
representations that allow for better image quality and high-resolution image
representations without any multi-stage hierarchy in the latent space. In this
binary latent space, images can now be generated effectively using a binary
latent diffusion model tailored specifically for modeling the prior over the
binary image representations. We present both conditional and unconditional
image generation experiments with multiple datasets, and show that the proposed
method performs comparably to state-of-the-art methods while dramatically
improving the sampling efficiency to as few as 16 steps without using any
test-time acceleration. The proposed framework can also be seamlessly scaled to
$1024 \times 1024$ high-resolution image generation without resorting to latent
hierarchy or multi-stage refinements.
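The core idea of the Bernoulli encoding described above can be sketched in a few lines: the encoder emits one logit per latent dimension, and a hard binary code is drawn from the induced Bernoulli distribution (during training, a straight-through estimator would copy gradients through the sampling step). This is a minimal illustrative sketch, not the paper's implementation; all shapes and names here are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode_bernoulli(logits, rng):
    """Sample a binary latent code z ~ Bernoulli(sigmoid(logits)).

    At inference the hard samples are used directly; training would pass
    gradients through the sampling step with a straight-through estimator.
    """
    probs = sigmoid(logits)
    z = (rng.random(probs.shape) < probs).astype(np.float32)
    return z, probs

# Toy "encoder output": a 4x4 grid of patches, 32 latent bits per patch
# (shapes are illustrative, not taken from the paper).
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 4, 32))
z, probs = encode_bernoulli(logits, rng)
print(z.shape)  # (4, 4, 32), entries strictly in {0, 1}
```

Because each patch is a binary vector rather than a codebook index, the latent diffusion prior can model per-bit flip probabilities directly instead of a categorical distribution over a large vocabulary.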
Related papers
- Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
For the model structure, we design a UNet architecture optimized for binarization.
We propose consistent-pixel-downsample (CP-Down) and consistent-pixel-upsample (CP-Up) modules to maintain dimension consistency.
Comprehensive experiments demonstrate that our BI-DiffSR outperforms existing binarization methods.
arXiv Detail & Related papers (2024-06-09T10:30:25Z)
- Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder [29.924160271522354]
Super-resolution (SR) and image generation are important tasks in computer vision and are widely adopted in real-world applications.
Most existing methods, however, generate images only at fixed-scale magnification and suffer from over-smoothing and artifacts.
The most relevant prior work applied Implicit Neural Representation (INR) to the denoising diffusion model to obtain continuous-resolution yet diverse and high-quality SR results.
We propose a novel pipeline that can super-resolve an input image, or generate a novel image from random noise, at arbitrary scales.
arXiv Detail & Related papers (2024-03-15T12:45:40Z)
- Neuromorphic Synergy for Video Binarization [54.195375576583864]
Bimodal objects serve as a visual form to embed information that can be easily recognized by vision systems.
Neuromorphic cameras offer new capabilities for alleviating motion blur, but it is non-trivial to first de-blur and then binarize the images in a real-time manner.
We propose an event-based binary reconstruction method that leverages the prior knowledge of the bimodal target's properties to perform inference independently in both event space and image space.
We also develop an efficient integration method to propagate this binary image to high frame rate binary video.
arXiv Detail & Related papers (2024-02-20T01:43:51Z)
- Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence [88.00004819064672]
Diffusion Hyperfeatures is a framework for consolidating multi-scale and multi-timestep feature maps into per-pixel feature descriptors.
Our method achieves superior performance on the SPair-71k real image benchmark.
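The consolidation step that the Diffusion Hyperfeatures summary describes can be illustrated simply: feature maps extracted at different scales and timesteps are resampled to a common resolution and stacked channel-wise, yielding one descriptor per pixel. This is a hedged sketch under assumed shapes, not the authors' aggregation network (which learns mixing weights).

```python
import numpy as np

def upsample_nearest(fmap, size):
    """Nearest-neighbour upsampling of a (C, H, W) feature map to (C, size, size)."""
    c, h, w = fmap.shape
    ys = np.arange(size) * h // size
    xs = np.arange(size) * w // size
    return fmap[:, ys][:, :, xs]

def consolidate(feature_maps, size):
    """Stack multi-scale / multi-timestep features into per-pixel descriptors."""
    return np.concatenate([upsample_nearest(f, size) for f in feature_maps], axis=0)

# Toy stand-ins for features captured at two timesteps and two scales.
rng = np.random.default_rng(1)
maps = [rng.normal(size=(8, 16, 16)), rng.normal(size=(4, 32, 32))]
desc = consolidate(maps, size=64)
print(desc.shape)  # (12, 64, 64): a 12-dim descriptor per pixel
```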
arXiv Detail & Related papers (2023-05-23T17:58:05Z)
- Spatial Latent Representations in Generative Adversarial Networks for Image Generation [0.0]
We define a family of spatial latent spaces for StyleGAN2.
We show that our spaces are effective for image manipulation and encode semantic information well.
arXiv Detail & Related papers (2023-03-25T20:01:11Z)
- Closed-Loop Transcription via Convolutional Sparse Coding [29.75613581643052]
Autoencoders often use generic deep networks as the encoder or decoder, which are difficult to interpret.
In this work, we make the explicit assumption that the image distribution is generated from a multistage convolutional sparse coding (CSC) model.
Our method enjoys several side benefits, including more structured and interpretable representations, more stable convergence, and scalability to large datasets.
arXiv Detail & Related papers (2023-02-18T14:40:07Z)
- Hierarchical Text-Conditional Image Generation with CLIP Latents [20.476720970770128]
We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity.
Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style.
arXiv Detail & Related papers (2022-04-13T01:10:33Z)
- High-Quality Pluralistic Image Completion via Code Shared VQGAN [51.7805154545948]
We present a novel framework for pluralistic image completion that can achieve both high quality and diversity at much faster inference speed.
Our framework is able to learn semantically-rich discrete codes efficiently and robustly, resulting in much better image reconstruction quality.
arXiv Detail & Related papers (2022-04-05T01:47:35Z)
- Generating Images with Sparse Representations [21.27273495926409]
High dimensionality of images presents architecture and sampling-efficiency challenges for likelihood-based generative models.
We present an alternative approach, inspired by common image compression methods like JPEG, and convert images to quantized discrete cosine transform (DCT) blocks.
We propose a Transformer-based autoregressive architecture, which is trained to sequentially predict the conditional distribution of the next element in such sequences.
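The JPEG-style tokenization this summary mentions can be sketched directly: split the image into 8x8 blocks, apply a 2D DCT to each, and quantize the coefficients, yielding discrete sequences a Transformer could model. This is a minimal numpy illustration with an assumed uniform quantization step, not the paper's pipeline.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (rows are basis functions)."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0] /= np.sqrt(2.0)
    return m

def to_quantized_blocks(img, q=16):
    """Split an image into 8x8 blocks, DCT-transform each, quantize coefficients."""
    d = dct_matrix()
    h, w = img.shape
    blocks = img.reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    coeffs = np.einsum('ij,abjk,kl->abil', d, blocks, d.T)  # D @ block @ D.T
    return np.round(coeffs / q).astype(np.int32)

rng = np.random.default_rng(2)
img = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
tokens = to_quantized_blocks(img)
print(tokens.shape)  # (4, 4, 8, 8): a grid of quantized DCT coefficient blocks
```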
arXiv Detail & Related papers (2021-03-05T17:56:03Z)
- Joint Estimation of Image Representations and their Lie Invariants [57.3768308075675]
Images encode both the state of the world and its content.
The automatic extraction of this information is challenging because of the high-dimensionality and entangled encoding inherent to the image representation.
This article introduces two theoretical approaches aimed at the resolution of these challenges.
arXiv Detail & Related papers (2020-12-05T00:07:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.