Memory Efficient Diffusion Probabilistic Models via Patch-based Generation
- URL: http://arxiv.org/abs/2304.07087v1
- Date: Fri, 14 Apr 2023 12:20:18 GMT
- Title: Memory Efficient Diffusion Probabilistic Models via Patch-based Generation
- Authors: Shinei Arakawa, Hideki Tsunashima, Daichi Horita, Keitaro Tanaka, Shigeo Morishima
- Abstract summary: Diffusion probabilistic models have been successful in generating high-quality and diverse images.
Traditional models, whose input and output are high-resolution images, suffer from excessive memory requirements.
We propose a patch-based approach for diffusion probabilistic models that generates images on a patch-by-patch basis.
- Score: 11.749564892273828
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion probabilistic models have been successful in generating
high-quality and diverse images. However, traditional models, whose input and
output are high-resolution images, suffer from excessive memory requirements,
making them less practical for edge devices. Previous approaches for generative
adversarial networks proposed a patch-based method that uses positional
encoding and global content information. Nevertheless, designing a patch-based
approach for diffusion probabilistic models is non-trivial. In this paper, we
present a diffusion probabilistic model that generates images on a
patch-by-patch basis. We propose two conditioning methods for patch-based
generation. First, we propose position-wise conditioning using a one-hot
representation to ensure that patches are placed in their proper positions. Second, we propose
Global Content Conditioning (GCC) to ensure patches have coherent content when
concatenated together. We evaluate our model qualitatively and quantitatively
on CelebA and LSUN bedroom datasets and demonstrate a moderate trade-off
between maximum memory consumption and generated image quality. Specifically,
when an entire image is divided into 2 x 2 patches, our proposed approach can
reduce the maximum memory consumption by half while maintaining comparable
image quality.
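The abstract states that each patch's position is supplied through a one-hot representation, but it does not say how that code enters the network. As a minimal sketch, assuming (hypothetically) that the one-hot vector is broadcast into constant channels and concatenated to each patch, the splitting, conditioning, and reassembly steps might look like this in Python with NumPy. The function names are illustrative, not the authors' code, and both the denoiser and Global Content Conditioning are left out.

```python
import numpy as np
from typing import List


def split_into_patches(image: np.ndarray, grid: int) -> List[np.ndarray]:
    """Split a (C, H, W) image into grid x grid non-overlapping patches, row by row."""
    _, h, w = image.shape
    if h % grid or w % grid:
        raise ValueError("H and W must be divisible by grid")
    ph, pw = h // grid, w // grid
    return [
        image[:, r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
        for r in range(grid)
        for c in range(grid)
    ]


def one_hot_position(index: int, grid: int, ph: int, pw: int) -> np.ndarray:
    """Hypothetical position code: grid*grid constant channels, ones only in channel `index`."""
    code = np.zeros((grid * grid, ph, pw), dtype=np.float32)
    code[index] = 1.0
    return code


def condition_patches(image: np.ndarray, grid: int) -> List[np.ndarray]:
    """Append the one-hot position channels to every patch (assumed injection scheme)."""
    conditioned = []
    for i, patch in enumerate(split_into_patches(image, grid)):
        _, ph, pw = patch.shape
        code = one_hot_position(i, grid, ph, pw)
        conditioned.append(np.concatenate([patch, code], axis=0))
    return conditioned


def merge_patches(patches: List[np.ndarray], grid: int) -> np.ndarray:
    """Inverse of split_into_patches: tile row-major patches back into one image."""
    rows = [np.concatenate(patches[r * grid:(r + 1) * grid], axis=2) for r in range(grid)]
    return np.concatenate(rows, axis=1)


if __name__ == "__main__":
    image = np.random.rand(3, 64, 64).astype(np.float32)
    inputs = condition_patches(image, grid=2)
    print([x.shape for x in inputs])  # four (7, 32, 32) arrays
    restored = merge_patches([x[:3] for x in inputs], grid=2)
    print(np.array_equal(restored, image))  # True
```

With a 2 x 2 grid, each call to a denoiser in this scheme would see a 32 x 32 input rather than the full 64 x 64 image, which is where a patch-based approach saves peak memory. The abstract reports roughly a halving of maximum memory for 2 x 2 patches; this sketch does not attempt to model that figure.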
Related papers
- Image Super-Resolution with Guarantees via Conformal Generative Models [0.66567375919026]
We present a "confidence mask" capable of reliably and intuitively communicating where the generated image can be trusted.
Our method is adaptable to any black-box generative model, including those locked behind an API.
We prove strong theoretical guarantees for our method that span fidelity error control, reconstruction quality, and robustness in the face of data leakage.
Date: 2025-02-12T13:14:57Z
- MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling [64.09238330331195]
We propose a novel Multi-Modal Auto-Regressive (MMAR) probabilistic modeling framework.
Unlike methods that discretize images, MMAR takes in continuous-valued image tokens to avoid information loss.
We show that MMAR performs substantially better than other joint multi-modal models.
Date: 2024-10-14T17:57:18Z
- Learning Image Priors through Patch-based Diffusion Models for Solving Inverse Problems [15.298502168256519]
Diffusion models can learn strong image priors from underlying data distribution and use them to solve inverse problems, but the training process is computationally expensive and requires lots of data.
This paper proposes a method to learn an efficient data prior for the entire image by training diffusion models only on patches of images.
Date: 2024-06-04T16:30:37Z
- HySim: An Efficient Hybrid Similarity Measure for Patch Matching in Image Inpainting [0.0]
Inpainting, for filling missing image regions, is a crucial task in various applications, such as medical imaging and remote sensing.
This paper proposes an improved model-driven approach relying on patch-based techniques.
Our approach deviates from the standard Sum of Squared Differences (SSD) similarity measure by introducing a Hybrid Similarity (HySim).
Date: 2024-03-21T10:59:44Z
- Image Inpainting via Tractable Steering of Diffusion Models [54.13818673257381]
This paper proposes to exploit the ability of Tractable Probabilistic Models (TPMs) to exactly and efficiently compute the constrained posterior.
Specifically, this paper adopts a class of expressive TPMs termed Probabilistic Circuits (PCs).
We show that our approach can consistently improve the overall quality and semantic coherence of inpainted images with only 10% additional computational overhead.
Date: 2023-11-28T21:14:02Z
- Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
They are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
Date: 2023-06-21T06:59:07Z
- Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes.
We propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation.
Experiments show that the proposed method can modify a wide range of attributes and outperforms diffusion-model-based image-editing algorithms.
Date: 2022-12-16T19:58:52Z
- FewGAN: Generating from the Joint Distribution of a Few Images [95.6635227371479]
We introduce FewGAN, a generative model for generating novel, high-quality and diverse images.
FewGAN is a hierarchical patch-GAN that applies quantization at the first coarse scale, followed by a pyramid of residual fully convolutional GANs at finer scales.
In an extensive set of experiments, it is shown that FewGAN outperforms baselines both quantitatively and qualitatively.
Date: 2022-07-18T07:11:28Z
- Variable-Rate Deep Image Compression through Spatially-Adaptive Feature Transform [58.60004238261117]
We propose a versatile deep image compression network based on Spatial Feature Transform (SFT, arXiv:1804.02815).
Our model covers a wide range of compression rates using a single model, which is controlled by arbitrary pixel-wise quality maps.
The proposed framework allows us to perform task-aware image compression for various tasks.
Date: 2021-08-21T17:30:06Z
- Generating Images with Sparse Representations [21.27273495926409]
The high dimensionality of images presents architecture and sampling-efficiency challenges for likelihood-based generative models.
We present an alternative approach, inspired by common image compression methods like JPEG, and convert images to quantized discrete cosine transform (DCT) blocks.
We propose a Transformer-based autoregressive architecture, which is trained to sequentially predict the conditional distribution of the next element in such sequences.
Date: 2021-03-05T17:56:03Z
- Perceptual Image Restoration with High-Quality Priori and Degradation Learning [28.93489249639681]
We show that our model performs well in measuring the similarity between restored and degraded images.
Our simultaneous restoration and enhancement framework generalizes well to real-world complicated degradation types.
Date: 2021-03-04T13:19:50Z
- Locally Masked Convolution for Autoregressive Models [107.4635841204146]
LMConv is a simple modification to the standard 2D convolution that allows arbitrary masks to be applied to the weights at each location in the image.
We learn an ensemble of distribution estimators that share parameters but differ in generation order, achieving improved performance on whole-image density estimation.
Date: 2020-06-22T17:59:07Z