$\infty$-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions
- URL: http://arxiv.org/abs/2407.14709v1
- Date: Sat, 20 Jul 2024 00:04:49 GMT
- Title: $\infty$-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions
- Authors: Minh-Quan Le, Alexandros Graikos, Srikar Yellapragada, Rajarsi Gupta, Joel Saltz, Dimitris Samaras
- Abstract summary: We introduce a novel conditional diffusion model in infinite dimensions, $\infty$-Brush, for controllable large image synthesis.
To our best knowledge, $\infty$-Brush is the first conditional diffusion model in function space that can controllably synthesize images at arbitrary resolutions of up to $4096\times4096$ pixels.
- Score: 58.42011190989414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Synthesizing high-resolution images from intricate, domain-specific information remains a significant challenge in generative modeling, particularly for applications in large-image domains such as digital histopathology and remote sensing. Existing methods face critical limitations: conditional diffusion models in pixel or latent space cannot exceed the resolution on which they were trained without losing fidelity, and computational demands increase significantly for larger image sizes. Patch-based methods offer computational efficiency but fail to capture long-range spatial relationships due to their overreliance on local information. In this paper, we introduce a novel conditional diffusion model in infinite dimensions, $\infty$-Brush for controllable large image synthesis. We propose a cross-attention neural operator to enable conditioning in function space. Our model overcomes the constraints of traditional finite-dimensional diffusion models and patch-based methods, offering scalability and superior capability in preserving global image structures while maintaining fine details. To our best knowledge, $\infty$-Brush is the first conditional diffusion model in function space, that can controllably synthesize images at arbitrary resolutions of up to $4096\times4096$ pixels. The code is available at https://github.com/cvlab-stonybrook/infinity-brush.
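The abstract's idea of conditioning in function space can be illustrated with a minimal, dependency-free sketch. This is not the authors' cross-attention neural operator: the names `make_grid`, `fourier_features`, and `CrossAttention`, the random weights, and all dimensions are hypothetical choices made only for illustration. The sketch shows the one property the abstract relies on: queries are built from continuous coordinates rather than pixel indices, so the same weights can be evaluated on a sampling grid of any resolution while attending to a fixed set of condition tokens.

```python
import math
import random


def make_grid(h, w):
    """Pixel-centre coordinates in [0, 1]^2 for an h x w sampling of the image function."""
    return [((i + 0.5) / h, (j + 0.5) / w) for i in range(h) for j in range(w)]


def fourier_features(coord, num_freqs):
    """Encode a 2-D coordinate with sines and cosines (4 * num_freqs values)."""
    y, x = coord
    feats = []
    for k in range(num_freqs):
        f = (2 ** k) * math.pi
        feats += [math.sin(f * y), math.cos(f * y), math.sin(f * x), math.cos(f * x)]
    return feats


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]


class CrossAttention:
    """Single-head cross-attention: coordinate queries attend to condition tokens."""

    def __init__(self, num_freqs, cond_dim, head_dim, rng):
        self.num_freqs = num_freqs
        self.w_q = random_matrix(head_dim, 4 * num_freqs, rng)
        self.w_k = random_matrix(head_dim, cond_dim, rng)
        self.w_v = random_matrix(head_dim, cond_dim, rng)
        self.scale = 1.0 / math.sqrt(head_dim)

    def __call__(self, coords, cond_tokens):
        # Keys and values depend only on the condition, not on the grid.
        keys = [matvec(self.w_k, t) for t in cond_tokens]
        values = [matvec(self.w_v, t) for t in cond_tokens]
        out = []
        for c in coords:
            q = matvec(self.w_q, fourier_features(c, self.num_freqs))
            weights = softmax([self.scale * sum(a * b for a, b in zip(q, k)) for k in keys])
            # Output at this point is the attention-weighted mix of the values.
            out.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
        return out


rng = random.Random(0)
attn = CrossAttention(num_freqs=3, cond_dim=8, head_dim=16, rng=rng)
cond = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]  # 5 hypothetical condition tokens

coarse = attn(make_grid(2, 2), cond)  # 4 query points
fine = attn(make_grid(6, 6), cond)    # 36 query points, same weights
```

Both calls reuse the same parameters; only the set of query coordinates changes. The point (0.25, 0.25) appears in both grids, and the layer returns the same conditioned feature for it in each, which is the sense in which this toy layer is independent of the sampling resolution.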
Related papers
- Image Neural Field Diffusion Models [46.781775067944395]
We propose to learn the distribution of continuous images by training diffusion models on image neural fields.
We show that image neural field diffusion models can be trained using mixed-resolution image datasets, outperform fixed-resolution diffusion models, and can solve inverse problems with conditions applied at different scales efficiently.
arXiv Detail & Related papers (2024-06-11T17:24:02Z)
- Scalable Diffusion Models with State Space Backbone [33.92910068664058]
Diffusion State Space Models treat all inputs including time, condition, and noisy image patches as tokens.
We analyze the scalability of DiS, gauged by the forward pass complexity in Gflops.
DiS-H/2 models in latent space achieve performance levels akin to prior diffusion models on class-conditional ImageNet benchmarks.
arXiv Detail & Related papers (2024-02-08T12:08:42Z)
- Domain Transfer in Latent Space (DTLS) Wins on Image Super-Resolution -- a Non-Denoising Model [13.326634982790528]
We propose a simple approach which gets away from using Gaussian noise but adopts some basic structures of diffusion models for efficient image super-resolution.
Experimental results show that our method outperforms not only state-of-the-art large scale super resolution models, but also the current diffusion models for image super-resolution.
arXiv Detail & Related papers (2023-11-04T09:57:50Z)
- ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models [126.35334860896373]
We investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes.
Existing works for higher-resolution generation, such as attention-based and joint-diffusion approaches, cannot well address these issues.
We propose a simple yet effective re-dilation that can dynamically adjust the convolutional perception field during inference.
arXiv Detail & Related papers (2023-10-11T17:52:39Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced in image deblurring and exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff), for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
- I$^2$SB: Image-to-Image Schrödinger Bridge [87.43524087956457]
Image-to-Image Schrödinger Bridge (I$^2$SB) is a new class of conditional diffusion models.
I$^2$SB directly learns the nonlinear diffusion processes between two given distributions.
We show that I$^2$SB surpasses standard conditional diffusion models with more interpretable generative processes.
arXiv Detail & Related papers (2023-02-12T08:35:39Z)
- SDM: Spatial Diffusion Model for Large Hole Image Inpainting [106.90795513361498]
We present a novel spatial diffusion model (SDM) that uses a few iterations to gradually deliver informative pixels to the entire image.
Also, thanks to the proposed decoupled probabilistic modeling and spatial diffusion scheme, our method achieves high-quality large-hole completion.
arXiv Detail & Related papers (2022-12-06T13:30:18Z)
- InfinityGAN: Towards Infinite-Resolution Image Synthesis [92.40782797030977]
We present InfinityGAN, a method to generate arbitrary-resolution images.
We show how it trains and infers patch-by-patch seamlessly with low computational resources.
arXiv Detail & Related papers (2021-04-08T17:59:30Z)