Compositional Image Decomposition with Diffusion Models
- URL: http://arxiv.org/abs/2406.19298v1
- Date: Thu, 27 Jun 2024 16:13:34 GMT
- Title: Compositional Image Decomposition with Diffusion Models
- Authors: Jocelin Su, Nan Liu, Yanbo Wang, Joshua B. Tenenbaum, Yilun Du
- Abstract summary: In this paper, we present a method to decompose an image into such compositional components.
Our approach, Decomp Diffusion, is an unsupervised method which infers a set of different components in the image.
We demonstrate how components can capture different factors of the scene, ranging from global scene descriptors like shadows or facial expression to local scene descriptors like constituent objects.
- Score: 70.07406583580591
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given an image of a natural scene, we are able to quickly decompose it into a set of components such as objects, lighting, shadows, and foreground. We can then envision a scene where we combine certain components with those from other images, for instance a set of objects from our bedroom and animals from a zoo under the lighting conditions of a forest, even if we have never encountered such a scene before. In this paper, we present a method to decompose an image into such compositional components. Our approach, Decomp Diffusion, is an unsupervised method which, when given a single image, infers a set of different components in the image, each represented by a diffusion model. We demonstrate how components can capture different factors of the scene, ranging from global scene descriptors like shadows or facial expression to local scene descriptors like constituent objects. We further illustrate how inferred factors can be flexibly composed, even with factors inferred from other models, to generate a variety of scenes sharply different than those seen in training time. Website and code at https://energy-based-model.github.io/decomp-diffusion.
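A minimal sketch of the kind of composition the abstract describes: each inferred component is represented by a latent-conditioned denoising network, and a composite image is sampled by combining the per-component noise predictions at every reverse step. The `ComponentDenoiser` stub, the latent shapes, and the simple summation rule are illustrative assumptions, not the paper's exact architecture or weighting.

```python
# Sketch of composing several component-conditioned denoisers during
# DDPM-style sampling. The toy network and the summation rule below are
# illustrative assumptions, not the paper's exact recipe.
import torch
import torch.nn as nn

class ComponentDenoiser(nn.Module):
    """Toy stand-in for a diffusion U-Net conditioned on a component latent z."""
    def __init__(self, img_ch=3, z_dim=64):
        super().__init__()
        self.film = nn.Linear(z_dim, img_ch)          # latent -> per-channel shift
        self.net = nn.Sequential(
            nn.Conv2d(img_ch, 32, 3, padding=1), nn.SiLU(),
            nn.Conv2d(32, img_ch, 3, padding=1),
        )

    def forward(self, x_t, t, z):
        shift = self.film(z)[:, :, None, None]
        return self.net(x_t + shift)                  # predicted noise for this component

@torch.no_grad()
def compose_and_sample(denoiser, latents, steps=50, shape=(1, 3, 64, 64)):
    """Reverse diffusion in which the noise estimate is the sum of the
    per-component estimates (hypothetical composition rule)."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape)
    for i in reversed(range(steps)):
        t = torch.full((shape[0],), i)
        eps = sum(denoiser(x, t, z) for z in latents)          # compose factors
        coef = betas[i] / torch.sqrt(1.0 - alpha_bar[i])
        x = (x - coef * eps) / torch.sqrt(alphas[i])
        if i > 0:
            x = x + torch.sqrt(betas[i]) * torch.randn_like(x)
    return x

denoiser = ComponentDenoiser()
# Pretend these latents were inferred from two different images.
z_objects, z_lighting = torch.randn(1, 64), torch.randn(1, 64)
sample = compose_and_sample(denoiser, [z_objects, z_lighting])
print(sample.shape)
```

Here `z_objects` and `z_lighting` stand in for latents that would be inferred from different images; recombining them across images is the cross-scene composition the abstract describes.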
Related papers
- Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering [56.68286440268329]
The correct insertion of virtual objects in images of real-world scenes requires a deep understanding of the scene's lighting, geometry, and materials.
We propose using a personalized large diffusion model as guidance to a physically based inverse rendering process.
Our method recovers scene lighting and tone-mapping parameters, allowing the photorealistic composition of arbitrary virtual objects in single frames or videos of indoor or outdoor scenes.
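A rough sketch, under heavy assumptions, of what diffusion-guided inverse rendering can look like: lighting and tone-mapping parameters are optimized so that a differentiable rendering scores well under a diffusion prior. The `StubRenderer`, the stub prior, and the SDS-style surrogate loss are hypothetical stand-ins, not the paper's physically based pipeline.

```python
# Rough, hypothetical sketch: optimize lighting and tone-mapping parameters so
# a differentiable rendering is scored well by a diffusion prior. Stub
# renderer, stub prior, and SDS-style surrogate are assumptions.
import torch
import torch.nn as nn

class StubRenderer(nn.Module):
    """Stand-in for a physically based differentiable renderer."""
    def forward(self, albedo, light_color, exposure):
        shading = light_color.clamp(0.05, 1.0).view(1, 3, 1, 1)
        return (albedo * shading * exposure).clamp(0.0, 1.0)   # crude tone map

def diffusion_guidance(image, prior, sigma=0.3):
    """SDS-like surrogate: its gradient w.r.t. the image is the residual between
    the prior's noise prediction (stop-grad) and the injected noise."""
    noise = torch.randn_like(image)
    residual = prior(image + sigma * noise).detach() - noise
    return (residual * image).sum()

albedo = torch.rand(1, 3, 64, 64)                           # fixed scene content
light_color = nn.Parameter(torch.tensor([0.5, 0.5, 0.5]))   # lighting parameter
exposure = nn.Parameter(torch.tensor(1.0))                  # tone-mapping parameter
prior = nn.Conv2d(3, 3, 3, padding=1)                       # stub diffusion prior

renderer = StubRenderer()
opt = torch.optim.Adam([light_color, exposure], lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    diffusion_guidance(renderer(albedo, light_color, exposure), prior).backward()
    opt.step()
print(light_color.data, exposure.data)
```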
arXiv Detail & Related papers (2024-08-19T05:15:45Z)
- Crafting Parts for Expressive Object Composition [37.791770942390485]
PartCraft enables image generation based on fine-grained part-level details specified for objects in the base text prompt.
PartCraft first localizes object parts by denoising the object region from a specific diffusion process.
After obtaining part masks, we run a localized diffusion process in each of the part regions based on fine-grained part descriptions.
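A small sketch of a localized, part-masked denoising step along these lines: each part region gets a noise estimate conditioned on its own part description, and the estimates are blended with the part masks. The toy denoiser, the embedding stand-ins for part descriptions, and the mask-weighted blending rule are assumptions rather than PartCraft's exact procedure.

```python
# Illustrative part-masked denoising step: per-part predictions blended by
# part masks. Prompt-embedding stubs and blending rule are assumptions.
import torch
import torch.nn as nn

class PromptedDenoiser(nn.Module):
    """Toy denoiser conditioned on a text-embedding-like vector."""
    def __init__(self, ch=3, emb=16):
        super().__init__()
        self.proj = nn.Linear(emb, ch)
        self.net = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x_t, prompt_emb):
        return self.net(x_t + self.proj(prompt_emb)[:, :, None, None])

def localized_denoise_step(x_t, part_masks, part_prompts, denoiser):
    """Blend per-part noise predictions using the part masks."""
    eps = torch.zeros_like(x_t)
    total = torch.zeros_like(x_t[:, :1])
    for mask, prompt in zip(part_masks, part_prompts):
        eps = eps + mask * denoiser(x_t, prompt)
        total = total + mask
    return eps / total.clamp(min=1e-6)

x_t = torch.randn(1, 3, 64, 64)
# Hypothetical masks for two parts (e.g. "head" and "body") covering the image.
head = torch.zeros(1, 1, 64, 64); head[..., :32, :] = 1.0
body = 1.0 - head
prompts = [torch.randn(1, 16), torch.randn(1, 16)]   # stand-ins for part descriptions
eps_hat = localized_denoise_step(x_t, [head, body], prompts, PromptedDenoiser())
print(eps_hat.shape)
```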
arXiv Detail & Related papers (2024-06-14T17:31:29Z)
- Neural Gaffer: Relighting Any Object via Diffusion [43.87941408722868]
We propose a novel end-to-end 2D relighting diffusion model, called Neural Gaffer.
Our model takes a single image of any object and can synthesize an accurate, high-quality relit image under any novel lighting condition.
We evaluate our model on both synthetic and in-the-wild Internet imagery and demonstrate its advantages in terms of generalization and accuracy.
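A hypothetical sketch of how an image-and-lighting-conditioned relighting denoiser could be wired: the noisy target is concatenated with the input image, and an embedding of the target environment map modulates the prediction. The toy architecture below is an assumption, not Neural Gaffer's actual network.

```python
# Toy image-conditioned relighting denoiser: concatenate noisy target with the
# input image and modulate with a lighting embedding. Purely illustrative.
import torch
import torch.nn as nn

class RelightDenoiser(nn.Module):
    def __init__(self, ch=3, env_dim=32):
        super().__init__()
        self.env_proj = nn.Linear(env_dim, 2 * ch)       # env-map embedding -> FiLM params
        self.net = nn.Conv2d(2 * ch, ch, 3, padding=1)   # [noisy | input] -> predicted noise

    def forward(self, noisy, input_img, env_emb):
        scale, shift = self.env_proj(env_emb).chunk(2, dim=-1)
        h = self.net(torch.cat([noisy, input_img], dim=1))
        return h * (1 + scale[:, :, None, None]) + shift[:, :, None, None]

model = RelightDenoiser()
eps = model(torch.randn(1, 3, 64, 64), torch.rand(1, 3, 64, 64), torch.randn(1, 32))
print(eps.shape)
```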
arXiv Detail & Related papers (2024-06-11T17:50:15Z)
- Factorized Diffusion: Perceptual Illusions by Noise Decomposition [15.977340635967018]
We present a zero-shot method to control each individual component of an image decomposition through diffusion model sampling.
For certain decompositions, our method recovers prior approaches to compositional generation and spatial control.
We show that we can extend our approach to generate hybrid images from real images.
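A minimal sketch of this kind of noise-space factorization: the noise estimate is split into low- and high-frequency components, each taken from a prediction conditioned on a different prompt, then recombined, which is the basic recipe behind hybrid images. The box-blur decomposition and toy denoiser are illustrative assumptions.

```python
# Noise estimate assembled from frequency components of two prompt-conditioned
# predictions. Blur-based decomposition and toy denoiser are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def lowpass(x, k=9):
    """Simple box-blur low-pass filter; its residual is the high-pass part."""
    weight = torch.ones(x.shape[1], 1, k, k) / (k * k)
    return F.conv2d(x, weight, padding=k // 2, groups=x.shape[1])

class ToyDenoiser(nn.Module):
    def __init__(self, ch=3, emb=16):
        super().__init__()
        self.proj = nn.Linear(emb, ch)
        self.net = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x_t, prompt_emb):
        return self.net(x_t + self.proj(prompt_emb)[:, :, None, None])

@torch.no_grad()
def factorized_eps(x_t, prompt_a, prompt_b, denoiser):
    eps_a = denoiser(x_t, prompt_a)        # controls the low-frequency component
    eps_b = denoiser(x_t, prompt_b)        # controls the high-frequency component
    return lowpass(eps_a) + (eps_b - lowpass(eps_b))

x_t = torch.randn(1, 3, 64, 64)
prompt_a, prompt_b = torch.randn(1, 16), torch.randn(1, 16)
eps_hat = factorized_eps(x_t, prompt_a, prompt_b, ToyDenoiser())
print(eps_hat.shape)
```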
arXiv Detail & Related papers (2024-04-17T17:59:59Z)
- Move Anything with Layered Scene Diffusion [77.45870343845492]
We propose SceneDiffusion to optimize a layered scene representation during the diffusion sampling process.
Our key insight is that spatial disentanglement can be obtained by jointly denoising scene renderings at different spatial layouts.
Our generated scenes support a wide range of spatial editing operations, including moving, resizing, cloning, and layer-wise appearance editing operations.
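A simplified sketch of the layered-scene idea: each layer carries an RGB map, an alpha mask, and a 2D offset, and a rendering is produced by alpha-compositing the shifted layers, so a "move" edit is just a change of offsets. The integer `torch.roll` shifting is an illustrative simplification, and the joint multi-layout denoising itself is not shown.

```python
# Layered scene rendered at two layouts: moving an object is a change of its
# layer's offset. torch.roll shifting is a simplification.
import torch

def render_layers(rgbs, alphas, offsets):
    """Composite layers back to front; offsets are (dy, dx) integer shifts."""
    canvas = torch.zeros_like(rgbs[0])
    for rgb, alpha, (dy, dx) in zip(rgbs, alphas, offsets):
        rgb = torch.roll(rgb, shifts=(dy, dx), dims=(-2, -1))
        alpha = torch.roll(alpha, shifts=(dy, dx), dims=(-2, -1))
        canvas = alpha * rgb + (1.0 - alpha) * canvas
    return canvas

H = W = 64
background = torch.rand(1, 3, H, W)
obj_rgb = torch.rand(1, 3, H, W)
obj_alpha = torch.zeros(1, 1, H, W); obj_alpha[..., 20:44, 20:44] = 1.0
full = torch.ones(1, 1, H, W)

# Same scene content, two different layouts: the object layer is simply moved.
layout_a = render_layers([background, obj_rgb], [full, obj_alpha], [(0, 0), (0, 0)])
layout_b = render_layers([background, obj_rgb], [full, obj_alpha], [(0, 0), (0, 15)])
print(layout_a.shape, layout_b.shape)
```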
arXiv Detail & Related papers (2024-04-10T17:28:16Z)
- Disentangled 3D Scene Generation with Layout Learning [109.03233745767062]
We introduce a method to generate 3D scenes that are disentangled into their component objects.
Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene.
We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects.
arXiv Detail & Related papers (2024-02-26T18:54:15Z)
- gCoRF: Generative Compositional Radiance Fields [80.45269080324677]
3D generative models of objects enable photorealistic image synthesis with 3D control.
Existing methods model the scene as a global scene representation, ignoring the compositional aspect of the scene.
We present a compositional generative model, where each semantic part of the object is represented as an independent 3D representation.
arXiv Detail & Related papers (2022-10-31T14:10:44Z)
- BlobGAN: Spatially Disentangled Scene Representations [67.60387150586375]
We propose an unsupervised, mid-level representation for a generative model of scenes.
The representation is mid-level in that it is neither per-pixel nor per-image; rather, scenes are modeled as a collection of spatial, depth-ordered "blobs" of features.
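An illustrative sketch of a blob-based layout: each blob has a center, a scale, and a feature vector, and blobs are splatted as soft, depth-ordered discs onto a feature grid that a generator would then decode. The circular-blob parameterization and sigmoid falloff are simplified assumptions, not BlobGAN's exact formulation.

```python
# Soft, depth-ordered blobs splatted onto a feature grid. Parameterization is
# a simplified assumption for illustration.
import torch

def splat_blobs(centers, scales, feats, grid=32):
    """centers: (K, 2) in [0, 1]; scales: (K,); feats: (K, D). Returns (D, grid, grid)."""
    ys, xs = torch.meshgrid(
        torch.linspace(0, 1, grid), torch.linspace(0, 1, grid), indexing="ij"
    )
    canvas = torch.zeros(feats.shape[1], grid, grid)
    for (cy, cx), s, f in zip(centers, scales, feats):        # back-to-front order
        dist2 = (ys - cy) ** 2 + (xs - cx) ** 2
        alpha = torch.sigmoid((s ** 2 - dist2) / (0.01 * s))  # soft blob mask
        canvas = alpha * f[:, None, None] + (1 - alpha) * canvas
    return canvas

K, D = 4, 8
feature_grid = splat_blobs(torch.rand(K, 2), torch.full((K,), 0.15), torch.randn(K, D))
print(feature_grid.shape)   # this grid would then be decoded into an image
```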
arXiv Detail & Related papers (2022-05-05T17:59:55Z)
- GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields [45.21191307444531]
Deep generative models allow for photorealistic image synthesis at high resolutions.
But for many applications, this is not enough: content creation also needs to be controllable.
Our key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis.
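A minimal sketch of the composition operator typically used with per-object feature fields: at each 3D point, densities are summed and features are combined with a density-weighted mean, and the composite is then volume rendered. The toy MLP fields below are stand-ins, assumed only for illustration.

```python
# Compose multiple per-object feature fields at sampled 3D points by summing
# densities and density-weighting features. Toy fields are stand-ins.
import torch
import torch.nn as nn

class ToyFeatureField(nn.Module):
    """Maps a 3D point to (density, feature); stand-in for a per-object field."""
    def __init__(self, feat_dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 1 + feat_dim))

    def forward(self, pts):                         # pts: (N, 3)
        out = self.net(pts)
        return torch.relu(out[:, :1]), out[:, 1:]   # density >= 0, feature

def compose_fields(fields, pts, eps=1e-8):
    """Sum densities; density-weighted average of features."""
    sigmas, feats = zip(*(f(pts) for f in fields))
    sigma_total = sum(sigmas)
    feat_total = sum(s * f for s, f in zip(sigmas, feats)) / (sigma_total + eps)
    return sigma_total, feat_total

objects = [ToyFeatureField(), ToyFeatureField()]   # e.g. two objects in a scene
pts = torch.rand(1024, 3)                          # points sampled along camera rays
sigma, feat = compose_fields(objects, pts)
print(sigma.shape, feat.shape)                     # these feed a volume renderer
```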
arXiv Detail & Related papers (2020-11-24T14:14:15Z)