Crafting Parts for Expressive Object Composition
- URL: http://arxiv.org/abs/2406.10197v1
- Date: Fri, 14 Jun 2024 17:31:29 GMT
- Title: Crafting Parts for Expressive Object Composition
- Authors: Harsh Rangwani, Aishwarya Agarwal, Kuldeep Kulkarni, R. Venkatesh Babu, Srikrishna Karanam
- Abstract summary: PartCraft enables image generation based on fine-grained part-level details specified for objects in the base text prompt.
PartCraft first localizes object parts by denoising the object region from a specific diffusion process.
After obtaining part masks, we run a localized diffusion process in each of the part regions based on fine-grained part descriptions.
- Score: 37.791770942390485
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image generation from large generative models like Stable Diffusion, DALLE-2, etc., has become a common base for various tasks due to their superior quality and extensive knowledge bases. As image composition and generation are creative processes, artists need control over various parts of the images being generated. We find that just adding details about parts in the base text prompt either leads to an entirely different image (e.g., missing/incorrect identity) or the extra part details simply being ignored. To mitigate these issues, we introduce PartCraft, which enables image generation based on fine-grained part-level details specified for objects in the base text prompt. This allows more control for artists and enables novel object compositions by combining distinctive object parts. PartCraft first localizes object parts by denoising the object region from a specific diffusion process. This enables each part token to be localized to the right object region. After obtaining part masks, we run a localized diffusion process in each of the part regions based on fine-grained part descriptions and combine them to produce the final image. All the stages of PartCraft are based on repurposing a pre-trained diffusion model, which enables it to generalize across various domains without training. We demonstrate the effectiveness of part-level control provided by PartCraft qualitatively through visual examples and quantitatively in comparison to contemporary baselines.
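For intuition, below is a minimal, hypothetical sketch of the part-level composition loop described in the abstract: given a base image, a base prompt, and one mask per object part, each masked region is regenerated with its fine-grained part description while the rest of the image stays fixed. It uses an off-the-shelf diffusers inpainting pipeline as a stand-in for the paper's repurposed diffusion stages; the part-mask localization stage is assumed to be done elsewhere, and the model checkpoint, `compose_parts` helper, and file names are illustrative, not taken from the paper.

```python
# Hedged sketch: localized per-part regeneration via mask-restricted diffusion.
# The actual PartCraft method repurposes a pre-trained diffusion model for both
# part localization and localized generation; here an inpainting pipeline is
# used only to illustrate the per-part loop.
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",  # assumed checkpoint
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

def compose_parts(base_image: Image.Image,
                  base_prompt: str,
                  part_specs: list[tuple[Image.Image, str]],
                  num_steps: int = 50) -> Image.Image:
    """Sequentially refine each masked part region with its fine-grained
    description while leaving the unmasked area untouched."""
    image = base_image
    for mask, part_description in part_specs:
        # Combine the base prompt with the part description so the edited
        # part stays consistent with the overall object identity.
        prompt = f"{base_prompt}, {part_description}"
        image = pipe(
            prompt=prompt,
            image=image,
            mask_image=mask,           # white = region to regenerate
            num_inference_steps=num_steps,
        ).images[0]
    return image

# Hypothetical usage: masks would come from the part-localization stage.
# base = Image.open("dog.png").convert("RGB")
# head_mask = Image.open("dog_head_mask.png").convert("L")
# out = compose_parts(base, "a photo of a dog",
#                     [(head_mask, "head of a golden retriever")])
```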
Related papers
- PartCraft: Crafting Creative Objects by Parts [128.30514851911218]
This paper propels creative control in generative visual AI by allowing users to "select" visual concepts by parts.
For the first time, users can choose visual concepts at the part level for their creative endeavors.
This enables fine-grained generation that precisely captures the selected visual concepts.
arXiv Detail & Related papers (2024-07-05T15:53:04Z) - Compositional Image Decomposition with Diffusion Models [70.07406583580591]
In this paper, we present a method to decompose an image into such compositional components.
Our approach, Decomp Diffusion, is an unsupervised method which infers a set of different components in the image.
We demonstrate how components can capture different factors of the scene, ranging from global scene descriptors like shadows or facial expression to local scene descriptors like constituent objects.
arXiv Detail & Related papers (2024-06-27T16:13:34Z) - ViFu: Multiple 360$^\circ$ Objects Reconstruction with Clean Background via Visible Part Fusion [7.8788463395442045]
We propose a method to segment and recover a static, clean background and multiple 360$^\circ$ objects from observations of scenes at different timestamps.
Our basic idea is that, by observing the same set of objects in various arrangements, parts that are invisible in one scene may become visible in others.
arXiv Detail & Related papers (2024-04-15T02:44:23Z) - PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering [13.785484396436367]
We formulate image composition as a subject-based local editing task, solely focusing on foreground generation.
We propose PrimeComposer, a faster training-free diffuser that composites the images by well-designed attention steering across different noise levels.
Our method exhibits the fastest inference efficiency and extensive experiments demonstrate our superiority both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-03-08T04:58:49Z) - LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts [60.54912319612113]
Diffusion-based generative models have significantly advanced text-to-image generation but encounter challenges when processing lengthy and intricate text prompts.
We present a novel approach leveraging Large Language Models (LLMs) to extract critical components from text prompts.
Our evaluation on complex prompts featuring multiple objects demonstrates a substantial improvement in recall compared to baseline diffusion models.
arXiv Detail & Related papers (2023-10-16T17:57:37Z) - SIEDOB: Semantic Image Editing by Disentangling Object and Background [5.149242555705579]
We propose a novel paradigm for semantic image editing.
Our approach, SIEDOB, is built on the core idea of explicitly leveraging several heterogeneous subnetworks for objects and backgrounds.
We conduct extensive experiments on Cityscapes and ADE20K-Room datasets and exhibit that our method remarkably outperforms the baselines.
arXiv Detail & Related papers (2023-03-23T06:17:23Z) - DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis [90.32352050266104]
DisCoScene is a 3D-aware generative model for high-quality and controllable scene synthesis.
It disentangles the whole scene into object-centric generative fields by learning on only 2D images with global-local discrimination.
We demonstrate state-of-the-art performance on many scene datasets, including the challenging outdoor dataset.
arXiv Detail & Related papers (2022-12-22T18:59:59Z) - gCoRF: Generative Compositional Radiance Fields [80.45269080324677]
3D generative models of objects enable photorealistic image synthesis with 3D control.
Existing methods model the scene as a global scene representation, ignoring the compositional aspect of the scene.
We present a compositional generative model, where each semantic part of the object is represented as an independent 3D representation.
arXiv Detail & Related papers (2022-10-31T14:10:44Z) - GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields [45.21191307444531]
Deep generative models allow for photorealistic image synthesis at high resolutions.
But for many applications, this is not enough: content creation also needs to be controllable.
Our key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis.
arXiv Detail & Related papers (2020-11-24T14:14:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.