Collage Diffusion
- URL: http://arxiv.org/abs/2303.00262v2
- Date: Thu, 31 Aug 2023 09:01:35 GMT
- Title: Collage Diffusion
- Authors: Vishnu Sarukkai, Linden Li, Arden Ma, Christopher R\'e, Kayvon
Fatahalian
- Abstract summary: Collage Diffusion harmonizes the input layers to make objects fit together.
We preserve key visual attributes of input layers by learning specialized text representations per layer.
Collage Diffusion generates globally harmonized images that maintain desired object characteristics better than prior approaches.
- Score: 17.660410448312717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We seek to give users precise control over diffusion-based image generation
by modeling complex scenes as sequences of layers, which define the desired
spatial arrangement and visual attributes of objects in the scene. Collage
Diffusion harmonizes the input layers to make objects fit together -- the key
challenge involves minimizing changes in the positions and key visual
attributes of the input layers while allowing other attributes to change in the
harmonization process. We ensure that objects are generated in the correct
locations by modifying text-image cross-attention with the layers' alpha masks.
We preserve key visual attributes of input layers by learning specialized text
representations per layer and by extending ControlNet to operate on layers.
Layer input allows users to control the extent of image harmonization on a
per-object basis, and users can even iteratively edit individual objects in
generated images while keeping other objects fixed. By leveraging the rich
information present in layer input, Collage Diffusion generates globally
harmonized images that maintain desired object characteristics better than
prior approaches.
Related papers
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z) - Move Anything with Layered Scene Diffusion [77.45870343845492]
We propose SceneDiffusion to optimize a layered scene representation during the diffusion sampling process.
Our key insight is that spatial disentanglement can be obtained by jointly denoising scene renderings at different spatial layouts.
Our generated scenes support a wide range of spatial editing operations, including moving, resizing, cloning, and layer-wise appearance editing operations.
arXiv Detail & Related papers (2024-04-10T17:28:16Z) - DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing [22.855660721387167]
We transform the spatial-aware image editing task into a combination of two sub-tasks: multi-layered latent decomposition and multi-layered latent fusion.
We show that our approach consistently surpasses the latest spatial editing methods, including Self-Guidance and DiffEditor.
arXiv Detail & Related papers (2024-03-21T15:35:42Z) - LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model [70.14953942532621]
Layer-collaborative diffusion model, named LayerDiff, is designed for text-guided, multi-layered, composable image synthesis.
Our model can generate high-quality multi-layered images with performance comparable to conventional whole-image generation methods.
LayerDiff enables a broader range of controllable generative applications, including layer-specific image editing and style transfer.
arXiv Detail & Related papers (2024-03-18T16:28:28Z) - Harnessing the Spatial-Temporal Attention of Diffusion Models for
High-Fidelity Text-to-Image Synthesis [59.10787643285506]
Diffusion-based models have achieved state-of-the-art performance on text-to-image synthesis tasks.
One critical limitation of these models is the low fidelity of generated images with respect to the text description.
We propose a new text-to-image algorithm that adds explicit control over spatial-temporal cross-attention in diffusion models.
arXiv Detail & Related papers (2023-04-07T23:49:34Z) - Localizing Object-level Shape Variations with Text-to-Image Diffusion
Models [60.422435066544814]
We present a technique to generate a collection of images that depicts variations in the shape of a specific object.
A particular challenge when generating object variations is accurately localizing the manipulation applied over the object's shape.
To localize the image-space operation, we present two techniques that use the self-attention layers in conjunction with the cross-attention layers.
arXiv Detail & Related papers (2023-03-20T17:45:08Z) - Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z) - Learning Layout and Style Reconfigurable GANs for Controllable Image
Synthesis [12.449076001538552]
This paper focuses on a recent emerged task, layout-to-image, to learn generative models capable of synthesizing photo-realistic images from spatial layout.
Style control at the image level is the same as in vanilla GANs, while style control at the object mask level is realized by a proposed novel feature normalization scheme.
In experiments, the proposed method is tested in the COCO-Stuff dataset and the Visual Genome dataset with state-of-the-art performance obtained.
arXiv Detail & Related papers (2020-03-25T18:16:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.