Consistent Image Layout Editing with Diffusion Models
- URL: http://arxiv.org/abs/2503.06419v1
- Date: Sun, 09 Mar 2025 03:31:28 GMT
- Title: Consistent Image Layout Editing with Diffusion Models
- Authors: Tao Xia, Yudi Zhang, Ting Liu Lei Zhang,
- Abstract summary: This paper proposes a novel image layout editing method that can not only re-arrange a real image to a specified layout, but also can ensure the visual appearance of the objects consistent with their appearance before editing.
- Score: 0.8431867616409958
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the great success of large-scale text-to-image diffusion models in image generation and image editing, existing methods still struggle to edit the layout of real images. Although a few works have been proposed to tackle this problem, they either fail to adjust the layout of images, or have difficulty in preserving visual appearance of objects after the layout adjustment. To bridge this gap, this paper proposes a novel image layout editing method that can not only re-arrange a real image to a specified layout, but also can ensure the visual appearance of the objects consistent with their appearance before editing. Concretely, the proposed method consists of two key components. Firstly, a multi-concept learning scheme is used to learn the concepts of different objects from a single image, which is crucial for keeping visual consistency in the layout editing. Secondly, it leverages the semantic consistency within intermediate features of diffusion models to project the appearance information of objects to the desired regions directly. Besides, a novel initialization noise design is adopted to facilitate the process of re-arranging the layout. Extensive experiments demonstrate that the proposed method outperforms previous works in both layout alignment and visual consistency for the task of image layout editing
Related papers
- Concept Lancet: Image Editing with Compositional Representation Transplant [58.9421919837084]
Concept Lancet is a zero-shot plug-and-play framework for principled representation manipulation in image editing.
We decompose the source input in the latent (text embedding or diffusion score) space as a sparse linear combination of the representations of the collected visual concepts.
We perform a customized concept transplant process to impose the corresponding editing direction.
arXiv Detail & Related papers (2025-04-03T17:59:58Z) - DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z) - Unified Editing of Panorama, 3D Scenes, and Videos Through Disentangled Self-Attention Injection [60.47731445033151]
We propose a novel unified editing framework that combines the strengths of both approaches by utilizing only a basic 2D image text-to-image (T2I) diffusion model.
Experimental results confirm that our method enables editing across diverse modalities including 3D scenes, videos, and panorama images.
arXiv Detail & Related papers (2024-05-27T04:44:36Z) - VASE: Object-Centric Appearance and Shape Manipulation of Real Videos [108.60416277357712]
In this work, we introduce a framework that is object-centric and is designed to control both the object's appearance and, notably, to execute precise and explicit structural modifications on the object.
We build our framework on a pre-trained image-conditioned diffusion model, integrate layers to handle the temporal dimension, and propose training strategies and architectural modifications to enable shape control.
We evaluate our method on the image-driven video editing task showing similar performance to the state-of-the-art, and showcasing novel shape-editing capabilities.
arXiv Detail & Related papers (2024-01-04T18:59:24Z) - Continuous Layout Editing of Single Images with Diffusion Models [24.581184791106562]
We propose the first framework for layout editing of a single image while preserving its visual properties.
Our approach is achieved through two key modules.
Our code will be freely available for public use upon acceptance.
arXiv Detail & Related papers (2023-06-22T17:51:05Z) - StyleDiffusion: Prompt-Embedding Inversion for Text-Based Editing [115.49488548588305]
A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images.<n>They either finetune the model, or invert the image in the latent space of the pretrained model.<n>They suffer from two problems: Unsatisfying results for selected regions and unexpected changes in non-selected regions.
arXiv Detail & Related papers (2023-03-28T00:16:45Z) - Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z) - Geometry Aligned Variational Transformer for Image-conditioned Layout
Generation [38.747175229902396]
We propose an Image-Conditioned Variational Transformer (ICVT) that autoregressively generates various layouts in an image.
First, self-attention mechanism is adopted to model the contextual relationship within layout elements, while cross-attention mechanism is used to fuse the visual information of conditional images.
We construct a large-scale advertisement poster layout designing dataset with delicate layout and saliency map annotations.
arXiv Detail & Related papers (2022-09-02T07:19:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.