SIEDOB: Semantic Image Editing by Disentangling Object and Background
- URL: http://arxiv.org/abs/2303.13062v1
- Date: Thu, 23 Mar 2023 06:17:23 GMT
- Title: SIEDOB: Semantic Image Editing by Disentangling Object and Background
- Authors: Wuyang Luo, Su Yang, Xinjian Zhang, Weishan Zhang
- Abstract summary: We propose a novel paradigm for semantic image editing, SIEDOB, the core idea of which is to explicitly leverage several heterogeneous subnetworks for objects and backgrounds.
We conduct extensive experiments on Cityscapes and ADE20K-Room datasets and show that our method remarkably outperforms the baselines.
- Score: 5.149242555705579
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Semantic image editing provides users with a flexible tool to modify a given
image guided by a corresponding segmentation map. In this task, the features of
the foreground objects and the backgrounds are quite different. However, all
previous methods handle backgrounds and objects as a whole using a monolithic
model. Consequently, they remain limited in processing content-rich images and
suffer from generating unrealistic objects and texture-inconsistent
backgrounds. To address this issue, we propose a novel paradigm,
\textbf{S}emantic \textbf{I}mage \textbf{E}diting by \textbf{D}isentangling
\textbf{O}bject and \textbf{B}ackground (\textbf{SIEDOB}), the core idea of
which is to explicitly leverage several heterogeneous subnetworks for objects
and backgrounds. First, SIEDOB disassembles the edited input into background
regions and instance-level objects. Then, we feed them into the dedicated
generators. Finally, we embed all synthesized parts in their original
locations and utilize a fusion network to obtain a harmonized result. Moreover,
to produce high-quality edited images, we propose some innovative designs,
including Semantic-Aware Self-Propagation Module, Boundary-Anchored Patch
Discriminator, and Style-Diversity Object Generator, and integrate them into
SIEDOB. We conduct extensive experiments on Cityscapes and ADE20K-Room datasets
and show that our method remarkably outperforms the baselines, especially in
synthesizing realistic and diverse objects and texture-consistent backgrounds.
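The abstract outlines a three-stage pipeline: disassemble the input into background and objects, run each through a dedicated generator, then fuse the parts. Below is a minimal runnable sketch of that control flow; the modules are single-convolution placeholders for illustration, not the paper's actual architecture.

```python
# Sketch of the disentangled editing pipeline described above. The modules
# are toy stand-ins for SIEDOB's dedicated background generator, object
# generator, and fusion network.
import torch
import torch.nn as nn

class DisentangledEditSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.background_gen = nn.Conv2d(4, 3, 3, padding=1)  # image + bg mask
        self.object_gen = nn.Conv2d(4, 3, 3, padding=1)      # image + obj mask
        self.fusion = nn.Conv2d(3, 3, 3, padding=1)          # harmonization

    def forward(self, image, bg_mask, obj_masks):
        # 1. Disassemble the edited input: synthesize the background regions.
        out = self.background_gen(torch.cat([image * bg_mask, bg_mask], 1))
        # 2. Synthesize each instance-level object with a dedicated generator,
        #    then embed it back at its original location.
        for m in obj_masks:
            obj = self.object_gen(torch.cat([image * m, m], 1))
            out = out * (1 - m) + obj * m
        # 3. Fuse all parts into a harmonized result.
        return self.fusion(out)

image = torch.randn(1, 3, 256, 256)
bg_mask = torch.ones(1, 1, 256, 256)
obj_masks = [torch.zeros(1, 1, 256, 256)]
print(DisentangledEditSketch()(image, bg_mask, obj_masks).shape)
```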
Related papers
- GroundingBooth: Grounding Text-to-Image Customization [17.185571339157075]
We introduce GroundingBooth, a framework that achieves zero-shot instance-level spatial grounding on both foreground subjects and background objects.
Our proposed text-image grounding module and masked cross-attention layer allow us to generate personalized images with both accurate layout alignment and identity preservation.
arXiv Detail & Related papers (2024-09-13T03:40:58Z)
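As a rough illustration of the masked cross-attention idea mentioned above, the sketch below restricts which text tokens each image query may attend to; the masking scheme and shapes are assumptions for illustration, not GroundingBooth's exact layer.

```python
# Hypothetical masked cross-attention: image queries attend only to the
# text tokens their mask permits (e.g., a subject's image region attends
# only to that subject's tokens).
import torch
import torch.nn.functional as F

def masked_cross_attention(q, k, v, mask):
    # q: (B, Nq, D) image queries; k, v: (B, Nk, D) text tokens;
    # mask: (B, Nq, Nk) boolean, True where attention is allowed.
    # Each query must keep at least one visible token, or softmax yields NaN.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

q = torch.randn(1, 64, 32)        # 64 spatial queries
k = v = torch.randn(1, 8, 32)     # 8 text tokens
mask = torch.ones(1, 64, 8, dtype=torch.bool)
print(masked_cross_attention(q, k, v, mask).shape)  # torch.Size([1, 64, 32])
```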
- Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model [81.96954332787655]
We introduce Diffree, a Text-to-Image (T2I) model that facilitates text-guided object addition with only text control.
In experiments, Diffree adds new objects with a high success rate while maintaining background consistency, spatial appropriateness, and object relevance and quality.
arXiv Detail & Related papers (2024-07-24T03:58:58Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
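A toy sketch of the attention-feature interpolation schedule the summary describes; the linear decay and the 30% cutoff for "early" steps are guesses for illustration, not DiffUHaul's published values.

```python
# Illustrative attention-feature interpolation for early denoising steps:
# blend source-image features into the target generation, with the source
# weight decaying to zero as denoising proceeds.
import torch

def blend_attention(src_feat, tgt_feat, step, num_steps, early_frac=0.3):
    cutoff = early_frac * num_steps
    if step >= cutoff:
        return tgt_feat  # late steps: use the target features unchanged
    alpha = 1.0 - step / cutoff  # weight on the source decays from 1 to 0
    return alpha * src_feat + (1.0 - alpha) * tgt_feat

src = torch.randn(1, 64, 320)   # attention features from the source pass
tgt = torch.randn(1, 64, 320)   # attention features for the new layout
print(blend_attention(src, tgt, step=5, num_steps=50).shape)
```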
- DreamCom: Finetuning Text-guided Inpainting Model for Image Composition [24.411003826961686]
We propose DreamCom by treating image composition as text-guided image inpainting customized for a certain object.
Specifically, we finetune a pretrained text-guided image inpainting model based on a few reference images containing the same object.
In practice, the inserted object may be adversely affected by the background, so we propose masked attention mechanisms to avoid negative background interference.
arXiv Detail & Related papers (2023-09-27T09:23:50Z)
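The loop below is a generic sketch of the few-shot finetuning idea in the summary: adapt a pretrained inpainting model on a handful of reference images of one object, with the loss confined to the object mask. The toy model is a placeholder and text conditioning is omitted for brevity.

```python
# Generic few-shot finetuning of an inpainting model on reference images.
# ToyInpainter is a stand-in, not the actual pretrained network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyInpainter(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(4, 3, 3, padding=1)  # masked image + mask

    def forward(self, masked_img, mask):
        return self.net(torch.cat([masked_img, mask], dim=1))

def finetune_on_references(model, references, steps=100, lr=1e-4):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for step in range(steps):
        img, mask = references[step % len(references)]
        pred = model(img * (1 - mask), mask)
        # Only the masked object region drives the loss.
        loss = F.mse_loss(pred * mask, img * mask)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

refs = [(torch.randn(1, 3, 64, 64), torch.ones(1, 1, 64, 64)) for _ in range(4)]
finetune_on_references(ToyInpainter(), refs)
```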
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- Panoptic-based Object Style-Align for Image-to-Image Translation [2.226472061870956]
We propose panoptic-based object style-align generative adversarial networks (POSA-GANs) for image-to-image translation.
The proposed method was systematically compared with different competing methods and obtained significant improvements in both image quality and object recognition performance for translated images.
arXiv Detail & Related papers (2021-12-03T14:28:11Z)
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z)
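A loose sketch of what warping modulation parameters with a learned flow field could look like, using bilinear sampling; the shapes, normalization choice, and modulation form are assumptions for illustration, not SAWN's exact design.

```python
# Illustrative warped normalization: sample spatial modulation parameters
# (gamma, beta) through a flow-offset grid before applying them.
import torch
import torch.nn.functional as F

def warped_modulation(feat, gamma, beta, flow):
    # feat, gamma, beta: (B, C, H, W); flow: (B, 2, H, W) offsets in
    # normalized [-1, 1] coordinates.
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).unsqueeze(0).expand(B, -1, -1, -1)
    grid = grid + flow.permute(0, 2, 3, 1)  # offset the base sampling grid
    gamma_w = F.grid_sample(gamma, grid, align_corners=True)
    beta_w = F.grid_sample(beta, grid, align_corners=True)
    # Normalize, then modulate with the warped parameters.
    return F.instance_norm(feat) * (1 + gamma_w) + beta_w

feat = torch.randn(2, 8, 32, 32)
gamma, beta = torch.randn(2, 8, 32, 32), torch.randn(2, 8, 32, 32)
flow = torch.zeros(2, 2, 32, 32)  # zero flow: reduces to plain modulation
print(warped_modulation(feat, gamma, beta, flow).shape)
```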
- BachGAN: High-Resolution Image Synthesis from Salient Object Layout [78.51640906030244]
We propose a new task toward a more practical application of image generation: high-quality image synthesis from salient object layout.
Two main challenges spring from this new task: (i) how to generate fine-grained details and realistic textures without segmentation map input; and (ii) how to create a background and weave it seamlessly into standalone objects.
By generating the hallucinated background representation dynamically, our model can synthesize high-resolution images with both photo-realistic foreground and integral background.
arXiv Detail & Related papers (2020-03-26T00:54:44Z)
- Generating Object Stamps [47.20601520671103]
We present an algorithm to generate diverse foreground objects and composite them into background images using a GAN architecture.
Our results on the challenging COCO dataset show improved overall quality and diversity compared to state-of-the-art object insertion approaches.
arXiv Detail & Related papers (2020-01-01T14:36:43Z)
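The compositing step in this last entry reduces to alpha blending a generated "stamp" onto a background; a minimal sketch of that operation (the GAN generator itself is out of scope here):

```python
# Alpha-composite a generated RGBA object stamp onto a background image.
import torch

def composite(background, object_rgba):
    # background: (B, 3, H, W); object_rgba: (B, 4, H, W), RGB + alpha.
    rgb, alpha = object_rgba[:, :3], object_rgba[:, 3:4]
    return background * (1 - alpha) + rgb * alpha

bg = torch.rand(1, 3, 128, 128)
stamp = torch.rand(1, 4, 128, 128)  # would come from the object generator
print(composite(bg, stamp).shape)   # torch.Size([1, 3, 128, 128])
```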