ObjectStitch: Generative Object Compositing
- URL: http://arxiv.org/abs/2212.00932v2
- Date: Mon, 5 Dec 2022 05:11:31 GMT
- Title: ObjectStitch: Generative Object Compositing
- Authors: Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian Price, Jianming Zhang, Soo Ye Kim, Daniel Aliaga
- Abstract summary: We propose a self-supervised framework for object compositing using conditional diffusion models.
Our framework can transform the viewpoint, geometry, color and shadow of the generated object while requiring no manual labeling.
Our method outperforms relevant baselines in both realism and faithfulness of the synthesized result images in a user study on various real-world images.
- Score: 43.206123360578665
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Object compositing based on 2D images is a challenging problem since it
typically involves multiple processing stages such as color harmonization,
geometry correction and shadow generation to generate realistic results.
Furthermore, annotating training data pairs for compositing requires
substantial manual effort from professionals, and is hardly scalable. Thus,
with the recent advances in generative models, in this work, we propose a
self-supervised framework for object compositing by leveraging the power of
conditional diffusion models. Our framework can holistically address the
object compositing task in a unified model, transforming the viewpoint,
geometry, color and shadow of the generated object while requiring no manual
labeling. To preserve the input object's characteristics, we introduce a
content adaptor that helps to maintain categorical semantics and object
appearance. A data augmentation method is further adopted to improve the
fidelity of the generator. Our method outperforms relevant baselines in both
realism and faithfulness of the synthesized result images in a user study on
various real-world images.
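Since the abstract only outlines the approach, the following is a minimal, hypothetical PyTorch sketch of a self-supervised conditional-diffusion compositing objective of the kind described: an object embedding passes through a content adaptor, and a denoiser is trained to regenerate the masked object region. All names, sizes, and the noise schedule (ToyDenoiser, content_adaptor, OBJ_EMBED_DIM, a linear DDPM schedule) are illustrative assumptions, not the paper's actual architecture.

```python
# Hedged sketch of the self-supervised training objective described above,
# assuming a DDPM-style epsilon-prediction loss. Module names and sizes are
# hypothetical; the real model is a conditional diffusion U-Net.
import torch
import torch.nn as nn
import torch.nn.functional as F

OBJ_EMBED_DIM = 512  # assumed size of the object-appearance embedding
COND_DIM = 128       # assumed size of the conditioning vector

# Content adaptor: maps an object embedding (e.g. from a frozen image encoder)
# to a conditioning vector meant to preserve semantics and appearance.
content_adaptor = nn.Sequential(
    nn.Linear(OBJ_EMBED_DIM, COND_DIM), nn.GELU(), nn.Linear(COND_DIM, COND_DIM)
)

class ToyDenoiser(nn.Module):
    """Stand-in for the conditional U-Net: predicts noise from the noisy
    scene, the object mask, and the conditioning vector (broadcast spatially)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1 + COND_DIM, 64, 3, padding=1), nn.GELU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, x_noisy, mask, cond):
        h, w = x_noisy.shape[-2:]
        cond_map = cond[:, :, None, None].expand(-1, -1, h, w)
        return self.net(torch.cat([x_noisy, mask, cond_map], dim=1))

denoiser = ToyDenoiser()
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

def training_step(scene, mask, obj_embed):
    """scene: (B,3,H,W) real image; mask: (B,1,H,W) marks the object region;
    obj_embed: (B, OBJ_EMBED_DIM) embedding of an *augmented* crop of the same
    object -- the augmentation is what makes the setup self-supervised."""
    b = scene.size(0)
    t = torch.randint(0, T, (b,))
    ab = alpha_bars[t][:, None, None, None]
    noise = torch.randn_like(scene)
    x_noisy = ab.sqrt() * scene + (1 - ab).sqrt() * noise
    cond = content_adaptor(obj_embed)
    pred_noise = denoiser(x_noisy, mask, cond)
    return F.mse_loss(pred_noise, noise)

loss = training_step(torch.randn(2, 3, 64, 64),
                     torch.rand(2, 1, 64, 64),
                     torch.randn(2, OBJ_EMBED_DIM))
loss.backward()
```

Training on augmented crops of an object that must be regenerated inside its own scene is what removes the need for manually annotated before/after compositing pairs.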
Related papers
- Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching [19.730504197461144]
We present a novel generalizable object pose estimation method to determine the object pose using only one RGB image.
Our method offers generalization to unseen objects without extensive training, operates with a single reference image of the object, and eliminates the need for 3D object models or multiple views of the object.
arXiv Detail & Related papers (2024-11-24T14:31:50Z)
- SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects [20.978091381109294]
We propose a method to generate articulated objects from a single image.
Our method generates an articulated object that is visually consistent with the input image.
Our experiments show that our method outperforms the state-of-the-art in articulated object creation.
arXiv Detail & Related papers (2024-10-21T20:41:32Z)
- Thinking Outside the BBox: Unconstrained Generative Object Compositing [36.86960274923344]
We present a novel problem of unconstrained generative object compositing.
Our first-of-its-kind model is able to generate object effects such as shadows and reflections that go beyond the mask.
Our model outperforms existing object placement and compositing models in various quality metrics and user studies.
arXiv Detail & Related papers (2024-09-06T18:42:30Z)
- ObjBlur: A Curriculum Learning Approach With Progressive Object-Level Blurring for Improved Layout-to-Image Generation [7.645341879105626]
We present ObjBlur, a novel curriculum learning approach to improve layout-to-image generation models.
Our method is based on progressive object-level blurring, which effectively stabilizes training and enhances the quality of generated images.
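For intuition, here is a hedged sketch of what such a blurring curriculum could look like: objects are blurred heavily early in training and progressively sharpened. The linear schedule, kernel size, and helper name are illustrative assumptions, not ObjBlur's published recipe.

```python
# Hedged sketch of progressive object-level blurring; the schedule and
# kernel size are assumptions for illustration.
import torch
import torchvision.transforms.functional as TF

def curriculum_blur(image, obj_mask, step, total_steps, max_sigma=5.0):
    """image: (3,H,W) float tensor; obj_mask: (1,H,W) in {0,1}.
    Blur strength decays linearly from max_sigma to zero over training."""
    sigma = max_sigma * (1.0 - step / total_steps)
    if sigma <= 0.1:  # late in training: objects are left sharp
        return image
    blurred = TF.gaussian_blur(image, kernel_size=11, sigma=sigma)
    # Blur only inside the object mask; the background stays untouched.
    return obj_mask * blurred + (1 - obj_mask) * image

img = torch.rand(3, 128, 128)
mask = (torch.rand(1, 128, 128) > 0.5).float()
out = curriculum_blur(img, mask, step=100, total_steps=1000)
```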
arXiv Detail & Related papers (2024-04-11T08:50:12Z)
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
- Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators [97.12135238534628]
We propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects.
Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts.
Our proposed scheme significantly improves the generation quality and achieves state-of-the-art results on various tasks.
arXiv Detail & Related papers (2022-12-13T01:36:56Z)
- Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z)
- Dual Pyramid Generative Adversarial Networks for Semantic Image Synthesis [94.76988562653845]
The goal of semantic image synthesis is to generate photo-realistic images from semantic label maps.
Current state-of-the-art approaches, however, still struggle to generate realistic objects in images at various scales.
We propose a Dual Pyramid Generative Adversarial Network (DP-GAN) that learns the conditioning of spatially-adaptive normalization blocks at all scales jointly.
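For context, the following is a minimal sketch of a spatially-adaptive normalization block in the SPADE style that such generators condition on the label map; the layer sizes and names here are illustrative assumptions rather than DP-GAN's exact design.

```python
# Hedged sketch of a spatially-adaptive normalization block: per-pixel scale
# and shift are predicted from the semantic label map at each scale.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatiallyAdaptiveNorm(nn.Module):
    def __init__(self, feat_channels, label_channels, hidden=128):
        super().__init__()
        self.norm = nn.InstanceNorm2d(feat_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, 3, padding=1), nn.ReLU())
        self.to_gamma = nn.Conv2d(hidden, feat_channels, 3, padding=1)
        self.to_beta = nn.Conv2d(hidden, feat_channels, 3, padding=1)

    def forward(self, feats, label_map):
        # Resize the label map to the current feature resolution, then
        # modulate the normalized features with per-pixel scale and shift.
        label_map = F.interpolate(label_map, size=feats.shape[-2:], mode="nearest")
        h = self.shared(label_map)
        return self.norm(feats) * (1 + self.to_gamma(h)) + self.to_beta(h)

feats = torch.randn(1, 64, 32, 32)   # generator features at one scale
seg = torch.randn(1, 20, 256, 256)   # 20-class semantic label map
print(SpatiallyAdaptiveNorm(64, 20)(feats, seg).shape)  # torch.Size([1, 64, 32, 32])
```

Learning this conditioning jointly at all scales is what the dual-pyramid design targets.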
arXiv Detail & Related papers (2022-10-08T18:45:44Z)
- Combining Semantic Guidance and Deep Reinforcement Learning For Generating Human Level Paintings [22.889059874754242]
Generation of stroke-based non-photorealistic imagery is an important problem in the computer vision community.
Previous methods have been limited to datasets with little variation in position, scale and saliency of the foreground object.
We propose a Semantic Guidance pipeline whose first component is a bi-level painting procedure for learning the distinction between foreground and background brush strokes at training time.
arXiv Detail & Related papers (2020-11-25T09:00:04Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which improve the model's layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric, that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
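As a reference point, below is a small sketch of the Fréchet distance computation that an object-centric variant like SceneFID applies to features of object crops rather than whole images. The cropping and Inception embedding steps are assumed to happen elsewhere; the helper name is hypothetical.

```python
# Hedged sketch: Frechet distance between two sets of object-crop features.
# feats_* are (N, D) arrays from an Inception-style encoder (assumed given).
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_real, feats_fake):
    mu_r, mu_f = feats_real.mean(0), feats_fake.mean(0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    return float(((mu_r - mu_f) ** 2).sum()
                 + np.trace(cov_r + cov_f - 2.0 * covmean))

# The "object-centric" part: features come from per-object crops, so the
# metric scores each object's realism instead of whole-scene statistics.
print(frechet_distance(np.random.randn(200, 64), np.random.randn(200, 64)))
```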
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.