IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation
- URL: http://arxiv.org/abs/2403.10701v1
- Date: Fri, 15 Mar 2024 21:37:04 GMT
- Title: IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation
- Authors: Yizhi Song, Zhifei Zhang, Zhe Lin, Scott Cohen, Brian Price, Jianming Zhang, Soo Ye Kim, He Zhang, Wei Xiong, Daniel Aliaga,
- Abstract summary: IMPRINT is a novel diffusion-based generative model trained with a two-stage learning framework.
The first stage is targeted for context-agnostic, identity-preserving pretraining of the object encoder.
The second stage leverages this representation to learn seamless harmonization of the object composited to the background.
- Score: 40.34581973675213
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Generative object compositing emerges as a promising new avenue for compositional image editing. However, the requirement of object identity preservation poses a significant challenge, limiting practical usage of most existing methods. In response, this paper introduces IMPRINT, a novel diffusion-based generative model trained with a two-stage learning framework that decouples learning of identity preservation from that of compositing. The first stage is targeted for context-agnostic, identity-preserving pretraining of the object encoder, enabling the encoder to learn an embedding that is both view-invariant and conducive to enhanced detail preservation. The subsequent stage leverages this representation to learn seamless harmonization of the object composited to the background. In addition, IMPRINT incorporates a shape-guidance mechanism offering user-directed control over the compositing process. Extensive experiments demonstrate that IMPRINT significantly outperforms existing methods and various baselines on identity preservation and composition quality.
Related papers
- Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models [78.90023746996302]
Add-it is a training-free approach that extends diffusion models' attention mechanisms to incorporate information from three key sources.
Our weighted extended-attention mechanism maintains structural consistency and fine details while ensuring natural object placement.
Human evaluations show that Add-it is preferred in over 80% of cases.
arXiv Detail & Related papers (2024-11-11T18:50:09Z) - EZIGen: Enhancing zero-shot personalized image generation with precise subject encoding and decoupled guidance [20.430259028981094]
EZIGen aims to produce images that align with both a given text prompt and subject image.
It employs two main components: a carefully crafted subject image encoder based on the pre-trained UNet of the Stable Diffusion model.
It achieves state-of-the-art results on multiple personalized generation benchmarks with a unified model and 100 times less training data.
arXiv Detail & Related papers (2024-09-12T14:44:45Z) - Learning to Compose: Improving Object Centric Learning by Injecting Compositionality [27.364435779446072]
compositional representation is a key aspect of object-centric learning.
Most of the existing approaches rely on auto-encoding objective.
We propose a novel objective that explicitly encourages compositionality of the representations.
arXiv Detail & Related papers (2024-05-01T17:21:36Z) - Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm [31.06269858216316]
We propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization.
We introduce an identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information.
We also introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams.
arXiv Detail & Related papers (2024-03-18T13:39:53Z) - UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z) - Language-free Compositional Action Generation via Decoupling Refinement [67.50452446686725]
We introduce a novel framework to generate compositional actions without reliance on language auxiliaries.
Our approach consists of three main components: Action Coupling, Conditional Action Generation, and Decoupling Refinement.
arXiv Detail & Related papers (2023-07-07T12:00:38Z) - Taming Encoder for Zero Fine-tuning Image Customization with
Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z) - ObjectStitch: Generative Object Compositing [43.206123360578665]
We propose a self-supervised framework for object compositing using conditional diffusion models.
Our framework can transform the viewpoint, geometry, color and shadow of the generated object while requiring no manual labeling.
Our method outperforms relevant baselines in both realism and faithfulness of the synthesized result images in a user study on various real-world images.
arXiv Detail & Related papers (2022-12-02T02:15:13Z) - Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fr'echet Inception Distance metric, that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.