CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2310.19784v2
- Date: Thu, 7 Dec 2023 15:22:07 GMT
- Title: CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models
- Authors: Ziyang Yuan, Mingdeng Cao, Xintao Wang, Zhongang Qi, Chun Yuan, Ying Shan
- Abstract summary: CustomNet is a novel object customization approach that explicitly incorporates 3D novel view synthesis capabilities into the object customization process.
We introduce delicate designs to enable location control and flexible background control through textual descriptions or specific user-defined images.
Our method facilitates zero-shot object customization without test-time optimization, offering simultaneous control over the viewpoints, location, and background.
- Score: 85.69959024572363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incorporating a customized object into image generation presents an
attractive feature in text-to-image generation. However, existing
optimization-based and encoder-based methods are hindered by drawbacks such as
time-consuming optimization, insufficient identity preservation, and a
prevalent copy-pasting effect. To overcome these limitations, we introduce
CustomNet, a novel object customization approach that explicitly incorporates
3D novel view synthesis capabilities into the object customization process.
This integration facilitates the adjustment of spatial position relationships
and viewpoints, yielding diverse outputs while effectively preserving object
identity. Moreover, we introduce delicate designs to enable location control
and flexible background control through textual descriptions or specific
user-defined images, overcoming the limitations of existing 3D novel view
synthesis methods. We further leverage a dataset construction pipeline that can
better handle real-world objects and complex backgrounds. Equipped with these
designs, our method facilitates zero-shot object customization without
test-time optimization, offering simultaneous control over the viewpoints,
location, and background. As a result, our CustomNet ensures enhanced identity
preservation and generates diverse, harmonious outputs.
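As a rough illustration of the controls the abstract lists (object viewpoint, placement location, and a background given either as text or as a user image), the sketch below packs those signals into one conditioning structure that a denoiser wrapper might consume. It is only a sketch of the interface such a system could expose; the `CustomNetCondition` dataclass, `build_condition_dict`, and every field name in them are hypothetical, not the authors' released code.

```python
# Minimal sketch (not the authors' released API): assembling the conditioning
# signals CustomNet's abstract describes -- a reference object image, a target
# viewpoint, a placement box, and a background given as text or as an image.
# All names below are invented for illustration.
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class CustomNetCondition:
    object_image: np.ndarray          # reference object, e.g. (H, W, 3) in [0, 1]
    azimuth_deg: float = 0.0          # target viewpoint around the object
    elevation_deg: float = 0.0
    location_box: tuple = (0.25, 0.25, 0.75, 0.75)  # normalized (x0, y0, x1, y1)
    background_prompt: Optional[str] = None         # textual background control
    background_image: Optional[np.ndarray] = None   # or a user-defined background


def build_condition_dict(cond: CustomNetCondition) -> dict:
    """Pack the controls into the dictionary a denoiser wrapper might consume."""
    if cond.background_prompt is None and cond.background_image is None:
        raise ValueError("Provide a background prompt or a background image.")
    return {
        "object": cond.object_image,
        "viewpoint": (cond.azimuth_deg, cond.elevation_deg),
        "location": cond.location_box,
        "background": cond.background_prompt
        if cond.background_prompt is not None
        else cond.background_image,
    }


if __name__ == "__main__":
    dummy_object = np.random.rand(256, 256, 3)
    cond = CustomNetCondition(dummy_object, azimuth_deg=45.0,
                              background_prompt="a wooden desk near a window")
    print(build_condition_dict(cond).keys())
```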
Related papers
- Generating Compositional Scenes via Text-to-image RGBA Instance Generation [82.63805151691024]
Text-to-image diffusion generative models can generate high quality images at the cost of tedious prompt engineering.
We propose a novel multi-stage generation paradigm that is designed for fine-grained control, flexibility and interactivity.
Our experiments show that our RGBA diffusion model is capable of generating diverse and high quality instances with precise control over object attributes.
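The multi-stage paradigm above presumably ends with compositing the generated RGBA instances onto a scene; the sketch below shows a plain "A over B" alpha composite under that assumption. It is a generic illustration, not the paper's pipeline, and `composite_over` is a made-up helper.

```python
# Generic "A over B" alpha compositing of RGBA instance layers onto a background,
# illustrating the multi-stage idea (instances first, scene second). This is a
# plain illustration, not code from the paper.
import numpy as np


def composite_over(background_rgb: np.ndarray, layers_rgba: list) -> np.ndarray:
    """Composite RGBA layers (values in [0, 1]) over an RGB background, in order."""
    out = background_rgb.astype(np.float64).copy()
    for layer in layers_rgba:
        rgb, alpha = layer[..., :3], layer[..., 3:4]
        out = alpha * rgb + (1.0 - alpha) * out
    return out


if __name__ == "__main__":
    bg = np.zeros((64, 64, 3))             # black background
    instance = np.zeros((64, 64, 4))
    instance[16:48, 16:48, :3] = 1.0       # a white square instance
    instance[16:48, 16:48, 3] = 0.8        # 80% opaque
    scene = composite_over(bg, [instance])
    print(scene[32, 32])                   # ~[0.8, 0.8, 0.8]
```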
arXiv Detail & Related papers (2024-11-16T23:44:14Z)
- DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation [22.599542105037443]
DisEnvisioner is a novel approach for effectively extracting and enriching the subject-essential features while filtering out subject-irrelevant information.
Specifically, the features of the subject and other irrelevant components are effectively separated into distinctive visual tokens, enabling much more accurate customization.
Experiments demonstrate the superiority of our approach over existing methods in instruction response (editability), ID consistency, inference speed, and overall image quality.
arXiv Detail & Related papers (2024-10-02T22:29:14Z)
- Customizing Text-to-Image Diffusion with Camera Viewpoint Control [53.621518249820745]
We introduce a new task -- enabling explicit control of camera viewpoint for model customization.
This allows us to modify object properties amongst various background scenes via text prompts.
We propose to condition the 2D diffusion process on rendered, view-dependent features of the new object.
arXiv Detail & Related papers (2024-04-18T16:59:51Z)
- SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing [51.857176097841915]
SwapAnything is a novel framework that can swap any objects in an image with personalized concepts given by the reference.
It has three unique advantages: (1) precise control of arbitrary objects and parts rather than the main subject, (2) more faithful preservation of context pixels, (3) better adaptation of the personalized concept to the image.
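A minimal way to picture advantage (2), faithful preservation of context pixels, is a masked blend that takes the swapped region from an edited image and everything else from the source. The sketch below shows only that generic idea; the paper's actual mechanism is more involved than this pixel-space blend, and `masked_swap` is a hypothetical helper.

```python
# Generic masked blend: keep context pixels from the source image and take the
# swapped region from an edited image. A conceptual illustration only.
import numpy as np


def masked_swap(source: np.ndarray, edited: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """mask==1 marks the region to replace; everything else keeps source pixels."""
    mask = mask[..., None].astype(np.float64)   # (H, W) -> (H, W, 1)
    return mask * edited + (1.0 - mask) * source


if __name__ == "__main__":
    src = np.random.rand(64, 64, 3)
    edt = np.random.rand(64, 64, 3)
    m = np.zeros((64, 64))
    m[20:44, 20:44] = 1.0
    out = masked_swap(src, edt, m)
    assert np.allclose(out[0, 0], src[0, 0]) and np.allclose(out[32, 32], edt[32, 32])
```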
arXiv Detail & Related papers (2024-04-08T17:52:29Z)
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
- AttrLostGAN: Attribute Controlled Image Synthesis from Reconfigurable Layout and Style [5.912209564607099]
We propose a method for attribute controlled image synthesis from layout.
We extend a state-of-the-art approach for layout-to-image generation to condition individual objects on attributes.
Our results show that our method can successfully control the fine-grained details of individual objects when modelling complex scenes with multiple objects.
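To make "reconfigurable layout" concrete, the sketch below represents a layout as a list of per-object entries, each with a bounding box and attribute strings, and shows how one object's attributes can be swapped without touching the rest. All field names are invented for illustration; this is not the paper's data format.

```python
# Illustration only: a "reconfigurable layout" as a list of per-object entries,
# each carrying a bounding box plus attribute strings that a generator would be
# conditioned on. Field names are made up for this sketch.
layout = [
    {"category": "car",    "bbox": (0.10, 0.55, 0.45, 0.90), "attributes": ["red", "parked"]},
    {"category": "person", "bbox": (0.55, 0.40, 0.70, 0.90), "attributes": ["standing"]},
]


def reconfigure(layout, index, attributes):
    """Swap one object's attribute list while leaving the rest of the layout unchanged."""
    edited = [dict(obj) for obj in layout]
    edited[index] = dict(edited[index], attributes=list(attributes))
    return edited


if __name__ == "__main__":
    print(reconfigure(layout, 0, ["blue", "parked"])[0]["attributes"])  # ['blue', 'parked']
```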
arXiv Detail & Related papers (2021-03-25T10:09:45Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
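SceneFID, mentioned in the last entry, is described as an object-centric adaptation of FID. Assuming it applies the standard Fréchet distance to Inception features of per-object crops rather than whole images (our reading of the abstract; details may differ), the core computation looks like the sketch below, with feature and crop extraction left out.

```python
# Frechet distance between two sets of (object-crop) Inception features -- the
# core computation behind FID, applied here per object crop rather than per
# full image. Feature extraction is assumed to happen elsewhere.
import numpy as np
from scipy import linalg


def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """feats_*: (N, D) feature arrays, e.g. Inception pool features of object crops."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov1 = np.cov(feats_real, rowvar=False)
    cov2 = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov1 @ cov2)
    if np.iscomplexobj(covmean):        # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(512, 64))             # stand-ins for real object-crop features
    fake = rng.normal(loc=0.1, size=(512, 64))    # stand-ins for generated object-crop features
    print(frechet_distance(real, fake))
```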