DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
- URL: http://arxiv.org/abs/2410.02067v2
- Date: Mon, 28 Oct 2024 15:35:52 GMT
- Title: DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
- Authors: Jing He, Haodong Li, Yongzhe Hu, Guibao Shen, Yingjie Cai, Weichao Qiu, Ying-Cong Chen,
- Abstract summary: DisEnvisioner is a novel approach for effectively extracting and enriching the subject-essential features while filtering out -irrelevant information.
Specifically, the feature of the subject and other irrelevant components are effectively separated into distinctive visual tokens, enabling a much more accurate customization.
Experiments demonstrate the superiority of our approach over existing methods in instruction response (editability), ID consistency, inference speed, and the overall image quality.
- Score: 22.599542105037443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the realm of image generation, creating customized images from visual prompt with additional textual instruction emerges as a promising endeavor. However, existing methods, both tuning-based and tuning-free, struggle with interpreting the subject-essential attributes from the visual prompt. This leads to subject-irrelevant attributes infiltrating the generation process, ultimately compromising the personalization quality in both editability and ID preservation. In this paper, we present DisEnvisioner, a novel approach for effectively extracting and enriching the subject-essential features while filtering out -irrelevant information, enabling exceptional customization performance, in a tuning-free manner and using only a single image. Specifically, the feature of the subject and other irrelevant components are effectively separated into distinctive visual tokens, enabling a much more accurate customization. Aiming to further improving the ID consistency, we enrich the disentangled features, sculpting them into more granular representations. Experiments demonstrate the superiority of our approach over existing methods in instruction response (editability), ID consistency, inference speed, and the overall image quality, highlighting the effectiveness and efficiency of DisEnvisioner. Project page: https://disenvisioner.github.io/.
Related papers
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z) - PromptFix: You Prompt and We Fix the Photo [84.69812824355269]
Diffusion models equipped with language models demonstrate excellent controllability in image generation tasks.
The lack of diverse instruction-following data hampers the development of models.
We propose PromptFix, a framework that enables diffusion models to follow human instructions.
arXiv Detail & Related papers (2024-05-27T03:13:28Z) - IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models [31.762112403595612]
IDAdapter is a tuning-free approach that enhances the diversity and identity preservation in personalized image generation from a single face image.
During the training phase, we incorporate mixed features from multiple reference images of a specific identity to enrich identity-related content details.
arXiv Detail & Related papers (2024-03-20T12:13:04Z) - Tuning-Free Image Customization with Image and Text Guidance [65.9504243633169]
We introduce a tuning-free framework for simultaneous text-image-guided image customization.
Our approach preserves the semantic features of the reference image subject while allowing modification of detailed attributes based on text descriptions.
Our approach outperforms previous methods in both human and quantitative evaluations.
arXiv Detail & Related papers (2024-03-19T11:48:35Z) - OSTAF: A One-Shot Tuning Method for Improved Attribute-Focused T2I Personalization [9.552325786494334]
We introduce a novel parameter-efficient one-shot fine-tuning method for personalized text-to-image (T2I) personalization.
A novel hypernetwork-powered attribute-focused fine-tuning mechanism is employed to achieve the precise learning of various attribute features.
Our method shows significant superiority in attribute identification and application, as well as achieves a good balance between efficiency and output quality.
arXiv Detail & Related papers (2024-03-17T01:42:48Z) - AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for
Text-Based Continuity-Sensitive Image Editing [24.9487669818162]
We propose atemporal guided adaptive editing algorithm AdapEdit, which realizes adaptive image editing.
Our approach has a significant advantage in preserving model priors and does not require model training, fine-tuning extra data, or optimization.
We present our results over a wide variety of raw images and editing instructions, demonstrating competitive performance and showing it significantly outperforms the previous approaches.
arXiv Detail & Related papers (2023-12-13T09:45:58Z) - PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved
Personalization [92.90392834835751]
PortraitBooth is designed for high efficiency, robust identity preservation, and expression-editable text-to-image generation.
PortraitBooth eliminates computational overhead and mitigates identity distortion.
It incorporates emotion-aware cross-attention control for diverse facial expressions in generated images.
arXiv Detail & Related papers (2023-12-11T13:03:29Z) - CustomNet: Zero-shot Object Customization with Variable-Viewpoints in
Text-to-Image Diffusion Models [85.69959024572363]
CustomNet is a novel object customization approach that explicitly incorporates 3D novel view synthesis capabilities into the object customization process.
We introduce delicate designs to enable location control and flexible background control through textual descriptions or specific user-defined images.
Our method facilitates zero-shot object customization without test-time optimization, offering simultaneous control over the viewpoints, location, and background.
arXiv Detail & Related papers (2023-10-30T17:50:14Z) - PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion
Models [19.519789922033034]
PhotoVerse is an innovative methodology that incorporates a dual-branch conditioning mechanism in both text and image domains.
After a single training phase, our approach enables generating high-quality images within only a few seconds.
arXiv Detail & Related papers (2023-09-11T19:59:43Z) - Look here! A parametric learning based approach to redirect visual
attention [49.609412873346386]
We introduce an automatic method to make an image region more attention-capturing via subtle image edits.
Our model predicts a distinct set of global parametric transformations to be applied to the foreground and background image regions.
Our edits enable inference at interactive rates on any image size, and easily generalize to videos.
arXiv Detail & Related papers (2020-08-12T16:08:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.