Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with
Prototypical Embedding
- URL: http://arxiv.org/abs/2401.15708v1
- Date: Sun, 28 Jan 2024 17:11:42 GMT
- Title: Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with
Prototypical Embedding
- Authors: Jianxiang Lu, Cong Xie, Hui Guo
- Abstract summary: Our proposed method aims to address the challenges of generalizability and fidelity in an object-driven way.
A prototypical embedding is initialized from the object's appearance and its class before fine-tuning the diffusion model.
Our method outperforms several existing works.
- Score: 7.893308498886083
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large-scale text-to-image generation models have made remarkable
progress, many fine-tuning methods have been
proposed. However, these models often struggle with novel objects, especially
with one-shot scenarios. Our proposed method aims to address the challenges of
generalizability and fidelity in an object-driven way, using only a single
input image and the object-specific regions of interest. To improve
generalizability and mitigate overfitting, in our paradigm, a prototypical
embedding is initialized based on the object's appearance and its class, before
fine-tuning the diffusion model. During fine-tuning, we propose a
class-characterizing regularization to preserve prior knowledge of object
classes. To further improve fidelity, we introduce an object-specific loss, which
can also be used to implant multiple objects. Overall, our proposed object-driven
method implants new objects that integrate seamlessly with existing
concepts while achieving high fidelity and generalization. Our method
outperforms several existing works. The code will be released.
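The abstract gives no formulas, so the following is only an illustrative sketch of two of the ideas it names, not the authors' implementation. It assumes that the object-specific loss can be approximated by a mean squared error restricted to the object's region of interest, and that the prototypical embedding can be approximated by a blend of an appearance-derived vector and a class embedding. The names `masked_mse`, `prototype_init`, and `alpha` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def masked_mse(pred: Matrix, target: Matrix, mask: Matrix) -> float:
    """Mean squared error over positions where mask is 1 (the object region).

    Assumption: a binary region-of-interest mask stands in for the
    "object-specific regions of interest" mentioned in the abstract.
    """
    total = 0.0
    count = 0.0
    for p_row, t_row, m_row in zip(pred, target, mask):
        for p, t, m in zip(p_row, t_row, m_row):
            total += m * (p - t) ** 2
            count += m
    # An empty mask contributes no loss rather than dividing by zero.
    return total / count if count > 0 else 0.0


def prototype_init(appearance: List[float], class_emb: List[float],
                   alpha: float = 0.5) -> List[float]:
    """Hypothetical initialization: convex blend of an appearance-derived
    vector and a class embedding. The paper may combine them differently."""
    return [alpha * a + (1.0 - alpha) * c for a, c in zip(appearance, class_emb)]


if __name__ == "__main__":
    pred = [[1.0, 2.0], [3.0, 4.0]]
    target = [[0.0, 2.0], [3.0, 0.0]]
    mask = [[1, 0], [0, 1]]  # only the diagonal belongs to the object
    print(masked_mse(pred, target, mask))
    print(prototype_init([1.0, 0.0], [0.0, 1.0]))
```

Restricting the error to the mask means background pixels of the single input image do not influence the loss, which is one plausible way to reduce overfitting to the background in a one-shot setting.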
Related papers
- Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching [19.730504197461144]
We present a novel generalizable object pose estimation method to determine the object pose using only one RGB image.
Our method offers generalization to unseen objects without extensive training, operates with a single reference image of the object, and eliminates the need for 3D object models or multiple views of the object.
arXiv Detail & Related papers (2024-11-24T14:31:50Z)
- Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation [10.416673784744281]
We propose a weighted-merge method to merge multiple reference image features into corresponding objects.
Our method outperforms the state of the art on the Concept101 and DreamBooth datasets for multi-object personalized image generation.
arXiv Detail & Related papers (2024-09-26T15:04:13Z)
- DiffUHaul: A Training-Free Method for Object Dragging in Images [78.93531472479202]
We propose a training-free method, dubbed DiffUHaul, for the object dragging task.
We first apply attention masking in each denoising step to make the generation more disentangled across different objects.
In the early denoising steps, we interpolate the attention features between source and target images to smoothly fuse new layouts with the original appearance.
arXiv Detail & Related papers (2024-06-03T17:59:53Z)
- ObjectComposer: Consistent Generation of Multiple Objects Without Fine-tuning [25.033615513933192]
We introduce ObjectComposer for generating compositions of multiple objects that resemble user-specified images.
Our approach is training-free, leveraging the abilities of preexisting models.
arXiv Detail & Related papers (2023-10-10T19:46:58Z)
- Cycle Consistency Driven Object Discovery [75.60399804639403]
We introduce a method that explicitly optimizes the constraint that each object in a scene should be associated with a distinct slot.
By integrating these consistency objectives into various existing slot-based object-centric methods, we showcase substantial improvements in object-discovery performance.
Our results suggest that the proposed approach not only improves object discovery, but also provides richer features for downstream tasks.
arXiv Detail & Related papers (2023-06-03T21:49:06Z)
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
- Localizing Object-level Shape Variations with Text-to-Image Diffusion Models [60.422435066544814]
We present a technique to generate a collection of images that depicts variations in the shape of a specific object.
A particular challenge when generating object variations is accurately localizing the manipulation applied over the object's shape.
To localize the image-space operation, we present two techniques that use the self-attention layers in conjunction with the cross-attention layers.
arXiv Detail & Related papers (2023-03-20T17:45:08Z)
- Part-aware Prototype Network for Few-shot Semantic Segmentation [50.581647306020095]
We propose a novel few-shot semantic segmentation framework based on the prototype representation.
Our key idea is to decompose the holistic class representation into a set of part-aware prototypes.
We develop a novel graph neural network model to generate and enhance the proposed part-aware prototypes.
arXiv Detail & Related papers (2020-07-13T11:03:09Z)
- Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout-fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric that is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)