DreamInpainter: Text-Guided Subject-Driven Image Inpainting with
Diffusion Models
- URL: http://arxiv.org/abs/2312.03771v1
- Date: Tue, 5 Dec 2023 22:23:19 GMT
- Title: DreamInpainter: Text-Guided Subject-Driven Image Inpainting with
Diffusion Models
- Authors: Shaoan Xie, Yang Zhao, Zhisheng Xiao, Kelvin C.K. Chan, Yandong Li,
Yanwu Xu, Kun Zhang, Tingbo Hou
- Abstract summary: This study introduces Text-Guided Subject-Driven Image Inpainting.
We compute dense subject features to ensure accurate subject replication.
We employ a discriminative token selection module to eliminate redundant subject details.
- Score: 37.133727797607676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This study introduces Text-Guided Subject-Driven Image Inpainting, a novel
task that combines text and exemplar images for image inpainting. While both
text and exemplar images have been used independently in previous efforts,
their combined utilization remains unexplored. Simultaneously accommodating
both conditions poses a significant challenge due to the inherent balance
required between editability and subject fidelity. To tackle this challenge, we
propose a two-step approach, DreamInpainter. First, we compute dense subject
features to ensure accurate subject replication. Then, we employ a
discriminative token selection module to eliminate redundant subject details,
preserving the subject's identity while allowing changes according to other
conditions such as mask shape and text prompts. Additionally, we introduce a
decoupling regularization technique to enhance text control in the presence of
exemplar images. Our extensive experiments demonstrate the superior performance
of our method in terms of visual quality, identity preservation, and text
control, showcasing its effectiveness in the context of text-guided
subject-driven image inpainting.
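The discriminative token selection step described above can be pictured as a top-k filter over the dense subject features. Below is a minimal PyTorch sketch, assuming the dense features are patch tokens from a frozen image encoder; the linear scoring head, keep ratio, and tensor shapes are illustrative assumptions, not the paper's actual design.

```python
# Hypothetical sketch of discriminative token selection: score dense subject
# tokens and keep only the most informative ones, so the exemplar constrains
# identity without fixing every low-level detail.
import torch
import torch.nn as nn


class TokenSelector(nn.Module):
    def __init__(self, dim: int, keep_ratio: float = 0.25):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # per-token importance head (assumed design)
        self.keep_ratio = keep_ratio

    def forward(self, subject_tokens: torch.Tensor) -> torch.Tensor:
        # subject_tokens: (batch, num_tokens, dim) dense features of the exemplar subject
        b, n, d = subject_tokens.shape
        k = max(1, int(n * self.keep_ratio))
        scores = self.score(subject_tokens).squeeze(-1)    # (batch, num_tokens)
        top_idx = scores.topk(k, dim=1).indices            # k most informative tokens
        top_idx = top_idx.unsqueeze(-1).expand(-1, -1, d)  # (batch, k, dim)
        return torch.gather(subject_tokens, 1, top_idx)    # (batch, k, dim)


if __name__ == "__main__":
    selector = TokenSelector(dim=768, keep_ratio=0.25)
    dense = torch.randn(2, 256, 768)  # e.g. a 16x16 patch grid from a frozen image encoder
    kept = selector(dense)
    print(kept.shape)                 # torch.Size([2, 64, 768])
```

In the paper, the retained tokens would then condition the inpainting diffusion model alongside the mask and text prompt; here they are simply returned to show the selection mechanics.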
Related papers
- Text Guided Image Editing with Automatic Concept Locating and Forgetting [27.70615803908037]
We propose a novel method called Locate and Forget (LaF) to locate potential target concepts in the image for modification.
Compared to the baselines, our method demonstrates its superiority in text-guided image editing tasks both qualitatively and quantitatively.
arXiv Detail & Related papers (2024-05-30T05:36:32Z)
- Locate, Assign, Refine: Taming Customized Image Inpainting with Text-Subject Guidance [17.251982243534144]
LAR-Gen is a novel approach that enables seamless inpainting of masked scene images.
Our approach adopts a coarse-to-fine manner to ensure subject identity preservation and local semantic coherence.
Experiments and varied application scenarios demonstrate the superiority of LAR-Gen in terms of both identity preservation and text semantic consistency.
arXiv Detail & Related papers (2024-03-28T16:07:55Z)
- You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval [120.49126407479717]
We introduce a novel compositionality framework, effectively combining sketches and text using pre-trained CLIP models.
Our system extends to novel applications in composed image retrieval, domain transfer, and fine-grained generation.
arXiv Detail & Related papers (2024-03-12T00:27:18Z)
- Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Models [68.47333676663312]
We show a simple modification of classifier-free guidance can help disentangle image factors in text-to-image models.
The key idea of our method, Contrastive Guidance, is to characterize an intended factor with two prompts that differ in minimal tokens.
We illustrate its benefits in three scenarios: (1) to guide domain-specific diffusion models trained on an object class, (2) to gain continuous, rig-like controls for text-to-image generation, and (3) to improve the performance of zero-shot image editors (a minimal sketch of the two-prompt guidance step appears after this list).
arXiv Detail & Related papers (2024-02-21T03:01:17Z)
- Training-Free Consistent Text-to-Image Generation [80.4814768762066]
Consistently portraying the same subject across diverse prompts remains challenging for text-to-image models.
Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects.
We present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model.
arXiv Detail & Related papers (2024-02-05T18:42:34Z)
- Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model [22.975965453227477]
We introduce a new framework called Paste, Inpaint and Harmonize via Denoising (PhD).
In our experiments, we apply PhD to both subject-driven image editing tasks and explore text-driven scene generation given a reference subject.
arXiv Detail & Related papers (2023-06-13T07:43:10Z)
- Zero-shot Image-to-Image Translation [57.46189236379433]
We propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting.
We propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process.
Our method does not need additional training for these edits and can directly use the existing text-to-image diffusion model.
arXiv Detail & Related papers (2023-02-06T18:59:51Z)
- Person Text-Image Matching via Text-Feature Interpretability Embedding and External Attack Node Implantation [22.070781214170164]
Person text-image matching aims to retrieve images of specific pedestrians using text descriptions.
The lack of interpretability of text features makes it challenging to effectively align them with their corresponding image features.
We propose a person text-image matching method by embedding text-feature interpretability and an external attack node.
arXiv Detail & Related papers (2022-11-16T04:15:37Z)
- DiffEdit: Diffusion-based semantic image editing with mask guidance [64.555930158319]
DiffEdit is a method to take advantage of text-conditioned diffusion models for the task of semantic image editing.
Our main contribution is the ability to automatically generate a mask highlighting regions of the input image that need to be edited.
arXiv Detail & Related papers (2022-10-20T17:16:37Z)
- Text-Guided Neural Image Inpainting [20.551488941041256]
Inpainting task requires filling the corrupted image with contents coherent with the context.
The goal of this paper is to fill the semantic information in corrupted images according to the provided descriptive text.
We propose a novel inpainting model named Text-Guided Dual Attention Inpainting Network (TDANet).
arXiv Detail & Related papers (2020-04-07T09:04:43Z)
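Referring back to the Contrastive Prompts entry above, the two-prompt idea can be sketched as an extra guidance term layered on standard classifier-free guidance. This is a minimal sketch under assumed denoiser outputs; the function name, guidance scales, and example prompts are hypothetical, not the authors' implementation.

```python
# Hypothetical sketch of two-prompt "Contrastive Guidance" for one denoising step.
import torch


def contrastive_guidance_step(eps_uncond: torch.Tensor,
                              eps_positive: torch.Tensor,
                              eps_negative: torch.Tensor,
                              cfg_scale: float = 7.5,
                              contrast_scale: float = 3.0) -> torch.Tensor:
    """Combine noise predictions for one denoising step.

    eps_uncond:   denoiser output for the empty prompt (standard CFG baseline)
    eps_positive: output for the prompt containing the intended factor,
                  e.g. "a photo of a smiling dog"
    eps_negative: output for the minimally different prompt, e.g. "a photo of a dog"
    """
    # Standard classifier-free guidance toward the positive prompt...
    guided = eps_uncond + cfg_scale * (eps_positive - eps_uncond)
    # ...plus a push along the direction that isolates the intended factor,
    # obtained by contrasting the two nearly identical prompts.
    return guided + contrast_scale * (eps_positive - eps_negative)
```

Because the two prompts differ only in the tokens describing the intended factor, the contrast term moves the sample mainly along that factor, which is what allows continuous, rig-like control in the scenarios listed above.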