Compositional Inversion for Stable Diffusion Models
- URL: http://arxiv.org/abs/2312.08048v3
- Date: Thu, 11 Jan 2024 07:21:16 GMT
- Title: Compositional Inversion for Stable Diffusion Models
- Authors: Xulu Zhang, Xiao-Yong Wei, Jinlin Wu, Tianyi Zhang, Zhaoxiang Zhang,
Zhen Lei, Qing Li
- Abstract summary: Inversion methods generate personalized images by incorporating concepts of interest provided by user images.
Existing methods often suffer from overfitting issues, where the dominant presence of inverted concepts leads to the absence of other desired concepts.
We propose a method that guides the inversion process towards the core distribution for compositional embeddings.
- Score: 64.79261401944994
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inversion methods, such as Textual Inversion, generate personalized images by
incorporating concepts of interest provided by user images. However, existing
methods often suffer from overfitting issues, where the dominant presence of
inverted concepts leads to the absence of other desired concepts. It stems from
the fact that during inversion, the irrelevant semantics in the user images are
also encoded, forcing the inverted concepts to occupy locations far from the
core distribution in the embedding space. To address this issue, we propose a
method that guides the inversion process towards the core distribution for
compositional embeddings. Additionally, we introduce a spatial regularization
approach to balance the attention on the concepts being composed. Our method is
designed as a post-training approach and can be seamlessly integrated with
other inversion methods. Experimental results demonstrate the effectiveness of
our proposed approach in mitigating the overfitting problem and generating more
diverse and balanced compositions of concepts in the synthesized images. The
source code is available at
https://github.com/zhangxulu1996/Compositional-Inversion.
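The core idea can be illustrated with a minimal sketch. The snippet below is not the authors' released code; it assumes a PyTorch-style textual-inversion loop and adds a hypothetical anchor term that pulls the learned concept embedding toward its nearest pretrained token embeddings, i.e. toward the core of the embedding distribution. The k-nearest-neighbor formulation, the weight, and all names are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): anchor the inverted concept
# embedding toward the core of the pretrained token-embedding distribution.
import torch


def anchored_inversion_loss(denoise_loss: torch.Tensor,
                            concept_emb: torch.Tensor,   # (1, d) trainable new-token embedding
                            vocab_embs: torch.Tensor,    # (V, d) frozen pretrained embeddings
                            anchor_weight: float = 0.01,
                            k: int = 16) -> torch.Tensor:
    """Combine the usual diffusion reconstruction loss with an anchor penalty."""
    # Distance to the k nearest pretrained embeddings measures how far the
    # inverted concept has drifted from the core of the embedding space.
    dists = torch.cdist(concept_emb, vocab_embs)      # (1, V) pairwise distances
    nearest = dists.topk(k, largest=False).values      # (1, k) smallest distances
    anchor_penalty = nearest.mean()
    return denoise_loss + anchor_weight * anchor_penalty
```

The paper's second component, the spatial regularization that balances attention across the composed concepts, would act analogously on the cross-attention maps of the concept tokens during sampling; it is not sketched here.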
Related papers
- Training-free Composite Scene Generation for Layout-to-Image Synthesis [29.186425845897947]
This paper introduces a novel training-free approach designed to overcome adversarial semantic intersections during the diffusion conditioning phase.
We propose two innovative constraints: 1) an inter-token constraint that resolves token conflicts to ensure accurate concept synthesis; and 2) a self-attention constraint that improves pixel-to-pixel relationships.
Our evaluations confirm the effectiveness of leveraging layout information for guiding the diffusion process, generating content-rich images with enhanced fidelity and complexity.
arXiv Detail & Related papers (2024-07-18T15:48:07Z)
- Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion [51.931083971448885]
We propose a framework named Human Feedback Inversion (HFI), where human feedback on model-generated images is condensed into textual tokens guiding the mitigation or removal of problematic images.
Our experimental results demonstrate our framework significantly reduces objectionable content generation while preserving image quality, contributing to the ethical deployment of AI in the public sphere.
arXiv Detail & Related papers (2024-07-17T05:21:41Z)
- FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior [50.0535198082903]
We offer a novel approach to image composition, which integrates multiple input images into a single, coherent image.
We showcase the potential of utilizing the powerful generative prior inherent in large-scale pre-trained diffusion models to accomplish generic image composition.
arXiv Detail & Related papers (2024-07-06T03:35:43Z)
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
- Diffusion Posterior Illumination for Ambiguity-aware Inverse Rendering [63.24476194987721]
Inverse rendering, the process of inferring scene properties from images, is a challenging inverse problem.
Most existing solutions incorporate priors into the inverse-rendering pipeline to encourage plausible solutions.
We propose a novel scheme that integrates a denoising probabilistic diffusion model pre-trained on natural illumination maps into an optimization framework.
arXiv Detail & Related papers (2023-09-30T12:39:28Z)
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv Detail & Related papers (2023-05-25T17:59:04Z)
- JoIN: Joint GANs Inversion for Intrinsic Image Decomposition [16.02463667910604]
We propose to solve ill-posed inverse imaging problems using a bank of Generative Adversarial Networks (GANs).
Our method builds on the demonstrated success of GANs to capture complex image distributions.
arXiv Detail & Related papers (2023-05-18T22:09:32Z)
- Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer [38.957512116073616]
We propose a zero-shot contrastive loss for diffusion models that doesn't require additional fine-tuning or auxiliary networks.
Our method can generate images with the same semantic content as the source image in a zero-shot manner.
arXiv Detail & Related papers (2023-03-15T13:47:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.