Training-free Content Injection using h-space in Diffusion Models
- URL: http://arxiv.org/abs/2303.15403v2
- Date: Thu, 4 Jan 2024 09:23:07 GMT
- Title: Training-free Content Injection using h-space in Diffusion Models
- Authors: Jaeseok Jeong, Mingi Kwon, Youngjung Uh
- Abstract summary: In this paper, we introduce a method to inject the content of one image into another image by combining their features in the generative processes.
Unlike custom-diffusion approaches, our method does not require time-consuming optimization or fine-tuning.
- Score: 16.51521884698886
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Diffusion models (DMs) synthesize high-quality images in various domains.
However, controlling their generative process is still hazy because the
intermediate variables in the process are not rigorously studied. Recently, the
bottleneck feature of the U-Net, namely $h$-space, has been found to convey the
semantics of the resulting image. It enables StyleCLIP-like latent editing
within DMs. In this paper, we explore further usage of $h$-space beyond
attribute editing, and introduce a method to inject the content of one image
into another image by combining their features in the generative processes.
Briefly, given the original generative process of the other image, 1) we
gradually blend the bottleneck feature of the content with proper
normalization, and 2) we calibrate the skip connections to match the injected
content. Unlike custom-diffusion approaches, our method does not require
time-consuming optimization or fine-tuning. Instead, our method manipulates
intermediate features within a feed-forward generative process. Furthermore,
our method does not require supervision from external networks. The code is
available at https://curryjung.github.io/InjectFusion/
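To make the two steps above concrete, here is a minimal sketch of the injection in PyTorch. It assumes a U-Net that exposes its bottleneck ($h$-space) feature and skip connections; the helper names (`encode_down`, `decode_up`), the linear blend, and the global skip rescaling are illustrative placeholders, not the authors' released implementation.
```python
import torch

def match_statistics(h_mix, h_ref, eps=1e-6):
    """Rescale the blended bottleneck feature to match the channel-wise
    statistics of the original one -- the 'proper normalization' step."""
    mu_mix = h_mix.mean(dim=(2, 3), keepdim=True)
    std_mix = h_mix.std(dim=(2, 3), keepdim=True)
    mu_ref = h_ref.mean(dim=(2, 3), keepdim=True)
    std_ref = h_ref.std(dim=(2, 3), keepdim=True)
    return (h_mix - mu_mix) / (std_mix + eps) * std_ref + mu_ref

@torch.no_grad()
def injected_noise_pred(unet, x_t, x_t_content, t, gamma=0.3):
    """One denoising step of the original image's generative process with
    the content image's h-space feature blended in (hypothetical API)."""
    h_orig, skips = unet.encode_down(x_t, t)        # original trajectory
    h_cont, _ = unet.encode_down(x_t_content, t)    # content trajectory
    h_mix = (1 - gamma) * h_orig + gamma * h_cont   # step 1: gradual blending
    h_mix = match_statistics(h_mix, h_orig)         # ... with normalization
    # Step 2: calibrate the skip connections to agree with the injected
    # content; a single global rescaling is shown purely for illustration.
    scale = h_mix.flatten(1).norm(dim=1) / (h_orig.flatten(1).norm(dim=1) + 1e-6)
    skips = [s * scale.view(-1, 1, 1, 1) for s in skips]
    return unet.decode_up(h_mix, skips, t)          # predicted noise
```
In practice the blend would be applied only over a subset of the denoising steps, with `gamma` controlling how much content is transferred; both schedules are omitted here for brevity.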
Related papers
- Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences.
We aim to optimize downstream reward functions while preserving the naturalness of these design spaces.
Our algorithm integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future.
arXiv Detail & Related papers (2024-08-15T16:47:59Z) - RecDiffusion: Rectangling for Image Stitching with Diffusion Models [53.824503710254206]
We introduce a novel diffusion-based learning framework, RecDiffusion, for image stitching rectangling.
This framework combines Motion Diffusion Models (MDM) to generate motion fields, effectively transitioning from the stitched image's irregular borders to a geometrically corrected intermediary.
arXiv Detail & Related papers (2024-03-28T06:22:45Z) - IIDM: Image-to-Image Diffusion Model for Semantic Image Synthesis [8.080248399002663]
In this paper, semantic image synthesis is treated as an image denoising task.
The style reference is first contaminated with random noise and then progressively denoised by IIDM.
Three techniques, namely refinement, color-transfer, and model ensembles, are proposed to further boost the generation quality.
arXiv Detail & Related papers (2024-03-20T08:21:00Z) - IterInv: Iterative Inversion for Pixel-Level T2I Models [16.230193725587807]
DDIM inversion is a prevalent practice rooted in Latent Diffusion Models (LDMs).
Large pretrained T2I models working on the latent space suffer from losing details due to the first compression stage with an autoencoder mechanism.
We develop an iterative inversion (IterInv) technique for this category of T2I models and verify IterInv with the open-source DeepFloyd-IF model; a sketch of the vanilla DDIM inversion such methods build on appears after this list.
arXiv Detail & Related papers (2023-10-30T13:47:46Z) - Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation [23.39614544877529]
Conditional diffusion models have demonstrated impressive performance in image manipulation tasks.
Adding too much noise affects the fidelity of the image while adding too little affects its editability.
We propose a novel framework, Selective Diffusion Distillation (SDD), that ensures both the fidelity and editability of images.
arXiv Detail & Related papers (2023-07-17T12:42:56Z) - DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models [66.43179841884098]
We propose a novel image editing method, DragonDiffusion, enabling Drag-style manipulation on Diffusion models.
Our method achieves various editing modes for the generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging.
arXiv Detail & Related papers (2023-07-05T16:43:56Z) - ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models [77.03361270726944]
Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models.
We propose a novel approach, ProSpect, that leverages the step-by-step generation process of diffusion models, which generate images from low- to high-frequency information.
We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout.
arXiv Detail & Related papers (2023-05-25T16:32:01Z) - Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes.
We propose a simple, lightweight image editing algorithm in which the mixing weights of the two text embeddings are optimized for style matching and content preservation.
Experiments show that the proposed method can modify a wide range of attributes, outperforming existing diffusion-model-based image-editing algorithms.
arXiv Detail & Related papers (2022-12-16T19:58:52Z) - Diffusion Visual Counterfactual Explanations [51.077318228247925]
Visual Counterfactual Explanations (VCEs) are an important tool for understanding the decisions of an image classifier.
Current approaches for the generation of VCEs are restricted to adversarially robust models and often contain non-realistic artefacts.
In this paper, we overcome this by generating Diffusion Visual Counterfactual Explanations (DVCEs) for arbitrary ImageNet classifiers.
arXiv Detail & Related papers (2022-10-21T09:35:47Z) - Self-Guided Diffusion Models [53.825634944114285]
We propose a framework for self-guided diffusion models.
Our method provides guidance signals at various image granularities.
Our experiments on single-label and multi-label image datasets demonstrate that self-labeled guidance always outperforms diffusion models without guidance.
arXiv Detail & Related papers (2022-10-12T17:57:58Z)
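For reference, the DDIM inversion mentioned in the IterInv entry above can be sketched in a few lines: it runs the deterministic DDIM update backwards to recover an approximate initial noise for a real image. The names `eps_model` (a pretrained noise predictor) and `alpha_bar` (the cumulative noise schedule) are assumptions for illustration; per the summary, IterInv iteratively refines this procedure for pixel-level T2I models such as DeepFloyd-IF.
```python
import torch

@torch.no_grad()
def ddim_invert(eps_model, x0, alpha_bar, num_steps=50):
    """Map a clean image x0 to an approximate initial noise x_T by running
    the deterministic DDIM update in reverse (t -> t+1)."""
    x = x0
    ts = torch.linspace(0, len(alpha_bar) - 1, num_steps).long()
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        a_cur, a_next = alpha_bar[t_cur], alpha_bar[t_next]
        eps = eps_model(x, t_cur)
        # Predict x0 from the current sample, then re-noise it to t_next.
        x0_pred = (x - (1 - a_cur).sqrt() * eps) / a_cur.sqrt()
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x  # sampling forward from this x_T approximately reconstructs x0
```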
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.