Training-free Content Injection using h-space in Diffusion Models
- URL: http://arxiv.org/abs/2303.15403v2
- Date: Thu, 4 Jan 2024 09:23:07 GMT
- Title: Training-free Content Injection using h-space in Diffusion Models
- Authors: Jaeseok Jeong, Mingi Kwon, Youngjung Uh
- Abstract summary: In this paper, we introduce a method to inject the content of one image into another image by combining their features in the generative processes.
Unlike custom-diffusion approaches, our method does not require time-consuming optimization or fine-tuning.
- Score: 16.51521884698886
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Diffusion models (DMs) synthesize high-quality images in various domains.
However, controlling their generative process is still hazy because the
intermediate variables in the process are not rigorously studied. Recently, the
bottleneck feature of the U-Net, namely $h$-space, has been found to convey the
semantics of the resulting image. It enables StyleCLIP-like latent editing
within DMs. In this paper, we explore further usage of $h$-space beyond
attribute editing, and introduce a method to inject the content of one image
into another image by combining their features in the generative processes.
Briefly, given the original generative process of the other image, 1) we
gradually blend the bottleneck feature of the content with proper
normalization, and 2) we calibrate the skip connections to match the injected
content. Unlike custom-diffusion approaches, our method does not require
time-consuming optimization or fine-tuning. Instead, our method manipulates
intermediate features within a feed-forward generative process. Furthermore,
our method does not require supervision from external networks. The code is
available at https://curryjung.github.io/InjectFusion/
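To make the two steps above concrete, here is a minimal sketch of the injection in PyTorch. It assumes a U-Net that exposes its bottleneck ($h$-space) feature and skip connections; the helper names (`encode_down`, `decode_up`), the linear blend, and the global skip rescaling are illustrative placeholders, not the authors' released implementation.
```python
import torch

def match_statistics(h_mix, h_ref, eps=1e-6):
    """Rescale the blended bottleneck feature to match the channel-wise
    statistics of the original one -- the 'proper normalization' step."""
    mu_mix = h_mix.mean(dim=(2, 3), keepdim=True)
    std_mix = h_mix.std(dim=(2, 3), keepdim=True)
    mu_ref = h_ref.mean(dim=(2, 3), keepdim=True)
    std_ref = h_ref.std(dim=(2, 3), keepdim=True)
    return (h_mix - mu_mix) / (std_mix + eps) * std_ref + mu_ref

@torch.no_grad()
def injected_noise_pred(unet, x_t, x_t_content, t, gamma=0.3):
    """One denoising step of the original image's generative process with
    the content image's h-space feature blended in (hypothetical API)."""
    h_orig, skips = unet.encode_down(x_t, t)        # original trajectory
    h_cont, _ = unet.encode_down(x_t_content, t)    # content trajectory
    h_mix = (1 - gamma) * h_orig + gamma * h_cont   # step 1: gradual blending
    h_mix = match_statistics(h_mix, h_orig)         # ... with normalization
    # Step 2: calibrate the skip connections to agree with the injected
    # content; a single global rescaling is shown purely for illustration.
    scale = h_mix.flatten(1).norm(dim=1) / (h_orig.flatten(1).norm(dim=1) + 1e-6)
    skips = [s * scale.view(-1, 1, 1, 1) for s in skips]
    return unet.decode_up(h_mix, skips, t)          # predicted noise
```
In practice the blend would be applied only over a subset of the denoising steps, with `gamma` controlling how much content is transferred; both schedules are omitted here for brevity.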
Related papers
- Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding [84.3224556294803]
Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences.
We aim to optimize downstream reward functions while preserving the naturalness of these design spaces.
Our algorithm integrates soft value functions, which look ahead to how intermediate noisy states lead to high rewards in the future.
arXiv Detail & Related papers (2024-08-15T16:47:59Z) - RecDiffusion: Rectangling for Image Stitching with Diffusion Models [53.824503710254206]
We introduce a novel diffusion-based learning framework, RecDiffusion, for image stitching rectangling.
This framework combines Motion Diffusion Models (MDM) to generate motion fields, effectively transitioning from the stitched image's irregular borders to a geometrically corrected intermediary.
arXiv Detail & Related papers (2024-03-28T06:22:45Z) - IIDM: Image-to-Image Diffusion Model for Semantic Image Synthesis [8.080248399002663]
In this paper, semantic image synthesis is treated as an image denoising task.
The style reference is first contaminated with random noise and then progressively denoised by IIDM.
Three techniques, namely refinement, color-transfer, and model ensembles, are proposed to further boost the generation quality.
arXiv Detail & Related papers (2024-03-20T08:21:00Z) - IterInv: Iterative Inversion for Pixel-Level T2I Models [16.230193725587807]
DDIM inversion is a prevalent practice rooted in Latent Diffusion Models (LDMs).
Large pretrained T2I models working on the latent space suffer from losing details due to the first compression stage with an autoencoder mechanism.
We develop an iterative inversion (IterInv) technique for this category of T2I models and verify IterInv with the open-source DeepFloyd-IF model; a sketch of the vanilla DDIM inversion such methods build on appears after this list.
arXiv Detail & Related papers (2023-10-30T13:47:46Z) - Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation [23.39614544877529]
Conditional diffusion models have demonstrated impressive performance in image manipulation tasks.
Adding too much noise affects the fidelity of the image while adding too little affects its editability.
We propose a novel framework, Selective Diffusion Distillation (SDD), that ensures both the fidelity and editability of images.
arXiv Detail & Related papers (2023-07-17T12:42:56Z) - DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models [66.43179841884098]
We propose a novel image editing method, DragonDiffusion, enabling Drag-style manipulation on Diffusion models.
Our method achieves various editing modes for the generated or real images, such as object moving, object resizing, object appearance replacement, and content dragging.
arXiv Detail & Related papers (2023-07-05T16:43:56Z) - ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models [77.03361270726944]
Current personalization methods can invert an object or concept into the textual conditioning space and compose new natural sentences for text-to-image diffusion models.
We propose a novel approach, ProSpect, that leverages the step-by-step generation process of diffusion models, which generate images from low- to high-frequency information.
We apply ProSpect in various personalized attribute-aware image generation applications, such as image-guided or text-driven manipulations of materials, style, and layout.
arXiv Detail & Related papers (2023-05-25T16:32:01Z) - Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes.
We propose a simple, lightweight image editing algorithm in which the mixing weights of the two text embeddings are optimized for style matching and content preservation.
Experiments show that the proposed method can modify a wide range of attributes, outperforming existing diffusion-model-based image-editing algorithms.
arXiv Detail & Related papers (2022-12-16T19:58:52Z) - Diffusion Visual Counterfactual Explanations [51.077318228247925]
Visual Counterfactual Explanations (VCEs) are an important tool for understanding the decisions of an image classifier.
Current approaches for the generation of VCEs are restricted to adversarially robust models and often contain non-realistic artefacts.
In this paper, we overcome this by generating Diffusion Visual Counterfactual Explanations (DVCEs) for arbitrary ImageNet classifiers.
arXiv Detail & Related papers (2022-10-21T09:35:47Z) - Self-Guided Diffusion Models [53.825634944114285]
We propose a framework for self-guided diffusion models.
Our method provides guidance signals at various image granularities.
Our experiments on single-label and multi-label image datasets demonstrate that self-labeled guidance always outperforms diffusion models without guidance.
arXiv Detail & Related papers (2022-10-12T17:57:58Z)
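For reference, the DDIM inversion mentioned in the IterInv entry above can be sketched in a few lines: it runs the deterministic DDIM update backwards to recover an approximate initial noise for a real image. The names `eps_model` (a pretrained noise predictor) and `alpha_bar` (the cumulative noise schedule) are assumptions for illustration; per the summary, IterInv iteratively refines this procedure for pixel-level T2I models such as DeepFloyd-IF.
```python
import torch

@torch.no_grad()
def ddim_invert(eps_model, x0, alpha_bar, num_steps=50):
    """Map a clean image x0 to an approximate initial noise x_T by running
    the deterministic DDIM update in reverse (t -> t+1)."""
    x = x0
    ts = torch.linspace(0, len(alpha_bar) - 1, num_steps).long()
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        a_cur, a_next = alpha_bar[t_cur], alpha_bar[t_next]
        eps = eps_model(x, t_cur)
        # Predict x0 from the current sample, then re-noise it to t_next.
        x0_pred = (x - (1 - a_cur).sqrt() * eps) / a_cur.sqrt()
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x  # sampling forward from this x_T approximately reconstructs x0
```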
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.