Related papers: Guided Image Synthesis via Initial Image Editing in Diffusion Model

URL: http://arxiv.org/abs/2305.03382v3
Date: Wed, 09 Oct 2024 03:31:44 GMT
Title: Guided Image Synthesis via Initial Image Editing in Diffusion Model
Authors: Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa
Abstract summary: Diffusion models can generate high quality images by denoising pure Gaussian noise images. We propose a novel direction of manipulating the initial noise to control the generated image. Our results highlight the flexibility and power of initial image manipulation in controlling the generated image.
Score: 30.622943615086584
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models have the ability to generate high quality images by denoising pure Gaussian noise images. While previous research has primarily focused on improving the control of image generation through adjusting the denoising process, we propose a novel direction of manipulating the initial noise to control the generated image. Through experiments on Stable Diffusion, we show that blocks of pixels in the initial latent images have a preference for generating specific content, and that modifying these blocks can significantly influence the generated image. In particular, we show that modifying a part of the initial image affects the corresponding region of the generated image while leaving other regions unaffected, which is useful for repainting tasks. Furthermore, we find that the generation preferences of pixel blocks are primarily determined by their values, rather than their position. By moving pixel blocks with a tendency to generate user-desired content to user-specified regions, our approach achieves state-of-the-art performance in layout-to-image generation. Our results highlight the flexibility and power of initial image manipulation in controlling the generated image. Project Page: https://ut-mao.github.io/swap.github.io/
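
The block-moving idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the helper name swap_blocks, the 4 x 64 x 64 latent shape, the block size, and the block coordinates are all assumptions chosen for illustration, and the sketch only shows the exchange of two non-overlapping blocks in an initial noise latent, not how blocks with a given generation preference would be found.

```python
import copy
import random


def swap_blocks(latent, src, dst, size):
    """Return a copy of `latent` with two spatial blocks exchanged.

    latent: nested lists indexed as [channel][row][col].
    src, dst: (row, col) top-left corners of two non-overlapping blocks.
    size: side length of the square blocks.
    """
    (sr, sc), (dr, dc) = src, dst
    out = copy.deepcopy(latent)
    for ch in range(len(latent)):
        for i in range(size):
            for j in range(size):
                # Read from the untouched input so the exchange is symmetric.
                out[ch][dr + i][dc + j] = latent[ch][sr + i][sc + j]
                out[ch][sr + i][sc + j] = latent[ch][dr + i][dc + j]
    return out


random.seed(0)
C, H, W = 4, 64, 64  # assumed latent shape, for illustration only
latent = [
    [[random.gauss(0.0, 1.0) for _ in range(W)] for _ in range(H)]
    for _ in range(C)
]

# Move the block at the top-left corner to a hypothetical target region.
edited = swap_blocks(latent, src=(0, 0), dst=(32, 32), size=8)
```

Exchanging blocks, rather than overwriting one with new values, leaves the overall set of noise values unchanged, so the edited latent has exactly the same value statistics as the original sample. Only the spatial arrangement differs.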

Related papers

IntrinsiX: High-Quality PBR Generation using Image Priors [49.90007540430264]
We introduce IntrinsiX, a novel method that generates high-quality intrinsic images from text description. In contrast to existing text-to-image models whose outputs contain baked-in scene lighting, our approach predicts physically-based rendering (PBR) maps.
arXiv Detail & Related papers (2025-04-01T17:47:48Z)
TKG-DM: Training-free Chroma Key Content Generation Diffusion Model [9.939293311550655]
We present a novel Training-Free Chroma Key Content Generation Diffusion Model (TKG-DM). Our proposed method is the first to explore the manipulation of the color aspects in initial noise for controlled background generation.
arXiv Detail & Related papers (2024-11-23T15:07:15Z)
Beyond Image Prior: Embedding Noise Prior into Conditional Denoising Transformer [17.430622649002427]
Existing learning-based denoising methods typically train models to generalize the image prior from large-scale datasets. We propose a new perspective on the denoising challenge by highlighting the distinct separation between noise and image priors. We introduce a Locally Noise Prior Estimation algorithm, which accurately estimates the noise prior directly from a single raw noisy image.
arXiv Detail & Related papers (2024-07-12T08:43:11Z)
Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation [23.81997037880116]
Image-to-video (I2V) generation tasks always struggle to maintain high fidelity in open domains. Several recent I2V frameworks can generate dynamic content for open-domain images but fail to maintain fidelity. We propose an effective method that can be applied to mainstream video diffusion models.
arXiv Detail & Related papers (2024-03-05T09:57:47Z)
Pixel-Inconsistency Modeling for Image Manipulation Localization [59.968362815126326]
Digital image forensics plays a crucial role in image authentication and manipulation localization. This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts. Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing. It generates images conditioned on a source image and a textual edit prompt. It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
Gradient Adjusting Networks for Domain Inversion [82.72289618025084]
StyleGAN2 was demonstrated to be a powerful image generation engine that supports semantic editing. We present a per-image optimization method that tunes a StyleGAN2 generator such that it achieves a local edit to the generator's weights. Our experiments show a sizable gap in performance over the current state of the art in this very active domain.
arXiv Detail & Related papers (2023-02-22T14:47:57Z)
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes. We propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation. Experiments show that the proposed method can modify a wide range of attributes and outperforms diffusion-model-based image-editing algorithms.
arXiv Detail & Related papers (2022-12-16T19:58:52Z)
Ensembling with Deep Generative Views [72.70801582346344]
Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose. Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification. We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
arXiv Detail & Related papers (2021-04-29T17:58:35Z)
Learning Spatial and Spatio-Temporal Pixel Aggregations for Image and Video Denoising [104.59305271099967]
We present a pixel aggregation network and learn the pixel sampling and averaging strategies for image denoising. We develop a pixel aggregation network for video denoising to sample pixels across the spatial-temporal space. Our method is able to solve the misalignment issues caused by large motion in dynamic scenes.
arXiv Detail & Related papers (2021-01-26T13:00:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.