Guided Image Synthesis via Initial Image Editing in Diffusion Model
- URL: http://arxiv.org/abs/2305.03382v2
- Date: Sun, 6 Aug 2023 04:56:14 GMT
- Title: Guided Image Synthesis via Initial Image Editing in Diffusion Model
- Authors: Jiafeng Mao, Xueting Wang and Kiyoharu Aizawa
- Abstract summary: Diffusion models can generate high quality images by denoising pure Gaussian noise images.
We propose a novel direction of manipulating the initial noise to control the generated image.
Our results highlight the flexibility and power of initial image manipulation in controlling the generated image.
- Score: 43.14135590548668
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have the ability to generate high quality images by
denoising pure Gaussian noise images. While previous research has primarily
focused on improving the control of image generation through adjusting the
denoising process, we propose a novel direction of manipulating the initial
noise to control the generated image. Through experiments on stable diffusion,
we show that blocks of pixels in the initial latent images have a preference
for generating specific content, and that modifying these blocks can
significantly influence the generated image. In particular, we show that
modifying a part of the initial image affects the corresponding region of the
generated image while leaving other regions unaffected, which is useful for
repainting tasks. Furthermore, we find that the generation preferences of pixel
blocks are primarily determined by their values, rather than their position. By
moving pixel blocks with a tendency to generate user-desired content to
user-specified regions, our approach achieves state-of-the-art performance in
layout-to-image generation. Our results highlight the flexibility and power of
initial image manipulation in controlling the generated image.
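The core operation the abstract describes — moving a pixel block of the initial latent noise to a user-specified region, since a block's generation preference follows its values rather than its position — can be sketched in NumPy. This is an illustrative sketch, not the authors' implementation; the function name, box convention, and the choice to refill the vacated region with fresh Gaussian noise are assumptions.

```python
import numpy as np


def move_latent_block(latent, src_box, dst_box, seed=None):
    """Relocate a block of initial-noise values to a user-specified region.

    `latent` is the initial Gaussian noise with shape (C, H, W); boxes are
    (top, left, height, width). Because a block's generation preference is
    determined mainly by its values, copying a block that tends to produce
    the desired content into the target region steers the layout of the
    generated image. (Sketch only; refilling the vacated source region with
    fresh noise is an assumption made to keep the latent roughly
    unit-Gaussian everywhere.)
    """
    c, _, _ = latent.shape
    t0, l0, bh, bw = src_box
    t1, l1, bh2, bw2 = dst_box
    assert (bh, bw) == (bh2, bw2), "source and target blocks must match in size"

    out = latent.copy()
    block = latent[:, t0:t0 + bh, l0:l0 + bw].copy()

    # Refill the vacated source region with fresh Gaussian noise.
    rng = np.random.default_rng(seed)
    out[:, t0:t0 + bh, l0:l0 + bw] = rng.standard_normal((c, bh, bw))

    # Paste the preferred block at the user-specified location.
    out[:, t1:t1 + bh, l1:l1 + bw] = block
    return out


# Example: a 4x64x64 Stable Diffusion-style latent; move the top-left
# 16x16 block to position (40, 40) before running the denoising loop.
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 64, 64)).astype(np.float32)
z2 = move_latent_block(z, src_box=(0, 0, 16, 16), dst_box=(40, 40, 16, 16))
```

In a real pipeline, `z2` would be passed as the initial latent (e.g. the `latents` argument of a Stable Diffusion sampler) in place of freshly sampled noise.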
Related papers
- Beyond Image Prior: Embedding Noise Prior into Conditional Denoising Transformer [17.430622649002427]
Existing learning-based denoising methods typically train models to generalize the image prior from large-scale datasets.
We propose a new perspective on the denoising challenge by highlighting the distinct separation between noise and image priors.
We introduce a Locally Noise Prior Estimation algorithm, which accurately estimates the noise prior directly from a single raw noisy image.
arXiv Detail & Related papers (2024-07-12T08:43:11Z)
- Active Generation for Image Classification [50.18107721267218]
We propose to address the efficiency of image generation by focusing on the specific needs and characteristics of the model.
With a central tenet of active learning, our method, named ActGen, takes a training-aware approach to image generation.
arXiv Detail & Related papers (2024-03-11T08:45:31Z)
- Tuning-Free Noise Rectification for High Fidelity Image-to-Video Generation [23.81997037880116]
Image-to-video (I2V) generation tasks struggle to maintain high fidelity in open domains.
Several recent I2V frameworks can generate dynamic content for open domain images but fail to maintain fidelity.
We propose an effective method that can be applied to mainstream video diffusion models.
arXiv Detail & Related papers (2024-03-05T09:57:47Z)
- The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization [30.622943615086584]
We formulate the lottery ticket hypothesis in denoising random Gaussian noise images.
We implement semantic-driven initial image construction, creating the initial noise from known winning tickets.
Our results show that aggregating winning tickets into the initial noise image effectively induces the model to generate the specified object at the corresponding location.
arXiv Detail & Related papers (2023-12-13T03:31:19Z)
- Pixel-Inconsistency Modeling for Image Manipulation Localization [63.54342601757723]
Digital image forensics plays a crucial role in image authentication and manipulation localization.
This paper presents a generalized and robust manipulation localization model through the analysis of pixel inconsistency artifacts.
Experiments show that our method successfully extracts inherent pixel-inconsistency forgery fingerprints.
arXiv Detail & Related papers (2023-09-30T02:54:51Z)
- Diffusion Brush: A Latent Diffusion Model-based Editing Tool for AI-generated Images [10.323260768204461]
Text-to-image generative models have made remarkable advancements in generating high-quality images.
Existing techniques to fine-tune generated images are time-consuming (manual editing), produce poorly-integrated results (inpainting), or result in unexpected changes across the entire image.
We present Diffusion Brush, a Latent Diffusion Model (LDM)-based tool to efficiently fine-tune desired regions within an AI-synthesized image.
arXiv Detail & Related papers (2023-05-31T22:27:21Z)
- Ensembling with Deep Generative Views [72.70801582346344]
Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars.
arXiv Detail & Related papers (2021-04-29T17:58:35Z)
- Learning Spatial and Spatio-Temporal Pixel Aggregations for Image and Video Denoising [104.59305271099967]
We present a pixel aggregation network and learn the pixel sampling and averaging strategies for image denoising.
We develop a pixel aggregation network for video denoising to sample pixels across the spatial-temporal space.
Our method is able to solve the misalignment issues caused by large motion in dynamic scenes.
arXiv Detail & Related papers (2021-01-26T13:00:46Z)
- Blur, Noise, and Compression Robust Generative Adversarial Networks [85.68632778835253]
We propose blur, noise, and compression robust GAN (BNCR-GAN) to learn a clean image generator directly from degraded images.
Inspired by NR-GAN, BNCR-GAN uses a multiple-generator model composed of image, blur-kernel, noise, and quality-factor generators.
We demonstrate the effectiveness of BNCR-GAN through large-scale comparative studies on CIFAR-10 and a generality analysis on FFHQ.
arXiv Detail & Related papers (2020-03-17T17:56:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.