The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
- URL: http://arxiv.org/abs/2312.08872v4
- Date: Wed, 09 Oct 2024 03:29:17 GMT
- Title: The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
- Authors: Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa
- Abstract summary: We formulate the lottery ticket hypothesis in denoising: randomly initialized Gaussian noise images contain special pixel blocks (winning tickets).
Winning tickets naturally tend to be denoised into specific content independently.
We implement semantic-driven initial image construction, creating initial noise from known winning tickets.
- Score: 30.622943615086584
- License:
- Abstract: Text-to-image diffusion models allow users control over the content of generated images. Still, text-to-image generation occasionally leads to generation failure, requiring users to generate dozens of images under the same text prompt before they obtain a satisfying result. We formulate the lottery ticket hypothesis in denoising: randomly initialized Gaussian noise images contain special pixel blocks (winning tickets) that naturally tend to be denoised into specific content independently. The generation failure in standard text-to-image synthesis is caused by the gap between the optimal and actual spatial distribution of winning tickets in initial noisy images. To this end, we implement semantic-driven initial image construction, creating initial noise from known winning tickets for each concept mentioned in the prompt. We conduct a series of experiments that verify the properties of winning tickets and demonstrate their generalizability across images and prompts. Our results show that aggregating winning tickets into the initial noise image effectively induces the model to generate the specified object at the corresponding location. Project Page: https://ut-mao.github.io/noise.github.io
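The construction described in the abstract can be pictured as pasting stored noise blocks into a freshly sampled Gaussian noise image. The following is only a minimal sketch of that idea, not the authors' implementation: the function name `build_initial_noise`, the `tickets` dictionary, the `layout` format, and the latent shape are all hypothetical, and the abstract does not say how winning tickets are found, so the ticket below is a random placeholder.

```python
import numpy as np


def build_initial_noise(shape, tickets, layout, seed=0):
    """Compose an initial noise image from per-concept noise blocks.

    shape   -- (channels, height, width) of the noise image (assumed layout)
    tickets -- hypothetical dict: concept -> noise block of shape (channels, h, w)
    layout  -- hypothetical dict: concept -> (top, left) position for its block
    """
    rng = np.random.default_rng(seed)
    # Start from ordinary Gaussian noise, as in standard sampling.
    noise = rng.standard_normal(shape).astype(np.float32)

    for concept, (top, left) in layout.items():
        block = tickets[concept]
        c, h, w = block.shape
        fits = (
            c == shape[0]
            and top >= 0
            and left >= 0
            and top + h <= shape[1]
            and left + w <= shape[2]
        )
        if not fits:
            raise ValueError(f"block for {concept!r} does not fit at {(top, left)}")
        # Overwrite the target region with the stored block.
        noise[:, top:top + h, left:left + w] = block
    return noise


if __name__ == "__main__":
    # Placeholder ticket: random noise standing in for a block that was
    # previously observed to denoise into the concept "cat".
    demo_rng = np.random.default_rng(1)
    tickets = {"cat": demo_rng.standard_normal((4, 16, 16)).astype(np.float32)}
    latent = build_initial_noise((4, 64, 64), tickets, {"cat": (8, 8)})
    print(latent.shape)
```

The resulting array would then be used in place of the usual random starting noise of a diffusion sampler. Pixels outside the pasted regions stay as plain Gaussian noise.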
Related papers
- Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation [43.48099716183503]
We propose a training-free approach tailored to diffusion-based image-to-image translation.
Our approach can be easily incorporated into existing image-to-image translation methods.
arXiv Detail & Related papers (2024-09-12T14:30:45Z)
- The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise [92.53724347718173]
Diffusion models have achieved remarkable success in text-to-image generation tasks.
We identify specific regions within the initial noise image, termed trigger patches, that play a key role for object generation in the resulting images.
arXiv Detail & Related papers (2024-06-04T05:06:00Z)
- InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization [27.508861002013358]
InitNO is a paradigm that refines the initial noise so that the generated images are semantically faithful to the prompt.
A strategically crafted noise optimization pipeline is developed to guide the initial noise towards valid regions.
Our method, validated through rigorous experimentation, shows a commendable proficiency in generating images in strict accordance with text prompts.
arXiv Detail & Related papers (2024-04-06T14:56:59Z)
- Towards Better Multi-modal Keyphrase Generation via Visual Entity Enhancement and Multi-granularity Image Noise Filtering [79.44443231700201]
Multi-modal keyphrase generation aims to produce a set of keyphrases that represent the core points of the input text-image pair.
The input text and image are often not perfectly matched, and thus the image may introduce noise into the model.
We propose a novel multi-modal keyphrase generation model, which not only enriches the model input with external knowledge, but also effectively filters image noise.
arXiv Detail & Related papers (2023-09-09T09:41:36Z)
- Guided Image Synthesis via Initial Image Editing in Diffusion Model [30.622943615086584]
Diffusion models can generate high quality images by denoising pure Gaussian noise images.
We propose a novel direction of manipulating the initial noise to control the generated image.
Our results highlight the flexibility and power of initial image manipulation in controlling the generated image.
arXiv Detail & Related papers (2023-05-05T09:27:59Z)
- NLIP: Noise-robust Language-Image Pre-training [95.13287735264937]
We propose a principled Noise-robust Language-Image Pre-training framework (NLIP) to stabilize pre-training via two schemes: noise-harmonization and noise-completion.
Our NLIP can alleviate the common noise effects during image-text pre-training in a more efficient way.
arXiv Detail & Related papers (2022-12-14T08:19:30Z)
- Learning to Generate Realistic Noisy Images via Pixel-level Noise-aware Adversarial Training [50.018580462619425]
We propose a novel framework, namely Pixel-level Noise-aware Generative Adversarial Network (PNGAN).
PNGAN employs a pre-trained real denoiser to map the fake and real noisy images into a nearly noise-free solution space.
For better noise fitting, we present an efficient architecture, Simple Multi-scale Network (SMNet), as the generator.
arXiv Detail & Related papers (2022-04-06T14:09:02Z)
- Disentangling Noise from Images: A Flow-Based Image Denoising Neural Network [25.008542061247383]
We propose a new perspective to treat image denoising as a distribution learning and disentangling task.
Since the noisy image distribution can be viewed as a joint distribution of clean images and noise, the denoised images can be obtained via manipulating the latent representations to the clean counterpart.
We present an invertible denoising network, FDN, without any assumptions on either clean or noise distributions.
arXiv Detail & Related papers (2021-05-11T01:52:26Z)
- Neighbor2Neighbor: Self-Supervised Denoising from Single Noisy Images [98.82804259905478]
We present Neighbor2Neighbor to train an effective image denoising model with only noisy images.
In detail, input and target used to train a network are images sub-sampled from the same noisy image.
A denoising network is trained on sub-sampled training pairs generated in the first stage, with a proposed regularizer as additional loss for better performance.
arXiv Detail & Related papers (2021-01-08T02:03:25Z)
- Dual Adversarial Network: Toward Real-world Noise Removal and Noise Generation [52.75909685172843]
Real-world image noise removal is a long-standing yet very challenging task in computer vision.
We propose a novel unified framework to deal with the noise removal and noise generation tasks.
Our method learns the joint distribution of the clean-noisy image pairs.
arXiv Detail & Related papers (2020-07-12T09:16:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.