TKG-DM: Training-free Chroma Key Content Generation Diffusion Model
- URL: http://arxiv.org/abs/2411.15580v1
- Date: Sat, 23 Nov 2024 15:07:15 GMT
- Title: TKG-DM: Training-free Chroma Key Content Generation Diffusion Model
- Authors: Ryugo Morita, Stanislav Frolov, Brian Bernhard Moser, Takahiro Shirakawa, Ko Watanabe, Andreas Dengel, Jinjia Zhou
- Abstract summary: We present a novel Training-Free Chroma Key Content Generation Diffusion Model (TKG-DM).
Our proposed method is the first to explore manipulating the color aspects of the initial noise for controlled background generation.
- Score: 9.939293311550655
- License:
- Abstract: Diffusion models have enabled the generation of high-quality images with a strong focus on realism and textual fidelity. Yet, large-scale text-to-image models, such as Stable Diffusion, struggle to generate images where foreground objects are placed over a chroma key background, limiting their ability to separate foreground and background elements without fine-tuning. To address this limitation, we present a novel Training-Free Chroma Key Content Generation Diffusion Model (TKG-DM), which optimizes the initial random noise to produce images with foreground objects on a specifiable color background. Our proposed method is the first to explore the manipulation of the color aspects in initial noise for controlled background generation, enabling precise separation of foreground and background without fine-tuning. Extensive experiments demonstrate that our training-free method outperforms existing methods in both qualitative and quantitative evaluations, matching or surpassing fine-tuned models. Finally, we successfully extend it to other tasks (e.g., consistency models and text-to-video), highlighting its transformative potential across various generative applications where independent control of foreground and background is crucial.
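To make the core idea concrete, the following is a minimal, hypothetical sketch of biasing the initial latent noise so that a Stable Diffusion sample drifts toward a chroma-key-style background while a central region is left as plain Gaussian noise for the foreground. It assumes the Hugging Face diffusers StableDiffusionPipeline; the per-channel offsets, mask shape, and model checkpoint are illustrative assumptions, not the exact procedure or values from the TKG-DM paper.

```python
# Hypothetical sketch: shift the per-channel mean of the initial latent noise
# in the background region to bias the decoded background toward one color.
# Offsets and mask are illustrative, not the paper's calibrated values.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator("cuda").manual_seed(0)
latents = torch.randn(
    (1, pipe.unet.config.in_channels, 64, 64),  # 64x64 latent for a 512x512 image
    generator=generator, device="cuda", dtype=torch.float16,
)

# Assumed per-channel offsets meant to nudge decoded pixels toward a green-ish
# background; real offsets would need calibration against the VAE latent space.
color_shift = torch.tensor([0.0, 0.6, -0.6, 0.0], device="cuda", dtype=torch.float16)

# Keep a central square as unmodified Gaussian noise for the foreground object;
# apply the mean shift only to the surrounding background region.
mask = torch.ones(1, 1, 64, 64, device="cuda", dtype=torch.float16)
mask[:, :, 16:48, 16:48] = 0.0
latents = latents + mask * color_shift.view(1, -1, 1, 1)

image = pipe("a cat, product photo", latents=latents).images[0]
image.save("chroma_key_cat.png")
```

Because only the starting noise is modified and the denoiser is untouched, this kind of manipulation requires no fine-tuning, which is the property the abstract emphasizes.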
Related papers
- HYPNOS : Highly Precise Foreground-focused Diffusion Finetuning for Inanimate Objects [1.706656684496508]
A robust diffusion model is determined by its ability to perform near-perfect reconstruction of certain product outcomes.
Current prominent diffusion-based finetuning techniques fall short in maintaining foreground object consistency.
We propose Hypnos, a highly precise foreground-focused diffusion finetuning technique.
arXiv Detail & Related papers (2024-10-18T08:20:37Z) - FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process [120.91393949012014]
FreeEnhance is a framework for content-consistent image enhancement using off-the-shelf image diffusion models.
In the noising stage, FreeEnhance adds lighter noise to regions with higher frequency content to preserve the high-frequency patterns of the original image.
In the denoising stage, we present three target properties as constraints to regularize the predicted noise, enhancing images with high acutance and high visual quality.
arXiv Detail & Related papers (2024-09-11T17:58:50Z) - Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is to augment the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z) - NM-FlowGAN: Modeling sRGB Noise without Paired Images using a Hybrid Approach of Normalizing Flows and GAN [9.81778202920426]
NM-FlowGAN is a hybrid approach that exploits the strengths of both GAN and Normalizing Flows.
Our method synthesizes noise using clean images and factors that affect noise characteristics, such as easily obtainable parameters like camera type and ISO settings.
In our experiments, our NM-FlowGAN outperforms other baselines in the sRGB noise synthesis task.
arXiv Detail & Related papers (2023-12-15T09:09:25Z) - DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators [56.994967294931286]
We introduce DreamDrone, a novel zero-shot and training-free pipeline for generating flythrough scenes from textual prompts.
We advocate explicitly warping the intermediate latent code of the pre-trained text-to-image diffusion model for high-quality image generation and unbounded generalization ability.
arXiv Detail & Related papers (2023-12-14T08:42:26Z) - Image Inpainting via Tractable Steering of Diffusion Models [54.13818673257381]
This paper proposes to exploit the ability of Tractable Probabilistic Models (TPMs) to exactly and efficiently compute the constrained posterior.
Specifically, this paper adopts a class of expressive TPMs termed Probabilistic Circuits (PCs).
We show that our approach can consistently improve the overall quality and semantic coherence of inpainted images with only 10% additional computational overhead.
arXiv Detail & Related papers (2023-11-28T21:14:02Z) - FreePIH: Training-Free Painterly Image Harmonization with Diffusion Model [19.170302996189335]
Our FreePIH method tames the denoising process as a plug-in module for foreground image style transfer.
We make use of multi-scale features to enforce the consistency of the content and stability of the foreground objects in the latent space.
Our method can surpass representative baselines by large margins.
arXiv Detail & Related papers (2023-11-25T04:23:49Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - Guided Image Synthesis via Initial Image Editing in Diffusion Model [30.622943615086584]
Diffusion models can generate high-quality images by denoising pure Gaussian noise images.
We propose a novel direction of manipulating the initial noise to control the generated image.
Our results highlight the flexibility and power of initial image manipulation in controlling the generated image.
arXiv Detail & Related papers (2023-05-05T09:27:59Z) - SSH: A Self-Supervised Framework for Image Harmonization [97.16345684998788]
We propose a novel Self-Supervised Harmonization framework (SSH) that can be trained using just "free" natural images without being edited.
Our results show that the proposed SSH outperforms previous state-of-the-art methods in terms of reference metrics, visual quality, and a subject user study.
arXiv Detail & Related papers (2021-08-15T19:51:33Z) - Co-occurrence Background Model with Superpixels for Robust Background Initialization [10.955692396874678]
We develop a co-occurrence background model with superpixel segmentation.
Results obtained from the dataset of the challenging benchmark (SBMnet) validate its performance under various challenges.
arXiv Detail & Related papers (2020-03-29T02:48:41Z)