TKG-DM: Training-free Chroma Key Content Generation Diffusion Model
- URL: http://arxiv.org/abs/2411.15580v1
- Date: Sat, 23 Nov 2024 15:07:15 GMT
- Title: TKG-DM: Training-free Chroma Key Content Generation Diffusion Model
- Authors: Ryugo Morita, Stanislav Frolov, Brian Bernhard Moser, Takahiro Shirakawa, Ko Watanabe, Andreas Dengel, Jinjia Zhou
- Abstract summary: We present a novel Training-Free Chroma Key Content Generation Diffusion Model (TKG-DM), which optimizes the initial random noise to produce images with foreground objects on a specifiable color background.
Our proposed method is the first to explore the manipulation of the color aspects in initial noise for controlled background generation.
- Score: 9.939293311550655
- License:
- Abstract: Diffusion models have enabled the generation of high-quality images with a strong focus on realism and textual fidelity. Yet, large-scale text-to-image models, such as Stable Diffusion, struggle to generate images where foreground objects are placed over a chroma key background, limiting their ability to separate foreground and background elements without fine-tuning. To address this limitation, we present a novel Training-Free Chroma Key Content Generation Diffusion Model (TKG-DM), which optimizes the initial random noise to produce images with foreground objects on a specifiable color background. Our proposed method is the first to explore the manipulation of the color aspects in initial noise for controlled background generation, enabling precise separation of foreground and background without fine-tuning. Extensive experiments demonstrate that our training-free method outperforms existing methods in both qualitative and quantitative evaluations, matching or surpassing fine-tuned models. Finally, we successfully extend it to other tasks (e.g., consistency models and text-to-video), highlighting its transformative potential across various generative applications where independent control of foreground and background is crucial.
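The abstract describes the idea only at a high level: adjust the color statistics of the initial noise so that the background converges to a chosen color while the foreground is still generated normally. The following is a minimal sketch of one way such a manipulation could look, not the authors' implementation. It assumes a 4-channel latent, a per-channel mean shift, and a centered Gaussian mask that keeps the original noise in the middle of the frame. The function names, the shift values, and `sigma` are all hypothetical and chosen for illustration.

```python
import math
import random


def gaussian_mask(height, width, sigma=0.25):
    """Weights in (0, 1]: 1.0 at the exact centre, decaying toward the borders.

    Coordinates are normalised by the image size, so `sigma` is a fraction
    of the image extent (an illustrative default, not a published value).
    """
    cy, cx = (height - 1) / 2, (width - 1) / 2
    return [
        [
            math.exp(-(((y - cy) / height) ** 2 + ((x - cx) / width) ** 2)
                     / (2 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def shift_initial_noise(noise, channel_shifts, mask):
    """Blend original noise with mean-shifted noise, channel by channel.

    Where the mask is close to 1 (centre) the original noise is kept, which
    is where a foreground object would be expected to form. Where the mask is
    close to 0 (periphery) the shifted noise dominates, which biases the
    background color. This is an assumed blending rule.
    """
    return [
        [
            [m * v + (1.0 - m) * (v + shift) for v, m in zip(row, mask_row)]
            for row, mask_row in zip(channel, mask)
        ]
        for channel, shift in zip(noise, channel_shifts)
    ]


# Toy-sized latent (4 channels, 9x9); real latents are much larger.
rng = random.Random(0)
channels, height, width = 4, 9, 9
noise = [
    [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]
    for _ in range(channels)
]

mask = gaussian_mask(height, width)
# Hypothetical shift vector; which latent channels map to which color
# depends on the model's autoencoder and is not specified in this listing.
shifted = shift_initial_noise(noise, [0.0, 0.5, 0.0, -0.5], mask)

print("corner change in channel 1:", shifted[1][0][0] - noise[1][0][0])
print("centre change in channel 1:", shifted[1][4][4] - noise[1][4][4])
```

In a real pipeline the shifted tensor would replace the random initial latent passed to the sampler. Because only the starting noise changes, the model weights stay untouched, which is what makes an approach of this kind training-free.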
Related papers
- Zero-Shot Low Light Image Enhancement with Diffusion Prior [2.102429358229889]
We introduce a novel zero-shot method for controlling and refining the generative behavior of diffusion models for dark-to-light image conversion tasks.
Our method demonstrates superior performance over existing state-of-the-art methods in the task of low-light image enhancement.
arXiv Detail & Related papers (2024-12-18T00:31:18Z) - FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process [120.91393949012014]
FreeEnhance is a framework for content-consistent image enhancement using off-the-shelf image diffusion models.
In the noising stage, FreeEnhance is devised to add lighter noise to regions with higher frequency to preserve the high-frequency patterns in the original image.
In the denoising stage, we present three target properties as constraints to regularize the predicted noise, enhancing images with high acutance and high visual quality.
arXiv Detail & Related papers (2024-09-11T17:58:50Z) - ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z) - Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z) - NM-FlowGAN: Modeling sRGB Noise without Paired Images using a Hybrid Approach of Normalizing Flows and GAN [9.81778202920426]
NM-FlowGAN is a hybrid approach that exploits the strengths of both GAN and Normalizing Flows.
Our method synthesizes noise using clean images and factors that affect noise characteristics, such as easily obtainable parameters like camera type and ISO settings.
In our experiments, our NM-FlowGAN outperforms other baselines in the sRGB noise synthesis task.
arXiv Detail & Related papers (2023-12-15T09:09:25Z) - DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators [56.994967294931286]
We introduce DreamDrone, a novel zero-shot and training-free pipeline for generating flythrough scenes from textual prompts.
We advocate explicitly warping the intermediate latent code of the pre-trained text-to-image diffusion model for high-quality image generation and unbounded generalization ability.
arXiv Detail & Related papers (2023-12-14T08:42:26Z) - FreePIH: Training-Free Painterly Image Harmonization with Diffusion Model [19.170302996189335]
Our FreePIH method tames the denoising process as a plug-in module for foreground image style transfer.
We make use of multi-scale features to enforce the consistency of the content and stability of the foreground objects in the latent space.
Our method can surpass representative baselines by large margins.
arXiv Detail & Related papers (2023-11-25T04:23:49Z) - Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z) - Guided Image Synthesis via Initial Image Editing in Diffusion Model [30.622943615086584]
Diffusion models can generate high quality images by denoising pure Gaussian noise images.
We propose a novel direction of manipulating the initial noise to control the generated image.
Our results highlight the flexibility and power of initial image manipulation in controlling the generated image.
arXiv Detail & Related papers (2023-05-05T09:27:59Z) - SSH: A Self-Supervised Framework for Image Harmonization [97.16345684998788]
We propose a novel Self-Supervised Harmonization framework (SSH) that can be trained using just "free" natural images without being edited.
Our results show that the proposed SSH outperforms previous state-of-the-art methods in terms of reference metrics, visual quality, and subject user study.
arXiv Detail & Related papers (2021-08-15T19:51:33Z) - Co-occurrence Background Model with Superpixels for Robust Background Initialization [10.955692396874678]
We develop a co-occurrence background model with superpixel segmentation.
Results obtained from the dataset of the challenging benchmark (SBMnet) validate its performance under various challenges.
arXiv Detail & Related papers (2020-03-29T02:48:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.