Diffusion Cocktail: Mixing Domain-Specific Diffusion Models for Diversified Image Generations
- URL: http://arxiv.org/abs/2312.08873v2
- Date: Mon, 9 Sep 2024 02:48:03 GMT
- Title: Diffusion Cocktail: Mixing Domain-Specific Diffusion Models for Diversified Image Generations
- Authors: Haoming Liu, Yuanhe Guo, Shengjie Wang, Hongyi Wen
- Abstract summary: Diffusion Cocktail (Ditail) is a training-free method that transfers style and content information between multiple diffusion models.
Ditail offers fine-grained control of the generation process, which enables flexible manipulations of styles and contents.
- Score: 7.604214200457584
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Diffusion models, capable of high-quality image generation, receive unparalleled popularity for their ease of extension. Active users have created a massive collection of domain-specific diffusion models by fine-tuning base models on self-collected datasets. Recent work has focused on improving a single diffusion model by uncovering semantic and visual information encoded in various architecture components. However, those methods overlook the vastly available set of fine-tuned diffusion models and, therefore, miss the opportunity to utilize their combined capacity for novel generation. In this work, we propose Diffusion Cocktail (Ditail), a training-free method that transfers style and content information between multiple diffusion models. This allows us to perform diversified generations using a set of diffusion models, resulting in novel images unobtainable by a single model. Ditail also offers fine-grained control of the generation process, which enables flexible manipulations of styles and contents. With these properties, Ditail excels in numerous applications, including style transfer guided by diffusion models, novel-style image generation, and image manipulation via prompts or collage inputs.
Related papers
- Diffusion Models For Multi-Modal Generative Modeling [32.61765315067488]
We propose a principled way to define a diffusion model by constructing a unified multi-modal diffusion model in a common diffusion space.
We propose several multimodal generation settings to verify our framework, including image transition, masked-image training, joint image-label and joint image-representation generative modeling.
arXiv Detail & Related papers (2024-07-24T18:04:17Z)
- Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
arXiv Detail & Related papers (2024-02-16T16:47:21Z)
- SODA: Bottleneck Diffusion Models for Representation Learning [75.7331354734152]
We introduce SODA, a self-supervised diffusion model, designed for representation learning.
The model incorporates an image encoder, which distills a source view into a compact representation, that guides the generation of related novel views.
We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners.
arXiv Detail & Related papers (2023-11-29T18:53:34Z)
- Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models [0.0]
Fashionable image generation aims to synthesize images of diverse fashion prevalent around the globe.
We propose a method exploiting the equivalence between diffusion models and energy-based models (EBMs).
Our results indicate that using an LLM to refine the prompts to the latent diffusion model assists in generating globally creative and culturally diversified fashion styles.
arXiv Detail & Related papers (2023-05-15T18:38:25Z)
- Extracting Training Data from Diffusion Models [77.11719063152027]
We show that diffusion models memorize individual images from their training data and emit them at generation time.
With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models.
We train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy.
arXiv Detail & Related papers (2023-01-30T18:53:09Z)
- Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models [53.03978584040557]
We study image retrieval frameworks that enable us to compare generated images with training samples and detect when content has been replicated.
Applying our frameworks to diffusion models trained on multiple datasets including Oxford flowers, Celeb-A, ImageNet, and LAION, we discuss how factors such as training set size impact rates of content replication.
arXiv Detail & Related papers (2022-12-07T18:58:02Z)
- Versatile Diffusion: Text, Images and Variations All in One Diffusion Model [76.89932822375208]
Versatile Diffusion handles multiple flows of text-to-image, image-to-text, and variations in one unified model.
Our code and models are open-sourced at https://github.com/SHI-Labs/Versatile-Diffusion.
arXiv Detail & Related papers (2022-11-15T17:44:05Z)
- A Survey on Generative Diffusion Model [75.93774014861978]
Diffusion models are an emerging class of deep generative models.
They have certain limitations, including a time-consuming iterative generation process and confinement to high-dimensional Euclidean space.
This survey presents a plethora of advanced techniques aimed at enhancing diffusion models.
arXiv Detail & Related papers (2022-09-06T16:56:21Z)
- DiffusionCLIP: Text-guided Image Manipulation Using Diffusion Models [33.79188588182528]
We present a novel DiffusionCLIP which performs text-driven image manipulation with diffusion models using Contrastive Language-Image Pre-training (CLIP) loss.
Our method performs comparably to modern GAN-based image processing methods on both in-domain and out-of-domain image processing tasks.
Our method can be easily used for various novel applications, enabling image translation from an unseen domain to another unseen domain or stroke-conditioned image generation in an unseen domain.
arXiv Detail & Related papers (2021-10-06T12:59:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.