Diffusion Cocktail: Fused Generation from Diffusion Models
- URL: http://arxiv.org/abs/2312.08873v1
- Date: Tue, 12 Dec 2023 00:53:56 GMT
- Title: Diffusion Cocktail: Fused Generation from Diffusion Models
- Authors: Haoming Liu, Yuanhe Guo, Shengjie Wang, Hongyi Wen
- Abstract summary: Diffusion models excel at generating high-quality images and are easy to extend.
Recent work has focused on uncovering semantic and visual information encoded in various components of a diffusion model.
We propose Diffusion Cocktail (Ditail), a training-free method that can accurately transfer content information between two diffusion models.
- Score: 8.307057730976432
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Diffusion models excel at generating high-quality images and are easy to
extend, making them extremely popular among active users who have created an
extensive collection of diffusion models with various styles by fine-tuning
base models such as Stable Diffusion. Recent work has focused on uncovering
semantic and visual information encoded in various components of a diffusion
model, enabling better generation quality and more fine-grained control.
However, those methods target improving a single model and overlook the vastly
available collection of fine-tuned diffusion models. In this work, we study the
combinations of diffusion models. We propose Diffusion Cocktail (Ditail), a
training-free method that can accurately transfer content information between
two diffusion models. This allows us to perform diverse generations using a set
of diffusion models, resulting in novel images that are unlikely to be obtained
by a single model alone. We also explore utilizing Ditail for style transfer,
with the target style set by a diffusion model instead of an image. Ditail
offers a more detailed manipulation of the diffusion generation, thereby
enabling the vast community to integrate various styles and contents seamlessly
and generate any content of any style.
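
As a rough illustration of what cross-model generation can look like in practice, the sketch below denoises with one fine-tuned Stable Diffusion checkpoint for the early steps and then hands the latents to a second checkpoint that shares the same base architecture. This is a minimal, hypothetical example of fusing two diffusion models in latent space, not the actual Ditail procedure described in the paper; the model identifiers, the switch step, and the omission of classifier-free guidance are simplifying assumptions.

```python
# Hypothetical sketch: denoise with checkpoint A's U-Net for the early steps
# (content), then hand the latents to checkpoint B's U-Net (style).
# NOT the Ditail method itself; model names and the switch step are assumptions,
# and classifier-free guidance is omitted for brevity.
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe_a = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
pipe_b = StableDiffusionPipeline.from_pretrained("someuser/style-finetune").to(device)  # hypothetical fine-tune of the same base

scheduler = DDIMScheduler.from_config(pipe_a.scheduler.config)
scheduler.set_timesteps(50, device=device)
switch_step = 25  # hand over halfway through sampling

prompt = "a castle on a hill at sunset"
ids = pipe_a.tokenizer(prompt, padding="max_length",
                       max_length=pipe_a.tokenizer.model_max_length,
                       return_tensors="pt").input_ids.to(device)

with torch.no_grad():
    text_emb = pipe_a.text_encoder(ids)[0]

    latents = torch.randn(1, pipe_a.unet.config.in_channels, 64, 64, device=device)
    latents = latents * scheduler.init_noise_sigma

    for i, t in enumerate(scheduler.timesteps):
        unet = pipe_a.unet if i < switch_step else pipe_b.unet  # hand-off point
        noise_pred = unet(scheduler.scale_model_input(latents, t), t,
                          encoder_hidden_states=text_emb).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample

    # Decode with the shared VAE (both checkpoints derive from the same base model).
    image = pipe_a.vae.decode(latents / pipe_a.vae.config.scaling_factor).sample
```

Decoding with either checkpoint's VAE works only when both are fine-tunes of the same base model, which is the setting the paper targets.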
Related papers
- Unified Multimodal Discrete Diffusion [78.48930545306654]
Multimodal generative models that can understand and generate across multiple modalities are dominated by autoregressive (AR) approaches.
We explore discrete diffusion models as a unified generative formulation in the joint text and image domain.
We present the first Unified Multimodal Discrete Diffusion (UniDisc) model, which is capable of jointly understanding and generating text and images.
arXiv Detail & Related papers (2025-03-26T17:59:51Z)
- Diffusion Models For Multi-Modal Generative Modeling [32.61765315067488]
We propose a principled way to define a diffusion model by constructing a unified multi-modal diffusion model in a common diffusion space.
We propose several multimodal generation settings to verify our framework, including image transition, masked-image training, joint image-label and joint image-representation generative modeling.
arXiv Detail & Related papers (2024-07-24T18:04:17Z)
- Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
arXiv Detail & Related papers (2024-02-16T16:47:21Z)
- SODA: Bottleneck Diffusion Models for Representation Learning [75.7331354734152]
We introduce SODA, a self-supervised diffusion model, designed for representation learning.
The model incorporates an image encoder, which distills a source view into a compact representation that guides the generation of related novel views.
We show that by imposing a tight bottleneck between the encoder and a denoising decoder, we can turn diffusion models into strong representation learners.
arXiv Detail & Related papers (2023-11-29T18:53:34Z)
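
The bottleneck design in the SODA entry above lends itself to a compact illustration: an encoder compresses a source view into a small code z, and a denoiser predicts the noise added to a related target view, conditioned on z and the timestep. The toy below is a hedged, self-contained PyTorch sketch with assumed shapes and a simplified noise schedule, not the SODA architecture itself.

```python
# Toy bottleneck-conditioned denoiser (an illustration, not SODA itself).
import torch
import torch.nn as nn

class BottleneckEncoder(nn.Module):
    """Compresses a 64x64 source view into a small code z (the tight bottleneck)."""
    def __init__(self, z_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, z_dim),
        )

    def forward(self, x):
        return self.net(x)

class ConditionedDenoiser(nn.Module):
    """Predicts the noise in a target view, modulated by z and the timestep."""
    def __init__(self, z_dim=128):
        super().__init__()
        self.inp = nn.Conv2d(3, 64, 3, padding=1)
        self.film = nn.Linear(z_dim + 1, 64)  # FiLM-style scaling from (z, t)
        self.out = nn.Sequential(nn.SiLU(),
                                 nn.Conv2d(64, 64, 3, padding=1), nn.SiLU(),
                                 nn.Conv2d(64, 3, 3, padding=1))

    def forward(self, x_t, t, z):
        scale = self.film(torch.cat([z, t[:, None]], dim=1))
        return self.out(self.inp(x_t) * (1 + scale[:, :, None, None]))

# One training step: noise the target view, then predict that noise from the
# noisy target plus the bottleneck code of the source view (denoising objective).
enc, den = BottleneckEncoder(), ConditionedDenoiser()
src, tgt = torch.randn(4, 3, 64, 64), torch.randn(4, 3, 64, 64)
t = torch.rand(4)
noise = torch.randn_like(tgt)
x_t = (1 - t)[:, None, None, None] * tgt + t[:, None, None, None] * noise  # toy schedule
loss = ((den(x_t, t, enc(src)) - noise) ** 2).mean()
loss.backward()
```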
- Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models [0.0]
Fashionable image generation aims to synthesize images of diverse fashion prevalent around the globe.
We propose a method that exploits the equivalence between diffusion models and energy-based models (EBMs).
Our results indicate that using an LLM to refine the prompts to the latent diffusion model assists in generating globally creative and culturally diversified fashion styles.
arXiv Detail & Related papers (2023-05-15T18:38:25Z)
- Extracting Training Data from Diffusion Models [77.11719063152027]
We show that diffusion models memorize individual images from their training data and emit them at generation time.
With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models.
We train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy.
arXiv Detail & Related papers (2023-01-30T18:53:09Z)
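
The generate-and-filter idea from the "Extracting Training Data from Diffusion Models" entry above can be sketched in a few lines: sample many images per prompt, embed them, and flag any whose nearest training-set neighbor is suspiciously similar. The loop below is a hedged simplification with an assumed model identifier and a placeholder feature extractor; the original work's filtering is considerably more involved.

```python
# Hedged sketch of a generate-and-filter loop (not the authors' exact pipeline).
import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

def embed(images):
    """Placeholder feature extractor; a real pipeline would use e.g. CLIP embeddings."""
    return torch.stack([torch.as_tensor(img).float().flatten() for img in images])

def find_memorized(prompts, train_feats, threshold=0.95, samples_per_prompt=4):
    """train_feats: (N, D) precomputed embeddings of the training images (assumed given)."""
    train_feats = F.normalize(train_feats, dim=-1)
    suspects = []
    for prompt in prompts:
        images = pipe(prompt, num_images_per_prompt=samples_per_prompt,
                      output_type="np").images             # (S, H, W, 3) array
        gen_feats = F.normalize(embed(images), dim=-1)
        sims = gen_feats @ train_feats.T                    # cosine similarities
        for img, s in zip(images, sims.max(dim=-1).values):
            if s > threshold:        # near-duplicate of some training image
                suspects.append((prompt, img, s.item()))
    return suspects
```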
- Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models [53.03978584040557]
We study image retrieval frameworks that enable us to compare generated images with training samples and detect when content has been replicated.
Applying our frameworks to diffusion models trained on multiple datasets including Oxford flowers, Celeb-A, ImageNet, and LAION, we discuss how factors such as training set size impact rates of content replication.
arXiv Detail & Related papers (2022-12-07T18:58:02Z)
- Versatile Diffusion: Text, Images and Variations All in One Diffusion Model [76.89932822375208]
Versatile Diffusion handles multiple flows of text-to-image, image-to-text, and variations in one unified model.
Our code and models are open-sourced at https://github.com/SHI-Labs/Versatile-Diffusion.
arXiv Detail & Related papers (2022-11-15T17:44:05Z)
- A Survey on Generative Diffusion Model [75.93774014861978]
Diffusion models are an emerging class of deep generative models.
They have certain limitations, including a time-consuming iterative generation process and confinement to high-dimensional Euclidean space.
This survey presents a plethora of advanced techniques aimed at enhancing diffusion models.
arXiv Detail & Related papers (2022-09-06T16:56:21Z)
- DiffusionCLIP: Text-guided Image Manipulation Using Diffusion Models [33.79188588182528]
We present DiffusionCLIP, a novel method that performs text-driven image manipulation with diffusion models using a Contrastive Language-Image Pre-training (CLIP) loss.
Our method performs comparably to modern GAN-based image processing methods on in-domain and out-of-domain image processing tasks.
Our method can be easily used for various novel applications, enabling image translation from an unseen domain to another unseen domain or stroke-conditioned image generation in an unseen domain.
arXiv Detail & Related papers (2021-10-06T12:59:39Z)
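
For the DiffusionCLIP entry above, the core ingredient is a CLIP loss that scores how well a generated image matches a target text. Below is a hedged, simplified version (plain image-text cosine similarity rather than the paper's directional CLIP loss), with differentiable preprocessing so gradients can flow back into a diffusion model being fine-tuned or guided; the checkpoint name is an assumption.

```python
# Simplified CLIP loss for text-guided editing (not DiffusionCLIP's exact
# directional loss). Images stay differentiable so the loss can steer or
# fine-tune a diffusion model.
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPTokenizer

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
MEAN = torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(1, 3, 1, 1)
STD = torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(1, 3, 1, 1)

def clip_text_loss(images, prompt):
    """images: (B, 3, H, W) tensor in [0, 1]; prompt: target text description."""
    x = F.interpolate(images, size=224, mode="bilinear", align_corners=False)
    x = (x - MEAN.to(x)) / STD.to(x)                     # CLIP normalization
    img_emb = F.normalize(clip.get_image_features(pixel_values=x), dim=-1)
    tokens = tokenizer([prompt], padding=True, return_tensors="pt")
    txt_emb = F.normalize(clip.get_text_features(**tokens), dim=-1)
    return 1.0 - (img_emb * txt_emb).sum(dim=-1).mean()  # 1 - cosine similarity
```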
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.