MagicMix: Semantic Mixing with Diffusion Models
- URL: http://arxiv.org/abs/2210.16056v1
- Date: Fri, 28 Oct 2022 11:07:48 GMT
- Title: MagicMix: Semantic Mixing with Diffusion Models
- Authors: Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng
- Abstract summary: We explore a new task called semantic mixing, aiming at blending two different semantics to create a new concept.
We present MagicMix, a solution based on pre-trained text-conditioned diffusion models.
Our method does not require any spatial mask or re-training, yet is able to synthesize novel objects with high fidelity.
- Score: 85.43291162563652
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Have you ever imagined what a corgi-alike coffee machine or a tiger-alike
rabbit would look like? In this work, we attempt to answer these questions by
exploring a new task called semantic mixing, aiming at blending two different
semantics to create a new concept (e.g., corgi + coffee machine -->
corgi-alike coffee machine). Unlike style transfer, where an image is stylized
according to the reference style without changing the image content, semantic
blending mixes two different concepts in a semantic manner to synthesize a
novel concept while preserving the spatial layout and geometry. To this end, we
present MagicMix, a simple yet effective solution based on pre-trained
text-conditioned diffusion models. Motivated by the progressive generation
property of diffusion models where layout/shape emerges at early denoising
steps while semantically meaningful details appear at later steps during the
denoising process, our method first obtains a coarse layout (either by
corrupting an image or denoising from a pure Gaussian noise given a text
prompt), followed by injection of conditional prompt for semantic mixing. Our
method does not require any spatial mask or re-training, yet is able to
synthesize novel objects with high fidelity. To improve the mixing quality, we
further devise two simple strategies to provide better control and flexibility
over the synthesized content. With our method, we present our results over
diverse downstream applications, including semantic style transfer, novel
object synthesis, breed mixing, and concept removal, demonstrating the
flexibility of our method. More results can be found on the project page
https://magicmix.github.io
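The abstract pins the method to the progressive-generation property of diffusion: the coarse layout is fixed during the early, shape-forming denoising steps, and the injected content prompt drives the later, detail-forming steps. Below is a minimal, self-contained PyTorch sketch of that schedule. The step boundaries K_MAX/K_MIN, the mixing weight NU, and the denoise_step stand-in are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of the MagicMix schedule described in the abstract.
# denoise_step is a HYPOTHETICAL stand-in so the script runs end to end;
# a real text-conditioned diffusion model would go there instead.
import torch

T = 1000                      # total diffusion steps (assumed)
K_MAX, K_MIN = 600, 300       # layout-mixing interval (assumed values)
NU = 0.5                      # layout/content mixing weight (assumed)

betas = torch.linspace(1e-4, 0.02, T)               # standard DDPM schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, k, noise):
    """Corrupt a clean image x0 to noise level k (DDPM forward process)."""
    return alphas_cumprod[k].sqrt() * x0 + (1.0 - alphas_cumprod[k]).sqrt() * noise

def denoise_step(x_k, k, prompt):
    """Stand-in for one reverse step of a text-conditioned model."""
    return 0.99 * x_k  # placeholder dynamics, NOT a real denoiser

layout_image = torch.randn(1, 3, 64, 64)  # e.g. a photo of a corgi
content_prompt = "a coffee machine"       # the semantics to inject

# 1) Coarse layout: corrupt the reference image to step K_MAX
#    (alternatively, denoise pure Gaussian noise given a layout prompt).
x = q_sample(layout_image, K_MAX, torch.randn_like(layout_image))

# 2) Denoise with the content prompt injected; inside [K_MIN, K_MAX],
#    blend in a freshly noised copy of the layout image so the coarse
#    geometry survives while the new semantics emerge.
for k in range(K_MAX, -1, -1):
    x = denoise_step(x, k, content_prompt)
    if k >= K_MIN:
        layout_k = q_sample(layout_image, k, torch.randn_like(layout_image))
        x = NU * x + (1.0 - NU) * layout_k

print("mixed sample:", x.shape)
```

Re-noising the layout image to the current step and blending it back in is what preserves spatial layout and geometry; once denoising passes K_MIN, the content prompt alone shapes the semantically meaningful details.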
Related papers
- Scaling Concept With Text-Guided Diffusion Models [53.80799139331966]
Instead of replacing a concept, can we enhance or suppress the concept itself?
We introduce ScalingConcept, a simple yet effective method to scale decomposed concepts up or down in real input without introducing new elements.
More importantly, ScalingConcept enables a variety of novel zero-shot applications across image and audio domains.
arXiv Detail & Related papers (2024-10-31T17:09:55Z)
- SUMix: Mixup with Semantic and Uncertain Information [41.99721365685618]
Mixup data augmentation approaches have been applied to various deep learning tasks.
We propose a novel approach named SUMix to learn the mixing ratio as well as the uncertainty for the mixed samples during the training process.
arXiv Detail & Related papers (2024-07-10T16:25:26Z)
- DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models [18.44432223381586]
Recently, a number of image-mixing-based augmentation techniques have been introduced to improve the generalization of deep neural networks.
In these techniques, two or more randomly selected natural images are mixed together to generate an augmented image.
We propose DiffuseMix, a novel data augmentation technique that leverages a diffusion model to reshape training images.
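For reference, the classic image-mixing recipe this line of work builds on (mixup-style blending of two random images and their labels) fits in a few lines. This is a generic sketch of that baseline idea, not DiffuseMix itself:

```python
# Generic mixup-style image mixing (the baseline idea these augmentation
# papers build on); NOT the DiffuseMix method itself.
import torch

def mixup(images, labels, alpha=0.4):
    """Blend each image/label with a randomly chosen partner in the batch."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(images.size(0))
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]  # soft labels
    return mixed_images, mixed_labels

batch = torch.rand(8, 3, 32, 32)
onehot = torch.eye(10)[torch.randint(0, 10, (8,))]
imgs, labs = mixup(batch, onehot)
```

DiffuseMix departs from this pixel-space blend by using a diffusion model to reshape the training images while preserving their labels.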
arXiv Detail & Related papers (2024-04-05T05:31:02Z)
- SpliceMix: A Cross-scale and Semantic Blending Augmentation Strategy for Multi-label Image Classification [46.8141860303439]
We introduce a simple but effective augmentation strategy for multi-label image classification, namely SpliceMix.
The "splice" in our method is two-fold: 1) Each mixed image is a splice of several downsampled images in the form of a grid, where the semantics of images attending to mixing are blended without object deficiencies for alleviating co-occurred bias; 2) We splice mixed images and the original mini-batch to form a new SpliceMixed mini-batch, which allows an image with different scales to contribute to training together.
arXiv Detail & Related papers (2023-11-26T05:45:27Z)
- Painterly Image Harmonization using Diffusion Model [17.732783922599857]
We propose a novel Painterly Harmonization stable Diffusion model (PHDiffusion).
It includes a lightweight adaptive encoder and a Dual Encoder Fusion (DEF) module.
Specifically, the adaptive encoder and the DEF module first stylize foreground features within each encoder.
Then, the stylized foreground features from both encoders are combined to guide the harmonization process.
arXiv Detail & Related papers (2023-08-04T09:51:57Z)
- DiffuMask: Synthesizing Images with Pixel-level Annotations for Semantic Segmentation Using Diffusion Models [68.21154597227165]
We show that it is possible to automatically obtain accurate semantic masks of synthetic images generated by an off-the-shelf Stable Diffusion model.
Our approach, called DiffuMask, exploits the potential of the cross-attention map between text and image.
arXiv Detail & Related papers (2023-03-21T08:43:15Z)
- SMMix: Self-Motivated Image Mixing for Vision Transformers [65.809376136455]
CutMix is a vital augmentation strategy that determines the performance and generalization ability of vision transformers (ViTs).
However, the mixed images and their corresponding labels can be inconsistent; existing CutMix variants tackle this problem by generating more consistent mixed images or more precise mixed labels.
We propose an efficient and effective Self-Motivated image Mixing method (SMMix) which motivates both image and label enhancement by the model under training itself.
arXiv Detail & Related papers (2022-12-26T00:19:39Z)
- Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes.
We propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation.
Experiments show that the proposed method can modify a wide range of attributes, with the performance outperforming diffusion-model-based image-editing algorithms.
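A rough sketch of the embedding-mixing idea: interpolate two prompt embeddings with a learnable weight and optimize it against an editing objective. The tensor shapes, the names c_src and c_tgt, and the placeholder loss are assumptions; the paper's actual objective combines style matching and content preservation on the diffusion model's outputs.

```python
# Sketch of optimizing the mixing weight between two text embeddings.
# c_src/c_tgt are illustrative stand-ins for real prompt embeddings,
# and the loss is a dummy placeholder for the paper's editing objective.
import torch

c_src = torch.randn(77, 768)              # embedding of the original description
c_tgt = torch.randn(77, 768)              # embedding of the edited description
lam = torch.zeros(1, requires_grad=True)  # learnable mixing weight

opt = torch.optim.Adam([lam], lr=0.05)
for step in range(100):
    w = torch.sigmoid(lam)                # keep the weight in (0, 1)
    c_mixed = w * c_tgt + (1.0 - w) * c_src
    # Placeholder objective: in the paper this would be a style-matching
    # term plus a content-preservation term on the generated images.
    loss = (c_mixed - c_tgt).pow(2).mean() + 0.1 * (c_mixed - c_src).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```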
arXiv Detail & Related papers (2022-12-16T19:58:52Z)
- SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data [124.95585891086894]
The proposed method is called Semantically Proportional Mixing (SnapMix).
It exploits the class activation map (CAM) to lessen label noise when augmenting fine-grained data.
Our method consistently outperforms existing mixing-based approaches.
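A sketch of the CAM-proportional label idea, assuming for simplicity that both images share one cut box; real SnapMix crops a different box per image, and the random CAMs below are stand-ins for activations from the network's last convolutional layer.

```python
# Sketch of CAM-proportional labeling: the mixed label weights come from
# how much class-activation "mass" each source contributes, not from the
# pasted area alone. Random CAMs and the shared box are assumptions.
import torch

def snapmix_label_weights(cam_a, cam_b, box):
    """cam_a/cam_b: (H, W) class activation maps; box: (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = box
    cam_a = cam_a / cam_a.sum()              # normalize to semantic mass
    cam_b = cam_b / cam_b.sum()
    rho_a = 1.0 - cam_a[y0:y1, x0:x1].sum()  # mass of A left after the paste
    rho_b = cam_b[y0:y1, x0:x1].sum()        # mass of B inside the patch
    return rho_a.item(), rho_b.item()

cam_a, cam_b = torch.rand(14, 14), torch.rand(14, 14)
wa, wb = snapmix_label_weights(cam_a, cam_b, (3, 9, 3, 9))
print(f"label weight A: {wa:.2f}, label weight B: {wb:.2f}")
```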
arXiv Detail & Related papers (2020-12-09T03:37:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.