Unsupervised Semantic Correspondence Using Stable Diffusion
- URL: http://arxiv.org/abs/2305.15581v1
- Date: Wed, 24 May 2023 21:34:34 GMT
- Title: Unsupervised Semantic Correspondence Using Stable Diffusion
- Authors: Eric Hedlin, Gopal Sharma, Shweta Mahajan, Hossam Isack, Abhishek Kar,
Andrea Tagliasacchi, Kwang Moo Yi
- Abstract summary: We show that one can leverage this semantic knowledge within diffusion models to find semantic correspondences.
We optimize the prompt embeddings of these models for maximum attention on the regions of interest.
We significantly outperform any existing weakly or unsupervised method on PF-Willow, CUB-200 and SPair-71k datasets.
- Score: 27.355330079806027
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image diffusion models are now capable of generating images that are
often indistinguishable from real images. To generate such images, these models
must understand the semantics of the objects they are asked to generate. In
this work we show that, without any training, one can leverage this semantic
knowledge within diffusion models to find semantic correspondences -- locations
in multiple images that have the same semantic meaning. Specifically, given an
image, we optimize the prompt embeddings of these models for maximum attention
on the regions of interest. These optimized embeddings capture semantic
information about the location, which can then be transferred to another image.
By doing so we obtain results on par with the strongly supervised state of the
art on the PF-Willow dataset and significantly outperform (20.9% relative for
the SPair-71k dataset) any existing weakly or unsupervised method on PF-Willow,
CUB-200 and SPair-71k datasets.
Related papers
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models [56.71672127740099]
We focus on the task of image segmentation, which is traditionally solved by training models on closed-vocabulary datasets.
We leverage different and relatively small-sized, open-source foundation models for zero-shot open-vocabulary segmentation.
Our approach (dubbed FreeSeg-Diff), which does not rely on any training, outperforms many training-based approaches on both Pascal VOC and COCO datasets.
arXiv Detail & Related papers (2024-03-29T10:38:25Z) - ZoDi: Zero-Shot Domain Adaptation with Diffusion-Based Image Transfer [13.956618446530559]
This paper proposes a zero-shot domain adaptation method based on diffusion models, called ZoDi.
First, we utilize an off-the-shelf diffusion model to synthesize target-like images by transferring the domain of source images to the target domain.
Secondly, we train the model using both source images and synthesized images with the original representations to learn domain-robust representations.
arXiv Detail & Related papers (2024-03-20T14:58:09Z) - EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models [52.3015009878545]
We develop an image segmentor capable of generating fine-grained segmentation maps without any additional training.
Our framework identifies semantic correspondences between image pixels and spatial locations of low-dimensional feature maps.
In extensive experiments, the produced segmentation maps are demonstrated to be well delineated and capture detailed parts of the images.
arXiv Detail & Related papers (2024-01-22T07:34:06Z) - Unlocking Pre-trained Image Backbones for Semantic Image Synthesis [29.688029979801577]
We propose a new class of GAN discriminators for semantic image synthesis that generates highly realistic images.
Our model, which we dub DP-SIMS, achieves state-of-the-art results in terms of image quality and consistency with the input label maps on ADE-20K, COCO-Stuff, and Cityscapes.
arXiv Detail & Related papers (2023-12-20T09:39:19Z) - Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL)
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z) - The Journey, Not the Destination: How Data Guides Diffusion Models [75.19694584942623]
Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity.
We propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions.
arXiv Detail & Related papers (2023-12-11T08:39:43Z) - Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence [88.00004819064672]
Diffusion Hyperfeatures is a framework for consolidating multi-scale and multi-timestep feature maps into per-pixel feature descriptors.
Our method achieves superior performance on the SPair-71k real image benchmark.
arXiv Detail & Related papers (2023-05-23T17:58:05Z) - Wavelet-based Unsupervised Label-to-Image Translation [9.339522647331334]
We propose a new Unsupervised paradigm for SIS (USIS) that makes use of a self-supervised segmentation loss and whole image wavelet based discrimination.
We test our methodology on 3 challenging datasets and demonstrate its ability to bridge the performance gap between paired and unpaired models.
arXiv Detail & Related papers (2023-05-16T17:48:44Z) - Few-shot Semantic Image Synthesis with Class Affinity Transfer [23.471210664024067]
We propose a transfer method that leverages a model trained on a large source dataset to improve the learning ability on small target datasets.
The class affinity matrix is introduced as a first layer to the source model to make it compatible with the target label maps.
We apply our approach to GAN-based and diffusion-based architectures for semantic synthesis.
arXiv Detail & Related papers (2023-04-05T09:24:45Z) - Parents and Children: Distinguishing Multimodal DeepFakes from Natural Images [60.34381768479834]
Recent advancements in diffusion models have enabled the generation of realistic deepfakes from textual prompts in natural language.
We pioneer a systematic study on deepfake detection generated by state-of-the-art diffusion models.
arXiv Detail & Related papers (2023-04-02T10:25:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.