Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence
- URL: http://arxiv.org/abs/2305.14334v2
- Date: Mon, 1 Apr 2024 19:16:24 GMT
- Title: Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence
- Authors: Grace Luo, Lisa Dunlap, Dong Huk Park, Aleksander Holynski, Trevor Darrell
- Abstract summary: Diffusion Hyperfeatures is a framework for consolidating multi-scale and multi-timestep feature maps into per-pixel feature descriptors.
Our method achieves superior performance on the SPair-71k real image benchmark.
- Score: 88.00004819064672
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have been shown to be capable of generating high-quality images, suggesting that they could contain meaningful internal representations. Unfortunately, the feature maps that encode a diffusion model's internal information are spread not only over layers of the network, but also over diffusion timesteps, making it challenging to extract useful descriptors. We propose Diffusion Hyperfeatures, a framework for consolidating multi-scale and multi-timestep feature maps into per-pixel feature descriptors that can be used for downstream tasks. These descriptors can be extracted for both synthetic and real images using the generation and inversion processes. We evaluate the utility of our Diffusion Hyperfeatures on the task of semantic keypoint correspondence: our method achieves superior performance on the SPair-71k real image benchmark. We also demonstrate that our method is flexible and transferable: our feature aggregation network trained on the inversion features of real image pairs can be used on the generation features of synthetic image pairs with unseen objects and compositions. Our code is available at https://diffusion-hyperfeatures.github.io.
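The consolidation step the abstract describes can be illustrated with a minimal numpy sketch: feature maps gathered from several (layer, timestep) pairs, at varying resolutions, are upsampled to a common grid and mixed into one per-pixel descriptor map. The shapes, weights, and nearest-neighbor upsampling here are illustrative assumptions; in the paper the mixing weights are learned by an aggregation network.

```python
import numpy as np

def upsample_nearest(fmap, size):
    """Nearest-neighbor upsample a (C, H, W) feature map to (C, size, size)."""
    c, h, w = fmap.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return fmap[:, rows[:, None], cols[None, :]]

def aggregate_hyperfeatures(feature_maps, weights, out_size=8):
    """Mix feature maps from several (layer, timestep) pairs into a single
    per-pixel descriptor map via a weighted sum (weights would be learned).

    feature_maps : list of (C, H_i, W_i) arrays at varying resolutions
    weights      : one scalar per map
    """
    acc = np.zeros((feature_maps[0].shape[0], out_size, out_size))
    for fmap, w in zip(feature_maps, weights):
        acc += w * upsample_nearest(fmap, out_size)
    return acc  # (C, out_size, out_size): one C-dim descriptor per pixel

# toy maps from two "layers" at different resolutions
maps = [np.ones((4, 2, 2)), 2 * np.ones((4, 4, 4))]
desc = aggregate_hyperfeatures(maps, weights=[0.5, 0.25], out_size=8)
print(desc.shape)  # (4, 8, 8); every pixel value = 0.5*1 + 0.25*2 = 1.0
```

The resulting descriptor map can then feed any downstream matcher, which is what makes a single consolidated representation convenient compared with handling each layer and timestep separately.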
Related papers
- Retain, Blend, and Exchange: A Quality-aware Spatial-Stereo Fusion Approach for Event Stream Recognition [57.74076383449153]
We propose a novel dual-stream framework for event stream-based pattern recognition via differentiated fusion, termed EFV++.
It models two common event representations simultaneously, i.e., event images and event voxels.
We achieve new state-of-the-art performance on the Bullying10k dataset, i.e., 90.51%, exceeding the second-best method by +2.21%.
arXiv Detail & Related papers (2024-06-27T02:32:46Z)
- Emergent Correspondence from Image Diffusion [56.29904609646015]
We show that correspondence emerges in image diffusion models without any explicit supervision.
We propose a strategy to extract this implicit knowledge out of diffusion networks as image features, namely DIffusion FeaTures (DIFT).
DIFT is able to outperform both weakly-supervised methods and competitive off-the-shelf features in identifying semantic, geometric, and temporal correspondences.
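How per-pixel features of this kind yield correspondences can be sketched generically: a query pixel in the source image is matched to the target pixel whose feature vector has the highest cosine similarity. This is a bare nearest-neighbor match under assumed (C, H, W) feature arrays, not the exact DIFT pipeline.

```python
import numpy as np

def best_match(feat_src, feat_tgt, query_yx):
    """Return the (y, x) target pixel whose feature vector has the highest
    cosine similarity with the query pixel's feature in the source image."""
    c, h, w = feat_tgt.shape
    q = feat_src[:, query_yx[0], query_yx[1]]
    q = q / (np.linalg.norm(q) + 1e-8)
    tgt = feat_tgt.reshape(c, -1)
    tgt = tgt / (np.linalg.norm(tgt, axis=0, keepdims=True) + 1e-8)
    idx = int(np.argmax(q @ tgt))
    return divmod(idx, w)

# toy example: target pixel (1, 2) carries the same feature as the query
feat_src = np.zeros((3, 4, 4)); feat_src[:, 0, 0] = [1.0, 2.0, 3.0]
feat_tgt = np.zeros((3, 4, 4)); feat_tgt[:, 1, 2] = [1.0, 2.0, 3.0]
print(best_match(feat_src, feat_tgt, (0, 0)))  # (1, 2)
```

Cosine similarity is the usual choice here because it compares feature directions and is insensitive to per-pixel magnitude differences between the two images.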
arXiv Detail & Related papers (2023-06-06T17:33:19Z)
- DePF: A Novel Fusion Approach based on Decomposition Pooling for Infrared and Visible Images [7.11574718614606]
A novel fusion network based on decomposition pooling (de-pooling) is proposed, termed DePF.
A de-pooling based encoder is designed to extract multi-scale image and detail features of source images at the same time.
The experimental results demonstrate that the proposed method exhibits superior fusion performance over state-of-the-art methods.
arXiv Detail & Related papers (2023-05-27T05:47:14Z)
- Unsupervised Semantic Correspondence Using Stable Diffusion [27.355330079806027]
We show that one can leverage this semantic knowledge within diffusion models to find semantic correspondences.
We optimize the prompt embeddings of these models for maximum attention on the regions of interest.
We significantly outperform any existing weakly supervised or unsupervised method on the PF-Willow, CUB-200, and SPair-71k datasets.
arXiv Detail & Related papers (2023-05-24T21:34:34Z)
- A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence [83.90531416914884]
We exploit Stable Diffusion features for semantic and dense correspondence.
With simple post-processing, SD features can perform quantitatively on par with SOTA representations.
We show that these correspondences can enable interesting applications such as instance swapping in two images.
arXiv Detail & Related papers (2023-05-24T16:59:26Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
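The selection-and-composition idea can be sketched as follows: given a relevance score per layer (which Dynamic Hyperpixel Flow predicts conditioned on the image pair), pick the top-k layers and stack their features channel-wise into a hypercolumn. The scores, shapes, and the assumption that all layers share one spatial resolution are illustrative simplifications.

```python
import numpy as np

def compose_hypercolumn(layer_feats, relevance, k=2):
    """Select the k most relevant layers and concatenate their feature
    maps channel-wise into a single hypercolumn feature map.

    layer_feats : list of (C, H, W) arrays, one per layer (same H, W assumed)
    relevance   : one score per layer (predicted from the images in the paper)
    """
    top = np.argsort(relevance)[-k:][::-1]  # indices of the k highest scores
    return np.concatenate([layer_feats[i] for i in top], axis=0), top

# toy setup: 4 layers, each filled with its own index so selection is visible
feats = [np.full((2, 4, 4), i, dtype=float) for i in range(4)]
col, chosen = compose_hypercolumn(feats, relevance=[0.1, 0.9, 0.3, 0.7], k=2)
print(col.shape, chosen)  # (4, 4, 4) [1 3]
```

Keeping only a small number of layers keeps the hypercolumn compact, which is the motivation for dynamic selection over naively concatenating every layer.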
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
- Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects [10.783993190686132]
We propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring contextual information.
We demonstrate how coherency and fidelity are preserved with our method through experiments on the Multi-MNIST and CLEVR datasets.
arXiv Detail & Related papers (2020-06-22T11:33:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.