Learning Contrastive Representation for Semantic Correspondence
- URL: http://arxiv.org/abs/2109.10967v1
- Date: Wed, 22 Sep 2021 18:34:14 GMT
- Title: Learning Contrastive Representation for Semantic Correspondence
- Authors: Taihong Xiao, Sifei Liu, Shalini De Mello, Zhiding Yu, Jan Kautz,
Ming-Hsuan Yang
- Abstract summary: We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
- Score: 150.29135856909477
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Dense correspondence across semantically related images has been extensively
studied, but still faces two challenges: 1) large variations in appearance,
scale and pose exist even for objects from the same category, and 2) labeling
pixel-level dense correspondences is labor intensive and infeasible to scale.
Most existing approaches focus on designing various matching approaches with
fully-supervised ImageNet pretrained networks. On the other hand, while a
variety of self-supervised approaches are proposed to explicitly measure
image-level similarities, correspondence matching at the pixel level remains
under-explored. In this work, we propose a multi-level contrastive learning
approach for semantic matching, which does not rely on any ImageNet pretrained
model. We show that image-level contrastive learning is a key component to
encourage the convolutional features to find correspondence between similar
objects, while the performance can be further enhanced by regularizing
cross-instance cycle-consistency at intermediate feature levels. Experimental
results on the PF-PASCAL, PF-WILLOW, and SPair-71k benchmark datasets
demonstrate that our method performs favorably against the state-of-the-art
approaches. The source code and trained models will be made available to the
public.
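The image-level contrastive objective described in the abstract is typically instantiated as an InfoNCE-style loss, where matching views of the same image are positives and other images in the batch are negatives. A minimal NumPy sketch follows; this is an illustrative assumption, not the authors' released implementation, and the function names and toy embeddings are hypothetical.

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.07):
    """Image-level InfoNCE loss over a batch of embedding pairs.

    z1, z2: (N, D) arrays of L2-normalized embeddings of two views of
    the same N images. Matching rows are positives; all other rows in
    the batch act as negatives.
    """
    # Cosine similarity matrix between the two views, scaled by temperature.
    logits = z1 @ z2.T / temperature  # (N, N); row i's positive is column i
    # Log-softmax over each row, evaluated at the positive (diagonal) entry.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.diag(log_prob).mean()

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
z = normalize(rng.normal(size=(8, 32)))
# Correctly paired views yield a small loss; mismatched pairing does not.
aligned = info_nce_loss(z, z)
shuffled = info_nce_loss(z, z[::-1])
print(aligned, shuffled)
```

The cross-instance cycle-consistency regularizer mentioned in the abstract would be applied on top of such a loss at intermediate feature levels, encouraging pixel features that map forward to another instance and back to return to their starting point.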
Related papers
- Dual-Level Cross-Modal Contrastive Clustering [4.083185193413678]
We propose a novel image clustering framework, named Dual-level Cross-Modal Contrastive Clustering (DXMC).
External textual information is introduced to construct a semantic space, which is adopted to generate image-text pairs.
The image-text pairs are respectively sent to pre-trained image and text encoders to obtain image and text embeddings, which are subsequently fed into four well-designed networks.
arXiv Detail & Related papers (2024-09-06T18:49:45Z)
- Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model [80.61157097223058]
A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models.
In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques.
We introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes.
arXiv Detail & Related papers (2024-03-28T17:23:45Z)
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Introspective Deep Metric Learning for Image Retrieval [80.29866561553483]
We argue that a good similarity model should consider the semantic discrepancies with caution to better deal with ambiguous images for more robust training.
We propose to represent an image using not only a semantic embedding but also an accompanying uncertainty embedding, which describes the semantic characteristics and ambiguity of an image, respectively.
The proposed IDML framework improves the performance of deep metric learning through uncertainty modeling and attains state-of-the-art results on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets.
arXiv Detail & Related papers (2022-05-09T17:51:44Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Cross-View-Prediction: Exploring Contrastive Feature for Hyperspectral Image Classification [9.131465469247608]
This paper presents a self-supervised feature learning method for hyperspectral image classification.
Our method constructs two different views of the raw hyperspectral image through a cross-representation learning method, and then learns semantically consistent representations over the created views via contrastive learning.
arXiv Detail & Related papers (2022-03-14T11:07:33Z)
- Dense Semantic Contrast for Self-Supervised Visual Representation Learning [12.636783522731392]
We present Dense Semantic Contrast (DSC) for modeling semantic category decision boundaries at a dense level.
We propose a dense cross-image semantic contrastive learning framework for multi-granularity representation learning.
Experimental results show that our DSC model outperforms state-of-the-art methods when transferring to downstream dense prediction tasks.
arXiv Detail & Related papers (2021-09-16T07:04:05Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
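Dynamic Hyperpixel Flow, summarized above, builds hypercolumns by selecting a small subset of CNN layers per image pair and concatenating their features at a common resolution. A toy NumPy sketch of just the composition step follows; in the paper the layer subset is chosen by a learned selection module, which is replaced here by a fixed index list, and all names are illustrative assumptions.

```python
import numpy as np

def compose_hypercolumn(feature_maps, selected, out_hw):
    """Concatenate selected CNN layer features into a hypercolumn.

    feature_maps: list of (C_l, H_l, W_l) arrays from different layers.
    selected: indices of the layers chosen for this image pair.
    out_hw: (H, W) common spatial resolution for the hypercolumn.
    """
    H, W = out_hw
    resized = []
    for l in selected:
        f = feature_maps[l]
        _, h, w = f.shape
        # Nearest-neighbor upsampling of each layer to the target resolution.
        rows = np.arange(H) * h // H
        cols = np.arange(W) * w // W
        resized.append(f[:, rows][:, :, cols])
    # Stack along channels: the hypercolumn's depth is the sum of the
    # selected layers' channel counts.
    return np.concatenate(resized, axis=0)

# Toy feature pyramid: three layers with decreasing spatial size.
maps = [np.random.rand(16, 32, 32),
        np.random.rand(32, 16, 16),
        np.random.rand(64, 8, 8)]
hyper = compose_hypercolumn(maps, selected=[0, 2], out_hw=(32, 32))
print(hyper.shape)  # (80, 32, 32): 16 + 64 channels at 32x32
```

Per-pixel descriptors for correspondence matching would then be read off as the channel vectors of this composed tensor.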
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.