Semantic-Aware Fine-Grained Correspondence
- URL: http://arxiv.org/abs/2207.10456v2
- Date: Fri, 22 Jul 2022 09:41:46 GMT
- Title: Semantic-Aware Fine-Grained Correspondence
- Authors: Yingdong Hu, Renhao Wang, Kaifeng Zhang, Yang Gao
- Abstract summary: We propose to learn semantic-aware fine-grained correspondence using image-level self-supervised methods.
We design a pixel-level self-supervised learning objective which specifically targets fine-grained correspondence.
Our method surpasses previous state-of-the-art self-supervised methods using convolutional networks on a variety of visual correspondence tasks.
- Score: 8.29030327276322
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Establishing visual correspondence across images is a challenging and
essential task. Recently, an influx of self-supervised methods have been
proposed to better learn representations for visual correspondence. However, we
find that these methods often fail to leverage semantic information and
over-rely on the matching of low-level features. In contrast, human vision is
capable of distinguishing between distinct objects as a pretext to tracking.
Inspired by this paradigm, we propose to learn semantic-aware fine-grained
correspondence. Firstly, we demonstrate that semantic correspondence is
implicitly available through a rich set of image-level self-supervised methods.
We further design a pixel-level self-supervised learning objective which
specifically targets fine-grained correspondence. For downstream tasks, we fuse
these two kinds of complementary correspondence representations together,
demonstrating that they boost performance synergistically. Our method surpasses
previous state-of-the-art self-supervised methods using convolutional networks
on a variety of visual correspondence tasks, including video object
segmentation, human pose tracking, and human part tracking.
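The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes, since the abstract does not specify them, that the two representations are per-pixel feature arrays combined by L2-normalized concatenation, and that downstream labels are transferred by top-k softmax label propagation. The function names and the `weight`, `top_k`, and `temperature` parameters are hypothetical and not taken from the paper.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale feature vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def fuse_features(semantic, fine_grained, weight=1.0):
    """Fuse two per-pixel feature sets of shape (N, C1) and (N, C2).

    Assumption: fusion is a weighted concatenation of L2-normalized
    features; `weight` balances the fine-grained branch.
    """
    sem = l2_normalize(semantic)
    fine = l2_normalize(fine_grained)
    return np.concatenate([sem, weight * fine], axis=-1)


def propagate_labels(ref_feats, ref_labels, tgt_feats, top_k=5, temperature=0.07):
    """Transfer labels from reference pixels to target pixels.

    ref_feats: (N, C), ref_labels: (N, K) one-hot or soft labels,
    tgt_feats: (M, C). Returns (M, K) soft labels.
    Assumption: each target pixel averages the labels of its top_k most
    similar reference pixels, weighted by a softmax over similarities.
    """
    affinity = tgt_feats @ ref_feats.T                      # (M, N) similarities
    k = min(top_k, affinity.shape[1])
    idx = np.argpartition(-affinity, k - 1, axis=1)[:, :k]  # top-k reference indices
    top = np.take_along_axis(affinity, idx, axis=1) / temperature
    top = top - top.max(axis=1, keepdims=True)              # numerical stability
    w = np.exp(top)
    w /= w.sum(axis=1, keepdims=True)                       # softmax weights
    return np.einsum("mk,mkc->mc", w, ref_labels[idx])
```

In a video object segmentation setting, `ref_feats` and `ref_labels` would come from an annotated first frame and `tgt_feats` from a later frame, with both frames passed through `fuse_features` first.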
Related papers
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- A semantics-driven methodology for high-quality image annotation [4.7590051176368915]
We propose vTelos, an integrated Natural Language Processing, Knowledge Representation, and Computer Vision methodology.
A key element of vTelos is the exploitation of the WordNet lexico-semantic hierarchy as the main means of providing the meaning of natural language labels.
The methodology is validated on images populating a subset of the ImageNet hierarchy.
arXiv Detail & Related papers (2023-07-26T11:38:45Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
- Building a visual semantics aware object hierarchy [0.0]
We propose a novel unsupervised method to build a visual-semantics-aware object hierarchy.
Our intuition in this paper comes from real-world knowledge representation where concepts are hierarchically organized.
The evaluation consists of two parts: first, we apply the constructed hierarchy to the object recognition task; then we compare our visual hierarchy with existing lexical hierarchies to show the validity of our method.
arXiv Detail & Related papers (2022-02-26T00:10:21Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Can Semantic Labels Assist Self-Supervised Visual Representation Learning? [194.1681088693248]
We present a new algorithm named Supervised Contrastive Adjustment in Neighborhood (SCAN).
In a series of downstream tasks, SCAN achieves superior performance compared to previous fully-supervised and self-supervised methods.
Our study reveals that semantic labels are useful in assisting self-supervised methods, opening a new direction for the community.
arXiv Detail & Related papers (2020-11-17T13:25:00Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets a new state of the art in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions.
arXiv Detail & Related papers (2020-02-27T16:45:25Z)