Dense Semantic Contrast for Self-Supervised Visual Representation Learning
- URL: http://arxiv.org/abs/2109.07756v1
- Date: Thu, 16 Sep 2021 07:04:05 GMT
- Title: Dense Semantic Contrast for Self-Supervised Visual Representation Learning
- Authors: Xiaoni Li, Yu Zhou, Yifei Zhang, Aoting Zhang, Wei Wang, Ning Jiang, Haiying Wu, Weiping Wang
- Abstract summary: We present Dense Semantic Contrast (DSC) for modeling semantic category decision boundaries at a dense level.
We propose a dense cross-image semantic contrastive learning framework for multi-granularity representation learning.
Experimental results show that our DSC model outperforms state-of-the-art methods when transferring to downstream dense prediction tasks.
- Score: 12.636783522731392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised representation learning for visual pre-training has achieved remarkable success with sample (instance or pixel) discrimination and instance-level semantics discovery, yet a non-negligible gap remains between pre-trained models and downstream dense prediction tasks. Concretely, these downstream tasks require more fine-grained representations; in other words, pixels from the same object must belong to a shared semantic category, a property that previous methods lack. In this work, we present Dense Semantic Contrast (DSC), which models semantic category decision boundaries at a dense level to meet the requirements of these tasks. Furthermore, we propose a dense cross-image semantic contrastive learning framework for multi-granularity representation learning. Specifically, we explicitly explore the semantic structure of the dataset by mining relations among pixels from different perspectives. For intra-image relation modeling, we discover pixel neighbors from multiple views; for inter-image relations, we enforce pixel representations from the same semantic class to be more similar than representations from different classes within a mini-batch. Experimental results show that our DSC model outperforms state-of-the-art methods when transferring to downstream dense prediction tasks, including object detection, semantic segmentation, and instance segmentation. Code will be made available.
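The dense cross-image contrastive objective described in the abstract can be illustrated with a minimal pixel-level InfoNCE sketch. This is a hypothetical simplification for intuition only; the function name, array shapes, and temperature are assumptions, not the authors' (yet-unreleased) implementation:

```python
import numpy as np

def dense_infonce_loss(anchors, positives, negatives, tau=0.1):
    """Pixel-level InfoNCE: each anchor pixel embedding is pulled toward its
    positive (e.g. the corresponding pixel in another augmented view) and
    pushed away from negative pixel embeddings, via cosine similarity."""
    def l2norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    a, p, n = l2norm(anchors), l2norm(positives), l2norm(negatives)
    pos = np.exp(np.sum(a * p, axis=-1) / tau)   # (N,) anchor-positive scores
    neg = np.exp(a @ n.T / tau).sum(axis=-1)     # (N,) summed negative scores
    return float(np.mean(-np.log(pos / (pos + neg))))

# Toy check: 4 anchor pixels, 8 negatives, 16-dim embeddings.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 16))
loss_aligned = dense_infonce_loss(anchors, anchors, rng.normal(size=(8, 16)))
loss_random = dense_infonce_loss(anchors, rng.normal(size=(4, 16)),
                                 rng.normal(size=(8, 16)))
assert loss_aligned < loss_random  # perfectly matching positives score lower
```

In DSC terms, the positives would come from intra-image neighbor mining across views, and the negatives from pixels assigned to other semantic groups in the mini-batch.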
Related papers
- Self-supervised Pre-training for Semantic Segmentation in an Indoor Scene [8.357801312689622]
We propose RegConsist, a method for self-supervised pre-training of a semantic segmentation model.
We use a variant of contrastive learning to train a DCNN model for predicting semantic segmentation from RGB views in the target environment.
The proposed method outperforms models pre-trained on ImageNet and is competitive with models trained for the same task on a different dataset.
arXiv Detail & Related papers (2022-10-04T20:10:14Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
- Self-supervised Contrastive Learning for Cross-domain Hyperspectral Image Representation [26.610588734000316]
This paper introduces a self-supervised learning framework suitable for hyperspectral images that are inherently challenging to annotate.
The proposed framework architecture leverages cross-domain CNN, allowing for learning representations from different hyperspectral images.
The experimental results demonstrate the advantage of the proposed self-supervised representation over models trained from scratch or other transfer learning methods.
arXiv Detail & Related papers (2022-02-08T16:16:45Z)
- HCSC: Hierarchical Contrastive Selective Coding [44.655310210531226]
Hierarchical Contrastive Selective Coding (HCSC) is a novel contrastive learning framework.
We introduce an elaborate pair selection scheme to make image representations better fit semantic structures.
We verify the superior performance of HCSC over state-of-the-art contrastive methods.
arXiv Detail & Related papers (2022-02-01T15:04:40Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Exploring Set Similarity for Dense Self-supervised Representation Learning [96.35286140203407]
We propose to explore set similarity (SetSim) for dense self-supervised representation learning.
We generalize pixel-wise similarity learning to set-wise one to improve the robustness because sets contain more semantic and structure information.
Specifically, by resorting to attentional features of views, we establish corresponding sets, thus filtering out noisy backgrounds that may cause incorrect correspondences.
arXiv Detail & Related papers (2021-07-19T09:38:27Z)
- Exploring Cross-Image Pixel Contrast for Semantic Segmentation [130.22216825377618]
We propose a pixel-wise contrastive framework for semantic segmentation in the fully supervised setting.
The core idea is to enforce pixel embeddings belonging to the same semantic class to be more similar than embeddings from different classes.
Our method can be effortlessly incorporated into existing segmentation frameworks without extra overhead during testing.
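The same-class/different-class constraint above can be sketched as a small supervised pixel-contrast loss over a mini-batch of pixel embeddings. This is an illustrative simplification (the function name and shapes are assumed), not the paper's implementation:

```python
import numpy as np

def pixel_class_contrast(embeddings, labels, tau=0.1):
    """Supervised pixel contrast: for each pixel embedding, positives are the
    other pixels sharing its class label in the mini-batch; all remaining
    pixels act as negatives in the softmax denominator."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = np.exp(e @ e.T / tau)
    np.fill_diagonal(sim, 0.0)  # exclude self-similarity
    same = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    logp = np.log(sim / sim.sum(axis=1, keepdims=True) + 1e-12)
    # Average -log p over each pixel's positive pairs.
    return float(-(logp * same).sum() / same.sum())

# Toy check: two well-separated clusters of pixel embeddings.
rng = np.random.default_rng(1)
centers = rng.normal(size=(2, 8))
emb = np.repeat(centers, 3, axis=0) + 0.05 * rng.normal(size=(6, 8))
labels = np.array([0, 0, 0, 1, 1, 1])
loss_clustered = pixel_class_contrast(emb, labels)
loss_mismatched = pixel_class_contrast(emb, np.array([0, 1, 0, 1, 0, 1]))
assert loss_clustered < loss_mismatched  # correct grouping scores lower
```

When the labels agree with the embedding clusters the loss is small; assigning labels that cut across clusters raises it, which is exactly the pressure that pulls same-class pixel embeddings together during training.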
arXiv Detail & Related papers (2021-01-28T11:35:32Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions.
arXiv Detail & Related papers (2020-02-27T16:45:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.