Exploring Set Similarity for Dense Self-supervised Representation
Learning
- URL: http://arxiv.org/abs/2107.08712v1
- Date: Mon, 19 Jul 2021 09:38:27 GMT
- Title: Exploring Set Similarity for Dense Self-supervised Representation
Learning
- Authors: Zhaoqing Wang, Qiang Li, Guoxin Zhang, Pengfei Wan, Wen Zheng, Nannan
Wang, Mingming Gong, Tongliang Liu
- Abstract summary: We propose to explore set similarity (SetSim) for dense self-supervised representation learning.
We generalize pixel-wise similarity learning to a set-wise one to improve robustness, because sets contain more semantic and structural information.
Specifically, by resorting to attentional features of views, we establish corresponding sets, thus filtering out noisy backgrounds that may cause incorrect correspondences.
- Score: 96.35286140203407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By considering the spatial correspondence, dense self-supervised
representation learning has achieved superior performance on various dense
prediction tasks. However, the pixel-level correspondence tends to be noisy
because of many similar misleading pixels, e.g., backgrounds. To address this
issue, in this paper, we propose to explore set similarity
(SetSim) for dense self-supervised representation learning. We generalize
pixel-wise similarity learning to a set-wise one to improve robustness,
because sets contain more semantic and structural information. Specifically, by
resorting to attentional features of views, we establish corresponding sets,
thus filtering out noisy backgrounds that may cause incorrect correspondences.
Meanwhile, these attentional features can keep the coherence of the same image
across different views to alleviate semantic inconsistency. We further search
the cross-view nearest neighbours of sets and employ the structured
neighbourhood information to enhance the robustness. Empirical evaluations
demonstrate that SetSim is superior to state-of-the-art methods on object
detection, keypoint detection, instance segmentation, and semantic
segmentation.
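The set-wise matching idea described in the abstract can be illustrated roughly as follows. This is a minimal sketch and not the authors' implementation: the attention definition (channel-wise mean of absolute activations), the `keep_ratio` parameter, and the function names `attention_set` and `set_similarity` are all assumptions introduced for illustration. The paper's actual loss and its cross-view nearest-neighbour search are not reproduced here.

```python
import numpy as np


def attention_set(features, keep_ratio=0.5):
    """Return flat indices of the most attended spatial positions.

    features: array of shape (C, H, W).
    Assumption: attention is the channel-wise mean of absolute
    activations; the top `keep_ratio` fraction of positions is kept.
    """
    _, h, w = features.shape
    attn = np.abs(features).mean(axis=0).reshape(-1)
    k = max(1, int(round(keep_ratio * h * w)))
    return np.argsort(-attn)[:k]


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)


def set_similarity(feat_a, feat_b, keep_ratio=0.5):
    """Hypothetical set-level similarity between two views.

    Each view is reduced to the set of its attended pixel features.
    Every element of set A is matched to its most similar element of
    set B (cosine similarity), and the matches are averaged.
    """
    c = feat_a.shape[0]
    flat_a = l2_normalize(feat_a.reshape(c, -1).T)  # (H*W, C)
    flat_b = l2_normalize(feat_b.reshape(c, -1).T)  # (H*W, C)
    set_a = flat_a[attention_set(feat_a, keep_ratio)]
    set_b = flat_b[attention_set(feat_b, keep_ratio)]
    sim = set_a @ set_b.T  # pairwise cosine similarities
    return float(sim.max(axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    view_a = rng.normal(size=(8, 4, 4))
    view_b = rng.normal(size=(8, 4, 4))
    print(set_similarity(view_a, view_a))  # close to 1 for identical views
    print(set_similarity(view_a, view_b))
```

Restricting the comparison to attended positions is what drops low-attention (for example, background) pixels from the match, which is the filtering effect the abstract attributes to the attentional features.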
Related papers
- Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
arXiv Detail & Related papers (2023-08-19T09:12:13Z)
- Self-Supervised Visual Representation Learning with Semantic Grouping [50.14703605659837]
We tackle the problem of learning visual representations from unlabeled scene-centric data.
We propose contrastive learning from data-driven semantic slots, namely SlotCon, for joint semantic grouping and representation learning.
arXiv Detail & Related papers (2022-05-30T17:50:59Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even under drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Dense Semantic Contrast for Self-Supervised Visual Representation Learning [12.636783522731392]
We present Dense Semantic Contrast (DSC) for modeling semantic category decision boundaries at a dense level.
We propose a dense cross-image semantic contrastive learning framework for multi-granularity representation learning.
Experimental results show that our DSC model outperforms state-of-the-art methods when transferring to downstream dense prediction tasks.
arXiv Detail & Related papers (2021-09-16T07:04:05Z)
- Unsupervised Learning of Dense Visual Representations [14.329781842154281]
We propose View-Agnostic Dense Representation (VADeR) for unsupervised learning of dense representations.
VADeR learns pixelwise representations by forcing local features to remain constant over different viewing conditions.
Our method outperforms ImageNet supervised pretraining in multiple dense prediction tasks.
arXiv Detail & Related papers (2020-11-11T01:28:11Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-arts on all these settings, demonstrating well its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
- Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning [86.45526827323954]
Weakly-supervised semantic segmentation is a challenging task as no pixel-wise label information is provided for training.
We propose an iterative algorithm to learn such pairwise relations.
We show that the proposed algorithm performs favorably against the state-of-the-art methods.
arXiv Detail & Related papers (2020-02-19T10:32:03Z)