Semantics-Consistent Feature Search for Self-Supervised Visual
Representation Learning
- URL: http://arxiv.org/abs/2212.06486v1
- Date: Tue, 13 Dec 2022 11:13:59 GMT
- Title: Semantics-Consistent Feature Search for Self-Supervised Visual
Representation Learning
- Authors: Kaiyou Song, Shan Zhang, Zihao An, Zimeng Luo, Tong Wang, Jin Xie
- Abstract summary: The augmentation procedure inevitably constructs undesirable views that contain different semantic concepts.
Indiscriminately pulling these augmentations closer in the feature space damages the semantic consistency of the representation.
In this study, we introduce feature-level augmentation and propose a novel semantics-consistent feature search (SCFS) method to mitigate this negative effect.
- Score: 15.242064747740116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In contrastive self-supervised learning, the common way to learn
discriminative representation is to pull different augmented "views" of the
same image closer while pushing all other images further apart, which has been
proven to be effective. However, the augmentation procedure inevitably
constructs undesirable views that contain different semantic concepts, and
indiscriminately pulling these augmentations closer in the feature space
damages the semantic consistency of the representation. In this study, we
introduce feature-level augmentation and propose a novel semantics-consistent
feature search (SCFS) method to mitigate this negative effect. The main idea of
SCFS is to adaptively search semantics-consistent features to enhance the
contrast between semantics-consistent regions in different augmentations. Thus,
the trained model learns to focus on meaningful object regions, improving
its semantic representation ability. Extensive experiments conducted on
different datasets and tasks demonstrate that SCFS effectively improves the
performance of self-supervised learning and achieves state-of-the-art
performance on different downstream tasks.
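To make the pull/push objective and the feature-search idea concrete, here is a minimal PyTorch sketch. The cross-attention "search" is one plausible reading of "adaptively search semantics-consistent features"; the function names, shapes, and hyperparameters are hypothetical, not the authors' implementation.

```python
# Minimal sketch: an InfoNCE-style pull/push objective, plus a feature-level
# "search" that softly retrieves, for each region of one view, the most
# semantically similar features of the other view. Illustrative reading of
# SCFS only; names and shapes here are assumptions.
import torch
import torch.nn.functional as F

def search_consistent_features(query, key, tau=0.1):
    """Softly match each query region to semantically similar key regions.
    query, key: [B, N, D] patch/region features of two augmented views."""
    q = F.normalize(query, dim=-1)
    k = F.normalize(key, dim=-1)
    attn = torch.softmax(q @ k.transpose(1, 2) / tau, dim=-1)  # [B, N, N]
    return attn @ key  # view-2 features re-assembled to align with view 1

def info_nce(z1, z2, temperature=0.2):
    """Pull matched pairs (diagonal) together, push all other pairs apart."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature            # [B, B] similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

# Toy usage: contrast view 1 against features *searched* from view 2, so
# only regions that find a semantically consistent match are pulled together.
B, N, D = 8, 49, 256
view1 = torch.randn(B, N, D)
view2 = torch.randn(B, N, D)
searched = search_consistent_features(view1, view2)
loss = info_nce(view1.mean(dim=1), searched.mean(dim=1))
print(loss.item())
```

Contrasting a view against features searched from the other view, rather than against the raw view, is what lets semantics-inconsistent regions drop out of the pulling-together term.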
Related papers
- Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention [59.19580789952102]
This paper proposes a novel semi-supervised Multi-Scale Uncertainty and Cross-Teacher-Student Attention (MUCA) model for RS image semantic segmentation tasks.
MUCA constrains the consistency among feature maps at different layers of the network by introducing a multi-scale uncertainty consistency regularization.
MUCA further utilizes a Cross-Teacher-Student attention mechanism that guides the student network to construct more discriminative feature representations.
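A hedged sketch of what an uncertainty-weighted multi-scale consistency term could look like; the function name and weighting scheme are our assumptions, not MUCA's code:

```python
# Teacher/student prediction maps at several scales, with the teacher's
# per-pixel entropy down-weighting uncertain regions in the consistency loss.
import math
import torch

def uncertainty_consistency(student_maps, teacher_maps):
    """student_maps, teacher_maps: lists of [B, C, H, W] logits per scale."""
    loss = 0.0
    for s, t in zip(student_maps, teacher_maps):
        p_t = torch.softmax(t.detach(), dim=1)               # teacher probs
        ent = -(p_t * p_t.clamp_min(1e-8).log()).sum(dim=1, keepdim=True)
        weight = 1.0 - ent / math.log(t.size(1))             # 1 where confident
        loss = loss + (weight * (torch.softmax(s, dim=1) - p_t).pow(2)).mean()
    return loss / len(student_maps)
```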
arXiv Detail & Related papers (2025-01-18T11:57:20Z)
- PP-SSL: Priority-Perception Self-Supervised Learning for Fine-Grained Recognition [28.863121559446665]
Self-supervised learning is emerging in fine-grained visual recognition with promising results.
Existing self-supervised learning methods are susceptible to irrelevant patterns in self-supervised tasks.
We propose a novel Priority-Perception Self-Supervised Learning framework, denoted as PP-SSL.
arXiv Detail & Related papers (2024-11-28T15:47:41Z)
- Feature Augmentation for Self-supervised Contrastive Learning: A Closer Look [28.350278251132078]
We propose a unified framework to conduct data augmentation in the feature space, known as feature augmentation.
This strategy is domain-agnostic, which augments similar features to the original ones and thus improves the data diversity.
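As a rough illustration of feature-space augmentation, a sketch with two common transforms; the specific operators (Gaussian noise, batch interpolation) are generic choices, not necessarily the paper's:

```python
# Perturb or interpolate encoded features so the augmented features stay
# similar to the originals, increasing data diversity without image-space ops.
import torch

def augment_features(feats, noise_std=0.1, mix_alpha=0.9):
    """feats: [B, D] encoded features; returns two augmented variants."""
    noisy = feats + noise_std * torch.randn_like(feats)   # additive noise
    perm = feats[torch.randperm(feats.size(0))]           # shuffled batch
    mixed = mix_alpha * feats + (1 - mix_alpha) * perm    # interpolation
    return noisy, mixed
```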
arXiv Detail & Related papers (2024-10-16T09:25:11Z)
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
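For intuition, a generative latent-variable formulation of SSL typically takes the following generic form (our rendering, not necessarily the paper's exact model):

```latex
% Views x are generated from latent semantic content z; the learned
% representation approximates the posterior over z given x.
p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, \mathrm{d}z,
\qquad
q_\phi(z \mid x) \approx p_\theta(z \mid x)
```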
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- Focalized Contrastive View-invariant Learning for Self-supervised Skeleton-based Action Recognition [16.412306012741354]
We propose a self-supervised framework called Focalized Contrastive View-invariant Learning (FoCoViL)
FoCoViL significantly suppresses view-specific information in the representation space, where viewpoints are coarsely aligned.
It associates actions with common view-invariant properties and simultaneously separates the dissimilar ones.
arXiv Detail & Related papers (2023-04-03T10:12:30Z)
- Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems [61.11799513362704]
We propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes.
We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective.
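A hedged sketch of deriving a GradCAM-style rationale map from an SSL objective; the toy encoder and function names are illustrative assumptions only:

```python
# Backpropagate the similarity between two views' embeddings to a conv
# feature map and weight its channels by the spatially pooled gradients.
import torch
import torch.nn.functional as F

def ssl_gradcam(feat_map, embed1, embed2):
    """feat_map: [B, C, H, W] conv features used to produce embed1."""
    sim = F.cosine_similarity(embed1, embed2).sum()    # scalar SSL "score"
    grads, = torch.autograd.grad(sim, feat_map)
    weights = grads.mean(dim=(2, 3), keepdim=True)     # pooled channel weights
    cam = F.relu((weights * feat_map).sum(dim=1))      # [B, H, W] saliency
    return cam / cam.amax(dim=(1, 2), keepdim=True).clamp_min(1e-8)

conv = torch.nn.Conv2d(3, 64, 3, padding=1)            # toy encoder
head = torch.nn.Linear(64, 128)
x1, x2 = torch.randn(4, 3, 32, 32), torch.randn(4, 3, 32, 32)
fmap = conv(x1)
e1 = head(fmap.mean(dim=(2, 3)))                       # embedding of view 1
e2 = head(conv(x2).mean(dim=(2, 3)))                   # embedding of view 2
rationale = ssl_gradcam(fmap, e1, e2)                  # high where views agree
```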
arXiv Detail & Related papers (2023-03-03T02:07:40Z)
- Unsupervised Feature Clustering Improves Contrastive Representation Learning for Medical Image Segmentation [18.75543045234889]
Self-supervised instance discrimination is an effective contrastive pretext task to learn feature representations and address limited medical image annotations.
We propose a new self-supervised contrastive learning method that uses unsupervised feature clustering to better select positive and negative image samples.
Our method outperforms state-of-the-art self-supervised contrastive techniques on these tasks.
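A minimal sketch of clustering-based pair selection in this spirit: same-cluster samples serve as positives, cross-cluster samples as negatives. Plain k-means is a common stand-in here; the paper's clustering procedure may differ.

```python
import torch

def cluster_assignments(feats, k=10, iters=10):
    """Plain k-means on [N, D] features; returns [N] cluster ids."""
    centers = feats[torch.randperm(feats.size(0))[:k]].clone()
    for _ in range(iters):
        ids = torch.cdist(feats, centers).argmin(dim=1)  # nearest center
        for c in range(k):
            mask = ids == c
            if mask.any():
                centers[c] = feats[mask].mean(dim=0)     # update center
    return ids

feats = torch.randn(256, 128)                     # toy encoded features
ids = cluster_assignments(feats)
pos_mask = ids.unsqueeze(0) == ids.unsqueeze(1)   # [N, N] positive-pair mask
```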
arXiv Detail & Related papers (2022-11-15T22:54:29Z)
- Weak Augmentation Guided Relational Self-Supervised Learning [80.0680103295137]
We introduce a novel relational self-supervised learning (ReSSL) framework that learns representations by modeling the relationship between different instances.
Our proposed method employs the sharpened distribution of pairwise similarities among different instances as the relation metric.
Experimental results show that our proposed ReSSL substantially outperforms the state-of-the-art methods across different network architectures.
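A sketch of a relational loss in this style: the teacher's similarity distribution over a shared anchor set, sharpened with a lower temperature, supervises the student's distribution. The anchor bank and temperature values are common choices, not necessarily the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def relational_loss(student, teacher, anchors, t_s=0.1, t_t=0.04):
    """student, teacher: [B, D] embeddings; anchors: [K, D] memory bank."""
    a = F.normalize(anchors, dim=-1)
    s = F.normalize(student, dim=-1) @ a.t()       # [B, K] student relations
    t = F.normalize(teacher, dim=-1) @ a.t()       # [B, K] teacher relations
    log_p_s = F.log_softmax(s / t_s, dim=-1)
    p_t = F.softmax(t.detach() / t_t, dim=-1)      # sharpened teacher target
    return -(p_t * log_p_s).sum(dim=-1).mean()     # cross-entropy
```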
arXiv Detail & Related papers (2022-03-16T16:14:19Z)
- Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z)
- Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations [183.03278932562438]
This paper presents an effective approach that adds spatial information to the encoding stage to alleviate the learning inconsistency between the contrastive objective and strong data augmentation operations.
We show that our approach learns more efficient visual representations and thus delivers a key message to inspire future research on self-supervised visual representation learning.
arXiv Detail & Related papers (2020-11-19T16:26:25Z)