Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning
- URL: http://arxiv.org/abs/2407.02014v1
- Date: Tue, 2 Jul 2024 07:35:21 GMT
- Title: Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning
- Authors: Chengchao Shen, Jianzhong Chen, Jianxin Wang
- Abstract summary: We propose a novel Multi-Grained Contrast method (MGC) for unsupervised representation learning.
Specifically, we construct delicate multi-grained correspondences between positive views and then conduct multi-grained contrast via these correspondences to learn more general unsupervised representations.
Our method significantly outperforms the existing state-of-the-art methods on extensive downstream tasks, including object detection, instance segmentation, scene parsing, semantic segmentation and keypoint detection.
- Score: 10.630297877530614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The existing contrastive learning methods mainly focus on single-grained representation learning, e.g., part-level, object-level or scene-level representations, and thus inevitably neglect the transferability of representations across other granularity levels. In this paper, we aim to learn multi-grained representations, which can effectively describe the image at various granularity levels, thus improving generalization on extensive downstream tasks. To this end, we propose a novel Multi-Grained Contrast method (MGC) for unsupervised representation learning. Specifically, we construct delicate multi-grained correspondences between positive views and then conduct multi-grained contrast via these correspondences to learn more general unsupervised representations. Without pretraining on a large-scale dataset, our method significantly outperforms the existing state-of-the-art methods on extensive downstream tasks, including object detection, instance segmentation, scene parsing, semantic segmentation and keypoint detection. Moreover, experimental results support the data-efficient property and excellent representation transferability of our method. The source code and trained weights are available at \url{https://github.com/visresearch/mgc}.
Related papers
- Point Contrastive Prediction with Semantic Clustering for Self-Supervised Learning on Point Cloud Videos [71.20376514273367]
We propose a unified point cloud video self-supervised learning framework for object-centric and scene-centric data.
Our method outperforms supervised counterparts on a wide range of downstream tasks.
arXiv Detail & Related papers (2023-08-18T02:17:47Z) - Disentangling Multi-view Representations Beyond Inductive Bias [32.15900989696017]
We propose a novel multi-view representation disentangling method that ensures both interpretability and generalizability of the resulting representations.
Our experiments on four multi-view datasets demonstrate that our proposed method outperforms 12 comparison methods in terms of clustering and classification performance.
arXiv Detail & Related papers (2023-08-03T09:09:28Z) - A Clustering-guided Contrastive Fusion for Multi-view Representation Learning [7.630965478083513]
We propose a deep fusion network to fuse view-specific representations into the view-common representation.
We also design an asymmetrical contrastive strategy that aligns the view-common representation and each view-specific representation.
In the incomplete-view scenario, our proposed method resists noise interference better than competing methods.
arXiv Detail & Related papers (2022-12-28T07:21:05Z) - Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent images and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z) - Clustering by Maximizing Mutual Information Across Views [62.21716612888669]
We propose a novel framework for image clustering that incorporates joint representation learning and clustering.
Our method significantly outperforms state-of-the-art single-stage clustering methods across a variety of image datasets.
arXiv Detail & Related papers (2021-07-24T15:36:49Z) - Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z) - Multimodal Contrastive Training for Visual Representation Learning [45.94662252627284]
We develop an approach to learning visual representations that embraces multimodal data.
Our method exploits intrinsic data properties within each modality and semantic information from cross-modal correlation simultaneously.
By including multimodal training in a unified framework, our method can learn more powerful and generic visual features.
arXiv Detail & Related papers (2021-04-26T19:23:36Z) - Heterogeneous Contrastive Learning: Encoding Spatial Information for Compact Visual Representations [183.03278932562438]
This paper presents an effective approach that adds spatial information to the encoding stage to alleviate the learning inconsistency between the contrastive objective and strong data augmentation operations.
We show that our approach achieves higher efficiency in visual representations and thus delivers a key message to inspire future research on self-supervised visual representation learning.
arXiv Detail & Related papers (2020-11-19T16:26:25Z) - Deep Partial Multi-View Learning [94.39367390062831]
We propose a novel framework termed Cross Partial Multi-View Networks (CPM-Nets)
We first provide a formal definition of completeness and versatility for multi-view representation.
We then theoretically prove the versatility of the learned latent representations.
arXiv Detail & Related papers (2020-11-12T02:29:29Z) - Unsupervised Image Classification for Deep Representation Learning [42.09716669386924]
We propose an unsupervised image classification framework without using embedding clustering.
Experiments on the ImageNet dataset demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2020-06-20T02:57:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.