InsCon: Instance Consistency Feature Representation via Self-Supervised Learning
- URL: http://arxiv.org/abs/2203.07688v1
- Date: Tue, 15 Mar 2022 07:09:00 GMT
- Title: InsCon: Instance Consistency Feature Representation via Self-Supervised Learning
- Authors: Junwei Yang, Ke Zhang, Zhaolin Cui, Jinming Su, Junfeng Luo, and
Xiaolin Wei
- Abstract summary: We propose a new end-to-end self-supervised framework called InsCon, which is devoted to capturing multi-instance information.
InsCon builds a targeted learning paradigm that takes multi-instance images as input and aligns the learned features between corresponding instance views.
It also introduces the pull and push of cell-instance, which uses cell consistency to enhance fine-grained feature representation.
- Score: 9.416267640069297
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature representation via self-supervised learning has reached remarkable
success in image-level contrastive learning, which brings impressive
performance on image classification tasks. However, image-level feature
representation mainly focuses on contrastive learning over a single instance
and ignores the objective differences between pretext tasks and downstream
prediction tasks such as object detection and instance segmentation. In order to fully
unleash the power of feature representation on downstream prediction tasks, we
propose a new end-to-end self-supervised framework called InsCon, which is
devoted to capturing multi-instance information and extracting cell-instance
features for object recognition and localization. On the one hand, InsCon
builds a targeted learning paradigm that takes multi-instance images as
input, aligning the learned features between corresponding instance views, which
makes it more appropriate for multi-instance recognition tasks. On the other
hand, InsCon introduces the pull and push of cell-instance, which utilizes cell
consistency to enhance fine-grained feature representation for precise boundary
localization. As a result, InsCon learns multi-instance consistency on semantic
feature representation and cell-instance consistency on spatial feature
representation. Experiments demonstrate that the proposed method surpasses MoCo
v2 by 1.1% AP^{bb} on COCO object detection and 1.0% AP^{mk} on COCO instance
segmentation using a Mask R-CNN R50-FPN network with 90k training iterations,
and by 2.1% AP^{bb} on PASCAL VOC object detection using a Faster R-CNN R50-C4
network with 24k training iterations.
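To make the two objectives concrete, here is a minimal PyTorch sketch of how the two consistency terms could be instantiated. The abstract does not give loss formulas, so the function names, the InfoNCE formulation, and the assumption that the two views are spatially aligned are ours, not the paper's.

```python
# Hypothetical sketch of InsCon's two objectives (our reading of the
# abstract; names and formulations are not the paper's code).
import torch
import torch.nn.functional as F

def multi_instance_loss(q, k, tau=0.2):
    """Align each instance-view embedding q[i] with its counterpart k[i]
    via InfoNCE; the other instances in the batch act as negatives.
    q, k: (N, D) embeddings of corresponding instance views."""
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)
    logits = q @ k.t() / tau                       # (N, N) similarities
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

def cell_consistency_loss(fq, fk, tau=0.2):
    """Pull each spatial cell toward the matching cell of the other view
    and push it away from all other cells (our reading of the
    'pull and push of cell-instance'; assumes spatially aligned views).
    fq, fk: (N, D, H, W) dense feature maps of the two views."""
    n, d, h, w = fq.shape
    q = F.normalize(fq.flatten(2), dim=1)          # (N, D, H*W)
    k = F.normalize(fk.flatten(2), dim=1)
    logits = torch.einsum('ndp,ndq->npq', q, k) / tau   # (N, HW, HW)
    labels = torch.arange(h * w, device=fq.device).expand(n, -1)
    return F.cross_entropy(logits.flatten(0, 1), labels.flatten())
```

Under this reading, the first term supplies the multi-instance semantic consistency and the second the cell-level spatial consistency that the abstract credits for more precise boundary localization.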
Related papers
- Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding [39.424931953675994]
Self-supervised pretraining (SSP) has emerged as a popular technique in machine learning, enabling the extraction of meaningful feature representations without labelled data.
This study endeavours to evaluate the effectiveness of pure self-supervised learning (SSL) techniques in computer vision tasks.
arXiv Detail & Related papers (2023-08-22T13:55:57Z)
- Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos [63.94040814459116]
Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence.
We propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps.
We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations.
arXiv Detail & Related papers (2023-08-19T09:12:13Z)
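The masked slot attention mentioned above builds on the standard slot-attention module; the compact sketch below shows vanilla slot attention (Locatello et al., 2020) for orientation. The paper's semantic-aware masking, the stochastic slot initialisation, and the usual residual MLP refinement are omitted for brevity.

```python
# Vanilla slot attention for orientation; simplifications noted above.
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    def __init__(self, num_slots, dim, iters=3):
        super().__init__()
        self.num_slots, self.iters = num_slots, iters
        self.scale = dim ** -0.5
        self.slots_init = nn.Parameter(torch.randn(1, num_slots, dim))
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)
        self.norm_in = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (B, N, D) input features
        b, _, d = x.shape
        x = self.norm_in(x)
        k, v = self.to_k(x), self.to_v(x)
        slots = self.slots_init.expand(b, -1, -1)
        for _ in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            # slots compete for input features (softmax over slots), then
            # each slot takes a weighted mean of the features it won
            attn = (q @ k.transpose(1, 2) * self.scale).softmax(dim=1)
            attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-8)
            updates = attn @ v                  # (B, S, D)
            slots = self.gru(updates.reshape(-1, d),
                             slots.reshape(-1, d)).view(b, self.num_slots, d)
        return slots
```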
- Robust Representation Learning by Clustering with Bisimulation Metrics for Visual Reinforcement Learning with Distractions [9.088460902782547]
Clustering with Bisimulation Metrics (CBM) learns robust representations by grouping visual observations in the latent space.
CBM alternates between two steps: (1) grouping observations by measuring their bisimulation distances to the learned prototypes; (2) learning a set of prototypes according to the current cluster assignments.
Experiments demonstrate that CBM significantly improves the sample efficiency of popular visual RL algorithms.
arXiv Detail & Related papers (2023-02-12T13:27:34Z)
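The two-step alternation summarised above can be sketched as follows, assuming an encoder producing latent embeddings and some approximate bisimulation distance are available; the soft-assignment form and the name cbm_step are our illustration, not the paper's exact procedure.

```python
# One alternation of the CBM idea (our sketch, with soft assignments).
import torch
import torch.nn.functional as F

def cbm_step(obs_embed, prototypes, bisim_dist, tau=1.0):
    """obs_embed: (B, D) latent observations; prototypes: (K, D);
    bisim_dist: callable giving a (B, K) approximate bisimulation
    distance matrix between observations and prototypes."""
    d = bisim_dist(obs_embed, prototypes)            # (B, K) distances
    assign = F.softmax(-d / tau, dim=1)              # step 1: cluster assignment
    # step 2: move each prototype to the weighted mean of its members
    weights = assign / (assign.sum(dim=0, keepdim=True) + 1e-8)
    new_prototypes = weights.t() @ obs_embed         # (K, D)
    return assign, new_prototypes
```

For a quick smoke test, a plain Euclidean distance can stand in for the learned metric, e.g. `cbm_step(z, protos, lambda a, b: torch.cdist(a, b))`; the point of the method is that the learned bisimulation distance reflects behavioural similarity rather than pixel similarity.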
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- CAD: Co-Adapting Discriminative Features for Improved Few-Shot Classification [11.894289991529496]
Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given a few labeled samples.
Recent approaches pre-train a feature extractor, and then fine-tune for episodic meta-learning.
We propose a strategy to cross-attend and re-weight discriminative features for few-shot classification.
arXiv Detail & Related papers (2022-03-25T06:14:51Z)
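As a rough picture of what cross-attending and re-weighting could look like, the sketch below lets query and support features attend over each other and adds the attended features back as residuals. The function co_adapt and every detail in it are hypothetical; the paper's actual attention design may differ.

```python
# Hypothetical cross-attend-and-re-weight step for few-shot features.
import torch
import torch.nn.functional as F

def co_adapt(support, query, tau=0.1):
    """support: (S, D) labelled support features; query: (Q, D).
    Each set attends over the other and is re-weighted by what it
    attends to, boosting directions that both sets share."""
    s = F.normalize(support, dim=1)
    q = F.normalize(query, dim=1)
    attn_q = F.softmax(q @ s.t() / tau, dim=1)       # (Q, S)
    attn_s = F.softmax(s @ q.t() / tau, dim=1)       # (S, Q)
    query_adapted = query + attn_q @ support         # residual re-weighting
    support_adapted = support + attn_s @ query
    return support_adapted, query_adapted
```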
- UniVIP: A Unified Framework for Self-Supervised Visual Pre-training [50.87603616476038]
We propose a novel self-supervised framework to learn versatile visual representations on either single-centric-object or non-iconic datasets.
Extensive experiments show that UniVIP pre-trained on non-iconic COCO achieves state-of-the-art transfer performance.
Our method can also exploit single-centric-object datasets such as ImageNet and outperforms BYOL by 2.5% with the same pre-training epochs in linear probing.
arXiv Detail & Related papers (2022-03-14T10:04:04Z)
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [52.36728157779307]
We propose a unified distribution alignment strategy for long-tail visual recognition.
We then introduce a generalized re-weighting method in the two-stage learning to balance the class prior.
Our approach achieves the state-of-the-art results across all four recognition tasks with a simple and unified framework.
arXiv Detail & Related papers (2021-03-30T14:09:53Z)
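A generalized re-weighting of this kind might look like the sketch below, where gamma interpolates between no re-weighting and full inverse-frequency weighting; the function and the gamma parameterisation are our illustration, not the paper's exact formulation.

```python
# Sketch of class-prior re-weighting for the second training stage.
import torch
import torch.nn.functional as F

def reweighted_ce(logits, targets, class_counts, gamma=1.0):
    """gamma=0 recovers plain cross-entropy; gamma=1 is full
    inverse-frequency weighting. class_counts: (C,) training counts."""
    freq = class_counts.float() / class_counts.float().sum()
    weights = freq.pow(-gamma)                 # rarer classes weigh more
    weights = weights / weights.mean()         # keep the loss scale stable
    return F.cross_entropy(logits, targets, weight=weights)
```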
- Information Maximization Clustering via Multi-View Self-Labelling [9.947717243638289]
We propose a novel single-phase clustering method that simultaneously learns meaningful representations and assigns the corresponding annotations.
This is achieved by integrating a discrete representation into the self-supervised paradigm through a network.
Our empirical results show that the proposed framework outperforms state-of-the-art techniques with average accuracies of 89.1% and 49.0%, respectively.
arXiv Detail & Related papers (2021-03-12T16:04:41Z)
- Unsupervised Pretraining for Object Detection by Patch Reidentification [72.75287435882798]
Unsupervised representation learning achieves promising performance in pre-training representations for object detectors.
This work proposes a simple yet effective representation learning method for object detection, named patch re-identification (Re-ID).
Our method significantly outperforms its counterparts on COCO in all settings, such as different training iterations and data percentages.
arXiv Detail & Related papers (2021-03-08T15:13:59Z)
- Dense Contrastive Learning for Self-Supervised Visual Pre-Training [102.15325936477362]
We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images.
Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only 1% slower).
arXiv Detail & Related papers (2020-11-18T08:42:32Z)
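The pixel-level loss can be illustrated as below. DenseCL derives the cross-view correspondence from backbone features; this sketch reuses the projections themselves to pick each cell's positive, so it approximates the idea rather than reproducing the paper's exact loss.

```python
# Sketch of a pixel-level contrastive loss between two views.
import torch
import torch.nn.functional as F

def dense_contrastive_loss(f1, f2, tau=0.2):
    """f1, f2: (N, D, H, W) dense projection maps of two augmented views.
    Each cell of view 1 takes its most similar cell in view 2 as the
    positive; every other cell in view 2 is a negative."""
    n = f1.size(0)
    a = F.normalize(f1.flatten(2), dim=1)            # (N, D, HW)
    b = F.normalize(f2.flatten(2), dim=1)
    sim = torch.einsum('ndp,ndq->npq', a, b)         # (N, HW, HW)
    match = sim.argmax(dim=2)                        # positive index per cell
    return F.cross_entropy((sim / tau).flatten(0, 1), match.flatten())
```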