Learning Where to Learn in Cross-View Self-Supervised Learning
- URL: http://arxiv.org/abs/2203.14898v1
- Date: Mon, 28 Mar 2022 17:02:42 GMT
- Title: Learning Where to Learn in Cross-View Self-Supervised Learning
- Authors: Lang Huang, Shan You, Mingkai Zheng, Fei Wang, Chen Qian and Toshihiko
Yamasaki
- Abstract summary: Self-supervised learning (SSL) has made enormous progress and largely narrowed the gap with supervised methods.
Current methods simply aggregate pixels uniformly to form the embedding.
We present a new approach, Learning Where to Learn (LEWEL), to adaptively aggregate spatial information of features.
- Score: 54.14989750044489
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning (SSL) has made enormous progress and largely
narrowed the gap with its supervised counterparts, where representation learning is
mainly guided by a projection into an embedding space. During the projection,
current methods simply adopt uniform aggregation of pixels for the embedding;
however, this risks involving object-irrelevant nuisances and spatial
misalignment across different augmentations. In this paper, we present a new
approach, Learning Where to Learn (LEWEL), which adaptively aggregates spatial
information of features, so that the projected embeddings can be exactly
aligned and thus better guide feature learning. Concretely, we reinterpret
the projection head in SSL as a per-pixel projection and predict a set of
spatial alignment maps from the original features by this weight-sharing
projection head. A spectrum of aligned embeddings is thus obtained by
aggregating the features with spatial weighting according to these alignment
maps. As a result of this adaptive alignment, we observe substantial
improvements on both image-level prediction and dense prediction at the same
time: LEWEL improves MoCov2 by 1.6%/1.3%/0.5%/0.4% points, improves BYOL by
1.3%/1.3%/0.7%/0.6% points, on ImageNet linear/semi-supervised classification,
Pascal VOC semantic segmentation, and object detection, respectively.
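A rough PyTorch sketch of this adaptive aggregation follows (not the paper's released code: the feature dimensions, the number of alignment maps, and the separate 1x1 alignment layer standing in for the weight-sharing reinterpretation are all assumptions):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignedAggregation(nn.Module):
    """Sketch of LEWEL-style adaptive aggregation (dimensions assumed)."""

    def __init__(self, in_dim=2048, emb_dim=256, num_maps=4):
        super().__init__()
        # Per-pixel projection head: 1x1 convs act on each location alone.
        self.proj = nn.Sequential(
            nn.Conv2d(in_dim, in_dim, 1),
            nn.BatchNorm2d(in_dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_dim, emb_dim, 1),
        )
        # Stand-in for the paper's weight-sharing trick: a 1x1 layer that
        # turns per-pixel embeddings into K spatial alignment logits.
        self.align = nn.Conv2d(emb_dim, num_maps, 1)

    def forward(self, feats):                      # feats: (B, C, H, W)
        pix = self.proj(feats)                     # (B, D, H, W) per-pixel
        logits = self.align(pix)                   # (B, K, H, W)
        w = F.softmax(logits.flatten(2), dim=-1)   # each map sums to 1
        v = pix.flatten(2)                         # (B, D, H*W)
        # Weighted spatial pooling yields K aligned embeddings per image,
        # replacing the uniform average pooling criticized above.
        return torch.einsum('bkn,bdn->bkd', w, v)  # (B, K, D)
```
Presumably, the per-map embeddings from two augmented views are then compared by the usual SSL objective, which is what keeps them spatially aligned.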
Related papers
- Class-Imbalanced Semi-Supervised Learning for Large-Scale Point Cloud
Semantic Segmentation via Decoupling Optimization [64.36097398869774]
Semi-supervised learning (SSL) has been an active research topic for large-scale 3D scene understanding.
Existing SSL-based methods suffer from severe training bias due to class imbalance and the long-tail distribution of point cloud data.
We introduce a new decoupling optimization framework that disentangles feature representation learning and classifier learning in an alternating optimization manner, effectively shifting the biased decision boundary.
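For illustration only, a hypothetical alternating step in PyTorch (the function and optimizer names and the exact freezing schedule are assumptions, not the paper's code):
```python
import torch

def alternating_step(encoder, classifier, batch, opt_enc, opt_cls, criterion):
    x, y = batch

    # Phase 1: update only the encoder, with the classifier frozen.
    for p in classifier.parameters():
        p.requires_grad_(False)
    loss_enc = criterion(classifier(encoder(x)), y)
    opt_enc.zero_grad()
    loss_enc.backward()
    opt_enc.step()
    for p in classifier.parameters():
        p.requires_grad_(True)

    # Phase 2: update only the classifier on frozen features, so the
    # decision boundary can be re-balanced without distorting the features.
    with torch.no_grad():
        feats = encoder(x)
    loss_cls = criterion(classifier(feats), y)
    opt_cls.zero_grad()
    loss_cls.backward()
    opt_cls.step()
```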
arXiv Detail & Related papers (2024-01-13T04:16:40Z)
- Attention-Guided Lidar Segmentation and Odometry Using Image-to-Point Cloud Saliency Transfer [6.058427379240697]
SalLiDAR is a saliency-guided 3D semantic segmentation model that integrates saliency information to improve segmentation performance.
SalLONet is a self-supervised saliency-guided LiDAR odometry network that uses the semantic and saliency predictions of SalLiDAR to achieve better odometry estimation.
arXiv Detail & Related papers (2023-08-28T06:22:10Z)
- Understanding Contrastive Learning Through the Lens of Margins [9.443122526245562]
Self-supervised learning, or SSL, holds the key to expanding the usage of machine learning in real-world tasks.
We use margins as a stepping stone for understanding how contrastive learning works at a deeper level.
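One common way to make the margin explicit, shown here as our own illustration rather than the paper's exact formulation, is to subtract an additive margin from the positive logit of InfoNCE:
```python
import torch
import torch.nn.functional as F

def margin_info_nce(z1, z2, margin=0.2, tau=0.1):
    """Additive-margin contrastive loss (illustrative; the margin and
    temperature values are arbitrary)."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                    # (B, B) similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    # Shrinking the positive logit forces each positive pair to beat the
    # negatives by at least `margin` in cosine similarity.
    penalty = torch.zeros_like(logits)
    penalty[labels, labels] = margin / tau
    return F.cross_entropy(logits - penalty, labels)
```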
arXiv Detail & Related papers (2023-06-20T13:28:27Z)
- Spatiotemporal Self-supervised Learning for Point Clouds in the Wild [65.56679416475943]
We introduce an SSL strategy that leverages positive pairs in both the spatial and temporal domains.
We demonstrate the benefits of our approach via extensive experiments with self-supervised training on two large-scale LiDAR datasets.
arXiv Detail & Related papers (2023-03-28T18:06:22Z)
- Understanding and Improving the Role of Projection Head in Self-Supervised Learning [77.59320917894043]
Self-supervised learning (SSL) aims to produce useful feature representations without access to human-labeled data annotations.
Current contrastive learning approaches append a parametrized projection head to the end of some backbone network to optimize the InfoNCE objective.
This raises a fundamental question: Why is a learnable projection head required if we are to discard it after training?
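The setup in question looks roughly like the following (layer sizes are typical values, not taken from the paper):
```python
import torch.nn as nn

# Backbone f produces representations h; a small MLP head g maps h to the
# low-dimensional embedding z on which InfoNCE is computed. After
# pre-training, g is discarded and downstream tasks use h = f(x) directly.
projection_head = nn.Sequential(
    nn.Linear(2048, 2048),
    nn.ReLU(inplace=True),
    nn.Linear(2048, 128),
)
```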
arXiv Detail & Related papers (2022-12-22T05:42:54Z)
- PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
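A minimal sketch of region-wise consistency in the latent space, assuming the two views' feature maps have already been re-aligned to a shared crop (the real PGL alignment and architecture are more involved):
```python
import torch.nn.functional as F

def local_consistency_loss(f1, f2):
    # f1, f2: (B, C, H, W) latent feature maps of two augmented views,
    # assumed spatially aligned. Every location must agree with its
    # counterpart, so local structure is preserved, not just a global code.
    f1 = F.normalize(f1, dim=1)
    f2 = F.normalize(f2, dim=1)
    return -(f1 * f2).sum(dim=1).mean()
```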
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
- Dense Contrastive Learning for Self-Supervised Visual Pre-Training [102.15325936477362]
We present dense contrastive learning, which implements self-supervised learning by optimizing a pairwise contrastive (dis)similarity loss at the pixel level between two views of input images.
Compared to the baseline method MoCo-v2, our method introduces negligible computation overhead (only 1% slower).
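A pixel-level contrastive term of this kind can be sketched as below (we assume identity correspondence between locations for simplicity; DenseCL derives the correspondence from feature similarity and also uses a memory queue):
```python
import torch
import torch.nn.functional as F

def dense_contrastive_loss(g1, g2, tau=0.2):
    # g1, g2: (B, C, H, W) dense projections of two augmented views.
    B, C, H, W = g1.shape
    q = F.normalize(g1.flatten(2).transpose(1, 2), dim=-1)  # (B, HW, C)
    k = F.normalize(g2.flatten(2).transpose(1, 2), dim=-1)  # (B, HW, C)
    logits = torch.bmm(q, k.transpose(1, 2)) / tau          # (B, HW, HW)
    # Each location's positive is its counterpart in the other view;
    # every other location in that view serves as a negative.
    labels = torch.arange(H * W, device=g1.device).repeat(B)
    return F.cross_entropy(logits.reshape(B * H * W, H * W), labels)
```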
arXiv Detail & Related papers (2020-11-18T08:42:32Z)