VICRegL: Self-Supervised Learning of Local Visual Features
- URL: http://arxiv.org/abs/2210.01571v1
- Date: Tue, 4 Oct 2022 12:54:25 GMT
- Title: VICRegL: Self-Supervised Learning of Local Visual Features
- Authors: Adrien Bardes and Jean Ponce and Yann LeCun
- Abstract summary: This paper explores the fundamental trade-off between learning local and global features.
A new method called VICRegL is proposed that learns good global and local features simultaneously.
We demonstrate strong performance on linear classification and segmentation transfer tasks.
- Score: 34.92750644059916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most recent self-supervised methods for learning image representations focus
on either producing a global feature with invariance properties, or producing a
set of local features. The former works best for classification tasks while the
latter is best for detection and segmentation tasks. This paper explores the
fundamental trade-off between learning local and global features. A new method
called VICRegL is proposed that learns good global and local features
simultaneously, yielding excellent performance on detection and segmentation
tasks while maintaining good performance on classification tasks. Concretely,
two identical branches of a standard convolutional net architecture are fed two
differently distorted versions of the same image. The VICReg criterion is
applied to pairs of global feature vectors. Simultaneously, the VICReg
criterion is applied to pairs of local feature vectors occurring before the
last pooling layer. Two local feature vectors are attracted to each other if
their l2-distance is below a threshold or if their relative locations are
consistent with a known geometric transformation between the two input images.
We demonstrate strong performance on linear classification and segmentation
transfer tasks. Code and pretrained models are publicly available at:
https://github.com/facebookresearch/VICRegL
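The abstract above describes two ingredients: the VICReg criterion applied to pairs of feature vectors, and a rule that attracts local feature vectors whose l2-distance is below a threshold (or whose locations correspond under the known transformation). The sketch below is a minimal NumPy illustration of the VICReg loss terms and the feature-distance half of the matching rule; function names, coefficients, and thresholds are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

def vicreg_loss(za, zb, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg criterion on two batches of feature vectors, shape (N, D).

    Three terms: invariance (MSE between the branches), variance (a hinge
    keeping each dimension's std above 1), and covariance (decorrelating
    feature dimensions). Coefficients are commonly reported defaults and
    are illustrative here."""
    n, d = za.shape
    inv = np.mean((za - zb) ** 2)  # invariance: MSE between branches

    def var_term(z):  # variance: hinge on per-dimension std
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))

    def cov_term(z):  # covariance: off-diagonal energy of the cov matrix
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off = cov - np.diag(np.diag(cov))
        return np.sum(off ** 2) / d

    return (sim_w * inv
            + var_w * (var_term(za) + var_term(zb))
            + cov_w * (cov_term(za) + cov_term(zb)))

def l2_matched_pairs(fa, fb, threshold):
    """Feature-based half of the local matching rule: pair each local
    vector in fa (M, D) with its nearest neighbor in fb (K, D) when
    their l2-distance is below `threshold`. (VICRegL additionally
    matches by spatial location under the known transformation between
    the two views; that half is omitted here.)"""
    dists = np.linalg.norm(fa[:, None, :] - fb[None, :, :], axis=-1)
    nn = dists.argmin(axis=1)
    keep = dists[np.arange(len(fa)), nn] < threshold
    return [(i, int(j)) for i, j in enumerate(nn) if keep[i]]
```

On matched local pairs, the same VICReg criterion is applied as on the global vectors, so the global and local objectives share one loss form.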
Related papers
- Siamese Transformer Networks for Few-shot Image Classification [9.55588609556447]
Humans exhibit remarkable proficiency in visual classification tasks, accurately recognizing and classifying new images with minimal examples.
Existing few-shot image classification methods often emphasize either global features or local features, with few studies considering the integration of both.
We propose a novel approach based on the Siamese Transformer Network (STN).
Our strategy effectively harnesses the potential of global and local features in few-shot image classification, circumventing the need for complex feature adaptation modules.
arXiv Detail & Related papers (2024-07-16T14:27:23Z)
- AANet: Aggregation and Alignment Network with Semi-hard Positive Sample Mining for Hierarchical Place Recognition [48.043749855085025]
Visual place recognition (VPR) is one of the research hotspots in robotics, which uses visual information to locate robots.
We present a unified network capable of extracting global features for retrieving candidates via an aggregation module.
We also propose a Semi-hard Positive Sample Mining (ShPSM) strategy to select appropriate hard positive images for training more robust VPR networks.
arXiv Detail & Related papers (2023-10-08T14:46:11Z)
- High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation [17.804090651425955]
Image-level weakly-supervised segmentation (WSSS) reduces the usually vast data annotation cost by using surrogate segmentation masks during training.
Our work is based on two techniques for improving CAMs: importance sampling, a substitute for GAP, and a feature similarity loss.
We reformulate both techniques based on binomial posteriors of multiple independent binary problems.
This has two benefits: their performance is improved, and they become more general, resulting in an add-on method that can boost virtually any WSSS method.
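One plausible reading of importance sampling as a substitute for global average pooling (GAP) is to draw a single spatial location with probability given by a softmax over the class activation map and use that activation in place of the spatial mean. The sketch below illustrates that reading only; the function name is hypothetical, and the paper's actual formulation (binomial posteriors over independent binary problems) differs in detail.

```python
import numpy as np

def importance_sample_pool(cam, rng):
    """Single-sample substitute for GAP: draw one spatial location of
    the class activation map `cam` with probability proportional to a
    softmax over its activations, and return that activation instead
    of the spatial mean. Illustrative reading, not the paper's exact
    estimator."""
    flat = cam.ravel().astype(float)
    p = np.exp(flat - flat.max())  # numerically stable softmax
    p /= p.sum()
    idx = rng.choice(flat.size, p=p)
    return flat[idx]
```

Unlike GAP, this estimator concentrates the gradient on high-activation locations, which is one intuition for why it can sharpen the resulting masks.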
arXiv Detail & Related papers (2023-04-05T17:43:57Z)
- Learning Implicit Feature Alignment Function for Semantic Segmentation [51.36809814890326]
Implicit Feature Alignment function (IFA) is inspired by the rapidly expanding topic of implicit neural representations.
We show that IFA implicitly aligns the feature maps at different levels and is capable of producing segmentation maps in arbitrary resolutions.
Our method can be combined with various architectures for further improvement, and it achieves a state-of-the-art accuracy trade-off on common benchmarks.
arXiv Detail & Related papers (2022-06-17T09:40:14Z)
- A Hierarchical Dual Model of Environment- and Place-Specific Utility for Visual Place Recognition [26.845945347572446]
We present a novel approach to deduce two key types of utility for Visual Place Recognition (VPR).
We employ contrastive learning principles to estimate both the environment- and place-specific utility of Vector of Locally Aggregated Descriptors (VLAD) clusters.
By combining these two utility measures, our approach achieves state-of-the-art performance on three challenging benchmark datasets.
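The entry above estimates utilities for VLAD clusters. VLAD itself is a standard aggregation: each local descriptor is assigned to its nearest cluster center, the per-cluster residual sums are concatenated, and the result is l2-normalized. A minimal NumPy sketch (cluster count and shapes are illustrative):

```python
import numpy as np

def vlad(descriptors, centers):
    """Vector of Locally Aggregated Descriptors: assign each local
    descriptor (shape (n, d)) to its nearest of k cluster centers
    (shape (k, d)), accumulate the residuals per cluster, and
    l2-normalize the concatenated result (length k * d)."""
    k, d = centers.shape
    assign = np.linalg.norm(
        descriptors[:, None, :] - centers[None, :, :], axis=-1).argmin(axis=1)
    v = np.zeros((k, d))
    for desc, c in zip(descriptors, assign):
        v[c] += desc - centers[c]  # residual to the assigned center
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

Per-cluster utility weighting, as in the paper above, would scale each cluster's residual block before normalization.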
arXiv Detail & Related papers (2021-07-06T07:38:47Z)
- Conformer: Local Features Coupling Global Representations for Visual Recognition [72.9550481476101]
We propose a hybrid network structure, termed Conformer, to take advantage of convolutional operations and self-attention mechanisms for enhanced representation learning.
Experiments show that Conformer, under comparable parameter complexity, outperforms the visual transformer (DeiT-B) by 2.3% on ImageNet.
arXiv Detail & Related papers (2021-05-09T10:00:03Z)
- Region Similarity Representation Learning [94.88055458257081]
Region Similarity Representation Learning (ReSim) is a new approach to self-supervised representation learning for localization-based tasks.
ReSim learns both regional representations for localization as well as semantic image-level representations.
We show how ReSim learns representations which significantly improve the localization and classification performance compared to a competitive MoCo-v2 baseline.
arXiv Detail & Related papers (2021-03-24T00:42:37Z)
- Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves a Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z)
- Fine-Grained Visual Classification with Efficient End-to-end Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.