Global-Local Self-Distillation for Visual Representation Learning
- URL: http://arxiv.org/abs/2207.14676v1
- Date: Fri, 29 Jul 2022 13:50:09 GMT
- Title: Global-Local Self-Distillation for Visual Representation Learning
- Authors: Tim Lebailly and Tinne Tuytelaars
- Abstract summary: Richer and more meaningful gradient updates are key to allowing self-supervised methods to learn better and more efficiently.
In a typical self-distillation framework, the representations of two augmented images are enforced to be coherent at the global level.
We propose to leverage the spatial information in the input images to obtain geometric matchings.
- Score: 41.24728444810133
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The downstream accuracy of self-supervised methods is tightly linked to the
proxy task solved during training and the quality of the gradients extracted
from it. Richer and more meaningful gradient updates are key to allowing
self-supervised methods to learn better and more efficiently. In a
typical self-distillation framework, the representations of two augmented images
are enforced to be coherent at the global level. Nonetheless, incorporating
local cues in the proxy task can be beneficial and improve the model accuracy
on downstream tasks. This leads to a dual objective in which, on the one hand,
coherence between global representations is enforced and, on the other,
coherence between local representations is enforced. Unfortunately, an exact
correspondence mapping between two sets of local representations does not exist,
making the task of matching local representations from one augmentation to
another non-trivial. We propose to leverage the spatial information in the
input images to obtain geometric matchings and compare this geometric approach
against previous methods based on similarity matchings. Our study shows that
1) geometric matchings perform better than similarity-based matchings
in low-data regimes and 2) similarity-based matchings are highly
detrimental in low-data regimes compared to the vanilla baseline without local
self-distillation. The code will be released upon acceptance.
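
The abstract contrasts two ways of pairing local representations across views but does not spell out how either matching is computed. The following is a minimal sketch of the idea, not the authors' implementation. It assumes each view is an axis-aligned crop of the same image, given as a box `(x0, y0, x1, y1)` in original-image coordinates, that each view is split into a `grid x grid` layout of patches, and that no flip is applied. The function names `patch_centers`, `geometric_matching`, and `similarity_matching` are hypothetical.

```python
import numpy as np

def patch_centers(box, grid):
    """Centers of a grid x grid patch layout inside a crop box (x0, y0, x1, y1),
    expressed in the coordinates of the original image (assumed convention)."""
    x0, y0, x1, y1 = box
    # Patch centers as fractions of the crop size: 0.5/grid, 1.5/grid, ...
    frac = (np.arange(grid) + 0.5) / grid
    xs = x0 + frac * (x1 - x0)
    ys = y0 + frac * (y1 - y0)
    gx, gy = np.meshgrid(xs, ys)  # row-major layout: index = row * grid + col
    return np.stack([gx.ravel(), gy.ravel()], axis=1)  # shape (grid*grid, 2)

def geometric_matching(box_a, box_b, grid):
    """For each patch of view A, index of the spatially nearest patch of view B."""
    ca = patch_centers(box_a, grid)
    cb = patch_centers(box_b, grid)
    # Pairwise Euclidean distances between patch centers, shape (N, N).
    dist = np.linalg.norm(ca[:, None, :] - cb[None, :, :], axis=-1)
    return dist.argmin(axis=1)

def similarity_matching(feats_a, feats_b):
    """For each patch of view A, index of the most cosine-similar patch of view B."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    return (a @ b.T).argmax(axis=1)

if __name__ == "__main__":
    # Two overlapping 4x4 crops, view B shifted right by 2 pixels.
    print(geometric_matching((0, 0, 4, 4), (2, 0, 6, 4), grid=2))  # [0 0 2 2]
```

Note that the geometric variant depends only on where the crops were taken, while the similarity variant depends on the features themselves. This sketch pairs every patch of view A with some patch of view B, even when the patch lies outside the overlap of the two crops; a practical version would likely need to mask or down-weight such pairs.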
Related papers
- BCLNet: Bilateral Consensus Learning for Two-View Correspondence Pruning [26.400567961735234]
Correspondence pruning aims to establish reliable correspondences between two related images.
Existing approaches often employ a progressive strategy to handle the local and global contexts.
We propose a parallel context learning strategy that involves acquiring bilateral consensus for the two-view correspondence pruning task.
arXiv Detail & Related papers (2024-01-07T11:38:15Z)
- How does Contrastive Learning Organize Images? [8.077578967149561]
Contrastive learning, a dominant self-supervised technique, emphasizes similarity in representations between augmentations of the same input and dissimilarity for different ones.
Recent studies challenge this direct relationship, spotlighting the crucial role of inductive biases.
We introduce the "RLD (Relative Local Density)" metric to capture this discrepancy.
arXiv Detail & Related papers (2023-05-17T14:10:54Z)
- Data-efficient Large Scale Place Recognition with Graded Similarity Supervision [10.117451511942267]
Visual place recognition (VPR) is a fundamental task of computer vision for visual localization.
Existing methods are trained using image pairs that either depict the same place or not.
We deploy an automatic re-annotation strategy to re-label VPR datasets.
We propose a new Generalized Contrastive Loss (GCL) that uses graded similarity labels for training contrastive networks.
arXiv Detail & Related papers (2023-03-21T10:56:57Z)
- Adaptive Local-Component-aware Graph Convolutional Network for One-shot Skeleton-based Action Recognition [54.23513799338309]
We present an Adaptive Local-Component-aware Graph Convolutional Network for skeleton-based action recognition.
Our method provides a stronger representation than the global embedding and helps our model reach state-of-the-art performance.
arXiv Detail & Related papers (2022-09-21T02:33:07Z)
- Local2Global: A distributed approach for scaling representation learning on graphs [10.254620252788776]
We propose a decentralised "local2global" approach to graph representation learning that can, a priori, be used to scale any embedding technique.
We show that our approach achieves a good trade-off between scale and accuracy on edge reconstruction and semi-supervised classification.
We also consider the downstream task of anomaly detection and show how one can use local2global to highlight anomalies in cybersecurity networks.
arXiv Detail & Related papers (2022-01-12T23:00:22Z)
- Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework [59.578339075658995]
We propose a purely point-based framework for joint crowd counting and individual localization.
We design an intuitive solution under this framework, called Point to Point Network (P2PNet).
arXiv Detail & Related papers (2021-07-27T11:41:50Z)
- Progressive Bilateral-Context Driven Model for Post-Processing Person Re-Identification [20.143803695488106]
We propose a lightweight post-processing person re-identification method in which the pairwise measure is determined by the relationship between the sample and the counterpart's context.
Experiments on four large-scale person re-identification benchmark datasets indicate that the proposed method can consistently achieve higher accuracies.
arXiv Detail & Related papers (2020-09-07T13:35:09Z)
- Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves a Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z)
- Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
arXiv Detail & Related papers (2020-07-20T12:07:48Z)
- High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.