Bag of Image Patch Embedding Behind the Success of Self-Supervised
Learning
- URL: http://arxiv.org/abs/2206.08954v2
- Date: Tue, 13 Jun 2023 00:48:40 GMT
- Title: Bag of Image Patch Embedding Behind the Success of Self-Supervised
Learning
- Authors: Yubei Chen, Adrien Bardes, Zengyi Li, Yann LeCun
- Abstract summary: This work shows that joint-embedding SSL approaches learn a representation of image patches, which reflects their co-occurrence.
We empirically show that learning a representation for fixed-scale patches and aggregating local patch representations as the image representation achieves similar or even better results than the baseline methods.
- Score: 12.480529556920974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) has recently achieved tremendous empirical
advancements in learning image representation. However, our understanding of
the principle behind learning such a representation is still limited. This work
shows that joint-embedding SSL approaches primarily learn a representation of
image patches, which reflects their co-occurrence. Such a connection to
co-occurrence modeling can be established formally, and it supplements the
prevailing invariance perspective. We empirically show that learning a
representation for fixed-scale patches and aggregating local patch
representations as the image representation achieves similar or even better
results than the baseline methods. We denote this process as BagSSL. Even with
32x32 patch representation, BagSSL achieves 62% top-1 linear probing accuracy
on ImageNet. On the other hand, with a multi-scale pretrained model, we show
that the whole image embedding is approximately the average of local patch
embeddings. While the SSL representation is relatively invariant at the global
scale, we show that locality is preserved when we zoom into local patch-level
representation. Further, we show that patch representation aggregation can
improve various SOTA baseline methods by a large margin. The patch
representation is considerably easier to understand, and this work takes a step
toward demystifying self-supervised representation learning.
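The core aggregation step the abstract describes — embed fixed-scale patches, then average the local embeddings to form the image representation — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `patch_encoder` here is a hypothetical stand-in for a pretrained SSL patch encoder, and the toy encoder used below is only for demonstration.

```python
import numpy as np

def extract_patches(image, patch_size):
    """Split an (H, W, C) image into non-overlapping fixed-scale patches."""
    h, w, _ = image.shape
    patches = []
    for i in range(0, h - patch_size + 1, patch_size):
        for j in range(0, w - patch_size + 1, patch_size):
            patches.append(image[i:i + patch_size, j:j + patch_size])
    return patches

def bag_of_patches_embedding(image, patch_encoder, patch_size=32):
    """BagSSL-style aggregation: embed each local patch, then average
    the patch embeddings to obtain the whole-image embedding."""
    patches = extract_patches(image, patch_size)
    embeddings = np.stack([patch_encoder(p) for p in patches])
    return embeddings.mean(axis=0)

# Toy "encoder": per-channel mean color, standing in for a real
# pretrained SSL patch encoder (assumption for this sketch).
def toy_encoder(patch):
    return patch.reshape(-1, patch.shape[-1]).mean(axis=0)

img = np.random.rand(64, 64, 3)          # a 64x64 RGB image -> four 32x32 patches
emb = bag_of_patches_embedding(img, toy_encoder, patch_size=32)
print(emb.shape)  # (3,)
```

In the paper's multi-scale setting, the observation is that the whole-image embedding of a pretrained model is approximately this average of local patch embeddings; the sketch above makes that aggregation explicit for the fixed-scale case.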
Related papers
- Dense Self-Supervised Learning for Medical Image Segmentation [0.0]
We propose Pix2Rep, a self-supervised learning (SSL) approach for few-shot segmentation.
It reduces the manual annotation burden by learning powerful pixel-level representations directly from unlabeled images.
Results show improved performance compared to existing semi- and self-supervised approaches.
arXiv Detail & Related papers (2024-07-29T19:42:22Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- Learned representation-guided diffusion models for large-image generation [58.192263311786824]
We introduce a novel approach that trains diffusion models conditioned on embeddings from self-supervised learning (SSL).
Our diffusion models successfully project these features back to high-quality histopathology and remote sensing images.
Augmenting real data by generating variations of real images improves downstream accuracy for patch-level and larger, image-scale classification tasks.
arXiv Detail & Related papers (2023-12-12T14:45:45Z)
- Patch-Wise Self-Supervised Visual Representation Learning: A Fine-Grained Approach [4.9204263448542465]
This study introduces an innovative, fine-grained dimension by integrating patch-level discrimination into self-supervised visual representation learning.
We employ a distinctive photometric patch-level augmentation, where each patch is individually augmented, independent from other patches within the same view.
We present a simple yet effective patch-matching algorithm to find the corresponding patches across the augmented views.
arXiv Detail & Related papers (2023-10-28T09:35:30Z)
- Self-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond [31.36818611460614]
We propose a Self-Supervised Pyramid Learning (SS-PRL) framework.
The proposed SS-PRL is designed to derive pyramid representations at patch levels via learning proper prototypes.
We show that, with our proposed SS-PRL for model pre-training, one can easily adapt and fine-tune the models for a variety of applications.
arXiv Detail & Related papers (2022-08-30T17:57:14Z)
- HIRL: A General Framework for Hierarchical Image Representation Learning [54.12773508883117]
We propose a general framework for Hierarchical Image Representation Learning (HIRL)
This framework aims to learn multiple semantic representations for each image, and these representations are structured to encode image semantics from fine-grained to coarse-grained.
Based on a probabilistic factorization, HIRL learns the most fine-grained semantics by an off-the-shelf image SSL approach and learns multiple coarse-grained semantics by a novel semantic path discrimination scheme.
arXiv Detail & Related papers (2022-05-26T05:13:26Z)
- Remote Sensing Images Semantic Segmentation with General Remote Sensing Vision Model via a Self-Supervised Contrastive Learning Method [13.479068312825781]
We propose Global style and Local matching Contrastive Learning Network (GLCNet) for remote sensing semantic segmentation.
Specifically, the global style contrastive module is used to learn an image-level representation better.
The local features matching contrastive module is designed to learn representations of local regions, which is beneficial for semantic segmentation.
arXiv Detail & Related papers (2021-06-20T03:03:40Z)
- Isometric Propagation Network for Generalized Zero-shot Learning [72.02404519815663]
A popular strategy is to learn a mapping between the semantic space of class attributes and the visual space of images based on the seen classes and their data.
We propose Isometric propagation Network (IPN), which learns to strengthen the relation between classes within each space and align the class dependency in the two spaces.
IPN achieves state-of-the-art performance on three popular Zero-shot learning benchmarks.
arXiv Detail & Related papers (2021-02-03T12:45:38Z)
- Seed the Views: Hierarchical Semantic Alignment for Contrastive Representation Learning [116.91819311885166]
We propose a hierarchical semantic alignment strategy via expanding the views generated by a single image to cross-samples and multi-level representations.
Our method, termed as CsMl, has the ability to integrate multi-level visual representations across samples in a robust way.
arXiv Detail & Related papers (2020-12-04T17:26:24Z)
- FeatMatch: Feature-Based Augmentation for Semi-Supervised Learning [64.32306537419498]
We propose a novel learned feature-based refinement and augmentation method that produces a varied set of complex transformations.
These transformations also use information from both within-class and across-class representations that we extract through clustering.
We demonstrate that our method is comparable to current state of art for smaller datasets while being able to scale up to larger datasets.
arXiv Detail & Related papers (2020-07-16T17:55:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.