CrIBo: Self-Supervised Learning via Cross-Image Object-Level
Bootstrapping
- URL: http://arxiv.org/abs/2310.07855v2
- Date: Sun, 3 Mar 2024 09:57:57 GMT
- Title: CrIBo: Self-Supervised Learning via Cross-Image Object-Level
Bootstrapping
- Authors: Tim Lebailly, Thomas Stegmüller, Behzad Bozorgtabar, Jean-Philippe
Thiran, Tinne Tuytelaars
- Abstract summary: We introduce a novel Cross-Image Object-Level Bootstrapping method tailored to enhance dense visual representation learning.
CrIBo emerges as a notably strong and adequate candidate for in-context learning, leveraging nearest neighbor retrieval at test time.
- Score: 40.94237853380154
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Leveraging nearest neighbor retrieval for self-supervised representation
learning has proven beneficial with object-centric images. However, this
approach faces limitations when applied to scene-centric datasets, where
multiple objects within an image are only implicitly captured in the global
representation. Such global bootstrapping can lead to undesirable entanglement
of object representations. Furthermore, even object-centric datasets stand to
benefit from a finer-grained bootstrapping approach. In response to these
challenges, we introduce a novel Cross-Image Object-Level Bootstrapping method
tailored to enhance dense visual representation learning. By employing
object-level nearest neighbor bootstrapping throughout the training, CrIBo
emerges as a notably strong and adequate candidate for in-context learning,
leveraging nearest neighbor retrieval at test time. CrIBo shows
state-of-the-art performance on the latter task while being highly competitive
in more standard downstream segmentation tasks. Our code and pretrained models
are publicly available at https://github.com/tileb1/CrIBo.
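To make the idea of nearest neighbor retrieval at test time concrete, here is a minimal sketch, not the authors' implementation. It assumes patch-level features have already been extracted by some frozen encoder, and that a small labeled support set is available. The function names (`cosine`, `knn_label_patches`), the toy feature vectors, and the choice of cosine similarity with majority voting are all illustrative assumptions rather than details taken from the paper.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two feature vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def knn_label_patches(query_feats, support_feats, support_labels, k=3):
    """Label each query patch by majority vote over its k most similar support patches.

    Hypothetical helper: the features are assumed to come from a frozen encoder.
    """
    predictions = []
    for q in query_feats:
        # Rank support patches from most to least similar to this query patch.
        ranked = sorted(
            range(len(support_feats)),
            key=lambda i: cosine(q, support_feats[i]),
            reverse=True,
        )
        # Majority vote over the labels of the k nearest support patches.
        votes = Counter(support_labels[i] for i in ranked[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy data (assumed): two support patches per class, two query patches.
support_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
support_labels = [0, 0, 1, 1]
query_feats = [[0.8, 0.2], [0.2, 0.8]]

predicted = knn_label_patches(query_feats, support_feats, support_labels, k=3)
print(predicted)
```

No weights are updated in this kind of evaluation, so the quality of the predictions depends entirely on how well the pretrained features separate objects.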
Related papers
- Co-Segmentation without any Pixel-level Supervision with Application to Large-Scale Sketch Classification [3.3104978705632777]
We propose a novel method for object co-segmentation, i.e. pixel-level localization of a common object in a set of images.
The method achieves state-of-the-art performance among methods trained with the same level of supervision.
The benefits of the proposed co-segmentation method are further demonstrated in the task of large-scale sketch recognition.
arXiv Detail & Related papers (2024-10-17T14:16:45Z)
- PEEKABOO: Hiding parts of an image for unsupervised object localization [7.161489957025654]
Localizing objects in an unsupervised manner poses significant challenges due to the absence of key visual information.
We propose a single-stage learning framework, dubbed PEEKABOO, for unsupervised object localization.
The key idea is to selectively hide parts of an image and leverage the remaining image information to infer the location of objects without explicit supervision.
arXiv Detail & Related papers (2024-07-24T20:35:20Z)
- Open-Vocabulary Camouflaged Object Segmentation [66.94945066779988]
We introduce a new task, open-vocabulary camouflaged object segmentation (OVCOS).
We construct a large-scale complex scene dataset (OVCamo) containing 11,483 hand-selected images with fine annotations and corresponding object classes.
By integrating the guidance of class semantic knowledge and the supplement of visual structure cues from the edge and depth information, the proposed method can efficiently capture camouflaged objects.
arXiv Detail & Related papers (2023-11-19T06:00:39Z)
- Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval [11.696941841000985]
Two-stage methods following the retrieve-and-rerank paradigm have achieved excellent performance, but their separate local and global modules are inefficient for real-world applications.
We propose a mechanism that attentively selects prominent local descriptors and infuses fine-grained semantic relations into the global representation.
Our method achieves state-of-the-art single-stage image retrieval performance on benchmarks such as Revisited Oxford and Revisited Paris.
arXiv Detail & Related papers (2023-08-08T03:06:10Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- Shepherding Slots to Objects: Towards Stable and Robust Object-Centric Learning [28.368429312400885]
Single-view images carry less information about how to disentangle a given scene than videos or multi-view images do.
We introduce a novel OCL framework for single-view images, SLot Attention via SHepherding (SLASH), which consists of two simple-yet-effective modules on top of Slot Attention.
Our proposed method enables consistent learning of object-centric representation and achieves strong performance across four datasets.
arXiv Detail & Related papers (2023-03-31T07:07:29Z)
- ASIC: Aligning Sparse in-the-wild Image Collections [86.66498558225625]
We present a method for joint alignment of sparse in-the-wild image collections of an object category.
We use pairwise nearest neighbors obtained from deep features of a pre-trained vision transformer (ViT) model as noisy and sparse keypoint matches.
Experiments on CUB and SPair-71k benchmarks demonstrate that our method can produce globally consistent and higher quality correspondences.
arXiv Detail & Related papers (2023-03-28T17:59:28Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Unsupervised Object-Level Representation Learning from Scene Images [97.07686358706397]
Object-level Representation Learning (ORL) is a new self-supervised learning framework towards scene images.
Our key insight is to leverage image-level self-supervised pre-training as the prior to discover object-level semantic correspondence.
ORL significantly improves the performance of self-supervised learning on scene images, even surpassing supervised ImageNet pre-training on several downstream tasks.
arXiv Detail & Related papers (2021-06-22T17:51:24Z)
- Unsupervised Image Classification for Deep Representation Learning [42.09716669386924]
We propose an unsupervised image classification framework without using embedding clustering.
Experiments on the ImageNet dataset have been conducted to demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2020-06-20T02:57:06Z)