ASIC: Aligning Sparse in-the-wild Image Collections
- URL: http://arxiv.org/abs/2303.16201v1
- Date: Tue, 28 Mar 2023 17:59:28 GMT
- Title: ASIC: Aligning Sparse in-the-wild Image Collections
- Authors: Kamal Gupta, Varun Jampani, Carlos Esteves, Abhinav Shrivastava,
Ameesh Makadia, Noah Snavely, Abhishek Kar
- Abstract summary: We present a method for joint alignment of sparse in-the-wild image collections of an object category.
We use pairwise nearest neighbors obtained from deep features of a pre-trained vision transformer (ViT) model as noisy and sparse keypoint matches.
Experiments on CUB and SPair-71k benchmarks demonstrate that our method can produce globally consistent and higher quality correspondences.
- Score: 86.66498558225625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a method for joint alignment of sparse in-the-wild image
collections of an object category. Most prior works assume either ground-truth
keypoint annotations or a large dataset of images of a single object category.
However, neither of the above assumptions holds true for the long tail of
objects present in the world. We present a self-supervised technique that
directly optimizes on a sparse collection of images of a particular
object/object category to obtain consistent dense correspondences across the
collection. We use pairwise nearest neighbors obtained from deep features of a
pre-trained vision transformer (ViT) model as noisy and sparse keypoint matches
and make them dense and accurate matches by optimizing a neural network that
jointly maps the image collection into a learned canonical grid. Experiments on
CUB and SPair-71k benchmarks demonstrate that our method can produce globally
consistent and higher quality correspondences across the image collection when
compared to existing self-supervised methods. Code and other material will be
made available at https://kampta.github.io/asic.
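
To make the matching step above concrete, here is a minimal PyTorch sketch (not the authors' released code) of mutual nearest-neighbor matching between per-patch ViT descriptors of two images. The feature extractor is stubbed out with random tensors; in practice the descriptors would come from a pre-trained ViT such as DINO, and the subsequent optimization of the canonical-grid mapping network is not shown. The cycle-consistency filter is an assumption of this sketch, one common way to discard noisy matches, not necessarily the paper's exact criterion.

import torch

def mutual_nearest_neighbors(feats_a, feats_b):
    # feats_a: (Na, D), feats_b: (Nb, D) L2-normalized per-patch features.
    # Returns index pairs (i, j) of patches that are each other's
    # nearest neighbor under cosine similarity.
    sim = feats_a @ feats_b.T               # (Na, Nb) similarity matrix
    nn_ab = sim.argmax(dim=1)               # best B-patch for each A-patch
    nn_ba = sim.argmax(dim=0)               # best A-patch for each B-patch
    idx_a = torch.arange(feats_a.shape[0])
    mutual = nn_ba[nn_ab] == idx_a          # keep cycle-consistent pairs only
    return idx_a[mutual], nn_ab[mutual]

# Hypothetical stand-in features: 196 patches (14x14 grid), 384-dim each.
fa = torch.nn.functional.normalize(torch.randn(196, 384), dim=1)
fb = torch.nn.functional.normalize(torch.randn(196, 384), dim=1)
ia, ib = mutual_nearest_neighbors(fa, fb)
print(f"{ia.numel()} mutual matches out of 196 patches")

Matches of this kind play the role of the "noisy and sparse keypoint matches" the abstract refers to; the paper's contribution is to densify them by jointly mapping all images into a learned canonical grid.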
Related papers
- CrIBo: Self-Supervised Learning via Cross-Image Object-Level
Bootstrapping [40.94237853380154]
We introduce a novel Cross-Image Object-Level Bootstrapping method tailored to enhance dense visual representation learning.
CrIBo emerges as a notably strong and well-suited candidate for in-context learning, leveraging nearest neighbor retrieval at test time.
arXiv Detail & Related papers (2023-10-11T19:57:51Z)
- Exploring the Limits of Deep Image Clustering using Pretrained Models [1.1060425537315088]
We present a methodology that learns to classify images without labels by leveraging pretrained feature extractors.
We propose a novel objective that learns associations between image features by introducing a variant of pointwise mutual information together with instance weighting.
arXiv Detail & Related papers (2023-03-31T08:56:29Z)
- Neural Congealing: Aligning Images to a Joint Semantic Atlas [14.348512536556413]
We present a zero-shot self-supervised framework for aligning semantically-common content across a set of images.
Our approach harnesses the power of pre-trained DINO-ViT features to learn this alignment.
We show that our method performs favorably compared to a state-of-the-art method that requires extensive training on large-scale datasets.
arXiv Detail & Related papers (2023-02-08T09:26:22Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Rectifying the Shortcut Learning of Background: Shared Object Concentration for Few-Shot Image Recognition [101.59989523028264]
Few-shot image classification aims to utilize pretrained knowledge learned from a large-scale dataset to tackle a series of downstream classification tasks.
We propose COSOC, a novel few-shot learning framework, to automatically figure out foreground objects at both the pretraining and evaluation stages.
arXiv Detail & Related papers (2021-07-16T07:46:41Z)
- COTR: Correspondence Transformer for Matching Across Images [31.995943755283786]
We propose a novel framework for finding correspondences in images based on a deep neural network.
This formulation allows one to query only the points of interest and retrieve sparse correspondences, or to query all points in an image and obtain dense mappings.
arXiv Detail & Related papers (2021-03-25T22:47:02Z)
- Instance Localization for Self-supervised Detection Pretraining [68.24102560821623]
We propose a new self-supervised pretext task, called instance localization.
We show that integration of bounding boxes into pretraining promotes better task alignment and architecture alignment for transfer learning.
Experimental results demonstrate that our approach yields state-of-the-art transfer learning results for object detection.
arXiv Detail & Related papers (2021-02-16T17:58:57Z)
- Inter-Image Communication for Weakly Supervised Localization [77.2171924626778]
Weakly supervised localization aims at finding target object regions using only image-level supervision.
We propose to leverage pixel-level similarities across different objects for learning more accurate object locations.
Our method achieves a Top-1 localization error rate of 45.17% on the ILSVRC validation set.
arXiv Detail & Related papers (2020-08-12T04:14:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.