GeneCIS: A Benchmark for General Conditional Image Similarity
- URL: http://arxiv.org/abs/2306.07969v1
- Date: Tue, 13 Jun 2023 17:59:58 GMT
- Title: GeneCIS: A Benchmark for General Conditional Image Similarity
- Authors: Sagar Vaze, Nicolas Carion, Ishan Misra
- Abstract summary: We argue that there are many notions of 'similarity' and that models, like humans, should be able to adapt to these dynamically.
We propose the GeneCIS benchmark, which measures models' ability to adapt to a range of similarity conditions.
- Score: 21.96493413291777
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We argue that there are many notions of 'similarity' and that models, like
humans, should be able to adapt to these dynamically. This contrasts with most
representation learning methods, supervised or self-supervised, which learn a
fixed embedding function and hence implicitly assume a single notion of
similarity. For instance, models trained on ImageNet are biased towards object
categories, while a user might prefer the model to focus on colors, textures or
specific elements in the scene. In this paper, we propose the GeneCIS
('genesis') benchmark, which measures models' ability to adapt to a range of
similarity conditions. Extending prior work, our benchmark is designed for
zero-shot evaluation only, and hence considers an open-set of similarity
conditions. We find that baselines from powerful CLIP models struggle on
GeneCIS and that performance on the benchmark is only weakly correlated with
ImageNet accuracy, suggesting that simply scaling existing methods is not
fruitful. We further propose a simple, scalable solution based on automatically
mining information from existing image-caption datasets. We find our method
offers a substantial boost over the baselines on GeneCIS, and further improves
zero-shot performance on related image retrieval benchmarks. In fact, though
evaluated zero-shot, our model surpasses state-of-the-art supervised models on
MIT-States. Project page at https://sgvaze.github.io/genecis/.
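To make the task concrete, a simple zero-shot conditional-retrieval baseline composes the reference image's embedding with a text embedding of the condition, then ranks a gallery by cosine similarity. The sketch below uses random stand-in features, and the additive composition rule is an assumption for illustration, not the paper's mined-triplet training method.

```python
# Minimal sketch of zero-shot conditional retrieval in the spirit of GeneCIS:
# compose a reference-image embedding with a text embedding of the condition,
# then rank gallery images by cosine similarity to the composed query.
import torch
import torch.nn.functional as F

def conditional_retrieval(ref_feat: torch.Tensor,
                          cond_feat: torch.Tensor,
                          gallery_feats: torch.Tensor) -> torch.Tensor:
    """ref_feat: (D,) image embedding; cond_feat: (D,) condition-text embedding;
    gallery_feats: (N, D) candidate embeddings. Returns gallery indices sorted
    from most to least similar."""
    query = F.normalize(ref_feat + cond_feat, dim=-1)  # naive additive composition
    gallery = F.normalize(gallery_feats, dim=-1)
    scores = gallery @ query                           # (N,) cosine similarities
    return scores.argsort(descending=True)

# Toy usage with random vectors standing in for CLIP image/text features.
D, N = 512, 100
ranking = conditional_retrieval(torch.randn(D), torch.randn(D), torch.randn(N, D))
print(ranking[:5])  # ids of the top-5 retrieved gallery images
```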
Related papers
- GIM: Learning Generalizable Image Matcher From Internet Videos [18.974842517202365]
We propose GIM, a self-training framework for learning a single generalizable model based on any image matching architecture.
We also propose ZEB, the first zero-shot evaluation benchmark for image matching.
arXiv Detail & Related papers (2024-02-16T21:48:17Z)
- Image Similarity using An Ensemble of Context-Sensitive Models [2.9490616593440317]
We present a more intuitive approach to building and comparing image similarity models based on labelled data.
We address the challenges of sparse sampling in the (R, A, B) triplet space of images and of biases in models trained with context-based data.
Our tests show that the constructed ensemble model, sketched below, performs about 5% better than the best individual context-sensitive model.
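A hypothetical sketch of such an ensemble, assuming each context-sensitive model exposes a triplet-scoring interface returning the probability that image A is more similar to the reference R than image B is:

```python
# Hypothetical ensemble over context-sensitive similarity models: each model
# scores an (R, A, B) triplet, and the ensemble averages those probabilities.
from typing import Callable, List
import statistics

# A "model" maps an (R, A, B) triplet to P(A is more similar to R than B).
TripletScorer = Callable[[str, str, str], float]

def ensemble_prefers_a(models: List[TripletScorer], r: str, a: str, b: str) -> bool:
    """Average per-model probabilities and pick A if the mean exceeds 0.5."""
    return statistics.mean(m(r, a, b) for m in models) > 0.5

# Toy usage with constant stand-in scorers.
models = [lambda r, a, b: 0.7, lambda r, a, b: 0.4, lambda r, a, b: 0.8]
print(ensemble_prefers_a(models, "ref.jpg", "a.jpg", "b.jpg"))  # True (mean ~ 0.63)
```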
arXiv Detail & Related papers (2024-01-15T20:23:05Z)
- Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models [68.73086826874733]
We introduce a novel Referring Diffusional segmentor (Ref-Diff) for referring image segmentation.
We demonstrate that without a proposal generator, a generative model alone can achieve comparable performance to existing SOTA weakly-supervised models.
This indicates that generative models are also beneficial for this task and can complement discriminative models for better referring segmentation.
arXiv Detail & Related papers (2023-08-31T14:55:30Z)
- CoNe: Contrast Your Neighbours for Supervised Image Classification [62.12074282211957]
Contrast Your Neighbours (CoNe) is a learning framework for supervised image classification.
CoNe employs the features of each sample's similar neighbors as anchors to generate more adaptive and refined training targets.
Our CoNe achieves 80.8% Top-1 accuracy on ImageNet with ResNet-50, which surpasses the recent Timm training recipe.
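A rough sketch of the neighbor-anchored idea, assuming targets are formed by blending each sample's one-hot label with the averaged labels of its nearest feature-space neighbors; the blend weight and neighbor count are illustrative choices, not CoNe's exact formulation.

```python
# Neighbor-refined targets: mix each one-hot label with the mean label of the
# sample's k nearest neighbors in feature space.
import torch
import torch.nn.functional as F

def neighbor_refined_targets(feats, onehot, k=5, alpha=0.5):
    """feats: (B, D) embeddings; onehot: (B, C) labels.
    Returns (B, C) soft targets blending each label with its neighbors'."""
    normed = F.normalize(feats, dim=-1)
    sim = normed @ normed.T                       # (B, B) cosine similarities
    sim.fill_diagonal_(float("-inf"))             # exclude self-matches
    idx = sim.topk(k, dim=-1).indices             # (B, k) neighbor indices
    neighbor_targets = onehot[idx].mean(dim=1)    # average the neighbor labels
    return alpha * onehot + (1 - alpha) * neighbor_targets

feats = torch.randn(8, 128)
onehot = F.one_hot(torch.randint(0, 10, (8,)), 10).float()
print(neighbor_refined_targets(feats, onehot).shape)  # torch.Size([8, 10])
```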
arXiv Detail & Related papers (2023-08-21T14:49:37Z)
- Unicom: Universal and Compact Representation Learning for Image Retrieval [65.96296089560421]
We cluster the large-scale LAION400M into one million pseudo classes based on the joint textual and visual features extracted by the CLIP model.
To alleviate the inter-class conflict introduced by this noisy clustering, we randomly select a subset of the inter-class prototypes when constructing the margin-based softmax loss.
Our method significantly outperforms state-of-the-art unsupervised and supervised image retrieval approaches on multiple benchmarks.
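An illustrative sketch of the loss, assuming one prototype per pseudo class and treating the margin, scale, and sampling ratio as placeholder hyper-parameters:

```python
# Margin-based softmax over a random subset of inter-class prototypes.
import torch
import torch.nn.functional as F

def partial_prototype_margin_loss(emb, labels, prototypes,
                                  margin=0.3, scale=32.0, sample_ratio=0.1):
    """emb: (B, D) embeddings; labels: (B,) pseudo-class ids;
    prototypes: (C, D) class centres. Negatives are a random prototype subset."""
    emb = F.normalize(emb, dim=-1)
    protos = F.normalize(prototypes, dim=-1)
    B, C = emb.size(0), protos.size(0)

    pos = (emb * protos[labels]).sum(-1) - margin   # (B,) margin-penalised positive
    neg_idx = torch.randperm(C)[: max(1, int(C * sample_ratio))]
    neg = emb @ protos[neg_idx].T                   # (B, K) sampled negative logits
    # (A full implementation would also mask the true class out of neg_idx.)

    logits = scale * torch.cat([pos.unsqueeze(1), neg], dim=1)
    # The positive prototype always sits at column 0 of the sampled logits.
    return F.cross_entropy(logits, torch.zeros(B, dtype=torch.long))

loss = partial_prototype_margin_loss(torch.randn(16, 128),
                                     torch.randint(0, 1000, (16,)),
                                     torch.randn(1000, 128))
print(loss.item())
```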
arXiv Detail & Related papers (2023-04-12T14:25:52Z)
- Identical Image Retrieval using Deep Learning [0.0]
We use the BigTransfer (BiT) model, which is itself a state-of-the-art backbone.
We extract its key features and fit a K-Nearest Neighbor model to retrieve the nearest neighbors.
Our model finds similar images, which are hard to obtain through text queries, at low inference time.
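A minimal sketch of this pipeline, with random vectors standing in for BigTransfer features and scikit-learn's NearestNeighbors as the index:

```python
# Extract features with a pretrained backbone, then query a fitted
# nearest-neighbour index. Random features stand in for BiT embeddings here.
import numpy as np
from sklearn.neighbors import NearestNeighbors

gallery_feats = np.random.rand(1000, 2048)   # placeholder for BiT features
index = NearestNeighbors(n_neighbors=5, metric="cosine").fit(gallery_feats)

query_feat = np.random.rand(1, 2048)         # feature of the query image
distances, indices = index.kneighbors(query_feat)
print(indices[0])                             # ids of the 5 most similar images
```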
arXiv Detail & Related papers (2022-05-10T13:34:41Z)
- Rethinking Semantic Segmentation: A Prototype View [126.59244185849838]
We present a nonparametric semantic segmentation model based on non-learnable prototypes.
Our framework yields compelling results over several datasets.
We expect this work will provoke a rethink of the current de facto semantic segmentation model design.
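A hedged sketch of the nonparametric idea, simplified to a single non-learnable prototype per class (the paper attaches several prototypes to each class): every pixel embedding receives the label of its most similar prototype.

```python
# Nearest-prototype segmentation: assign each pixel embedding the class of
# its most cosine-similar prototype.
import torch
import torch.nn.functional as F

def prototype_segment(pixel_feats, prototypes):
    """pixel_feats: (H, W, D) embeddings; prototypes: (C, D) class prototypes.
    Returns an (H, W) map of predicted class indices."""
    p = F.normalize(pixel_feats, dim=-1)
    c = F.normalize(prototypes, dim=-1)
    sims = torch.einsum("hwd,cd->hwc", p, c)  # per-class cosine similarity
    return sims.argmax(dim=-1)

seg = prototype_segment(torch.randn(64, 64, 256), torch.randn(21, 256))
print(seg.shape)  # torch.Size([64, 64])
```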
arXiv Detail & Related papers (2022-03-28T21:15:32Z)
- A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark [33.86872697028233]
We present an in-depth study on few-shot video classification by making three contributions.
First, we perform a consistent comparative study on the existing metric-based methods to figure out their limitations in representation learning.
Second, we discover a high correlation between the novel action classes and the ImageNet object classes, which is problematic in the few-shot recognition setting.
Third, we present a new benchmark with more base data to facilitate future few-shot video classification without pre-training.
arXiv Detail & Related papers (2021-10-24T06:01:46Z)
- Counterfactual Generative Zero-Shot Semantic Segmentation [16.684570608930983]
A popular family of zero-shot semantic segmentation methods is based on generative models.
In this work, we consider counterfactual methods to avoid the confounder in the original model.
Our model is compared with baseline models on two real-world datasets.
arXiv Detail & Related papers (2021-06-11T13:01:03Z)
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
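As a toy illustration of that formulation, the layer below treats per-image embeddings as graph nodes and performs one round of mean-aggregation message passing over a fully connected group graph; the graph construction and single linear update are assumptions, not the paper's GNN.

```python
# One round of mean-aggregation message passing across a group of images.
import torch
import torch.nn as nn

class GroupMessagePassing(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.update = nn.Linear(2 * dim, dim)

    def forward(self, node_feats):
        """node_feats: (N, D), one embedding per image in the group (N > 1)."""
        n = node_feats.size(0)
        adj = torch.ones(n, n) - torch.eye(n)    # fully connected, no self-loops
        messages = (adj @ node_feats) / (n - 1)  # mean over neighbours
        return self.update(torch.cat([node_feats, messages], dim=-1))

layer = GroupMessagePassing(dim=256)
print(layer(torch.randn(4, 256)).shape)  # torch.Size([4, 256])
```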
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
- Diverse Image Generation via Self-Conditioned GANs [56.91974064348137]
We train a class-conditional GAN model without using manually annotated class labels.
Instead, our model is conditional on labels automatically derived from clustering in the discriminator's feature space.
Our clustering step automatically discovers diverse modes, and explicitly requires the generator to cover them.
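A minimal sketch of the labelling step, assuming k-means over discriminator features; the clustering algorithm and cluster count are stand-ins for the paper's exact procedure.

```python
# Cluster discriminator features to obtain pseudo-labels for conditioning.
import numpy as np
from sklearn.cluster import KMeans

disc_feats = np.random.rand(5000, 512)   # discriminator features of real images
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(disc_feats)
pseudo_labels = kmeans.labels_           # (5000,) cluster id per image
# These ids would replace manual class labels when conditioning G and D.
print(np.bincount(pseudo_labels)[:5])    # sizes of the first five clusters
```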
arXiv Detail & Related papers (2020-06-18T17:56:03Z)