Learning an Adaptation Function to Assess Image Visual Similarities
- URL: http://arxiv.org/abs/2206.01417v1
- Date: Fri, 3 Jun 2022 07:15:00 GMT
- Title: Learning an Adaptation Function to Assess Image Visual Similarities
- Authors: Olivier Risser-Maroix (LIPADE), Amine Marzouki (LIPADE), Hala Djeghim
(LIPADE), Camille Kurtz (LIPADE), Nicolas Lomenie (LIPADE)
- Abstract summary: We focus here on the specific task of learning visual image similarities when analogy matters.
We propose to compare different supervised, semi-supervised and self-supervised networks, pre-trained on datasets of distinct scales and contents.
Our experiments conducted on the Totally Looks Like image dataset highlight the interest of our method, increasing the retrieval@1 score of the best model by a factor of 2.25.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human perception routinely assesses the similarity between images, both
for decision making and creative thinking. But the underlying cognitive process
is not yet well understood, and is therefore difficult for computer vision
systems to mimic. State-of-the-art approaches using deep architectures are often
based on comparing images described as feature vectors learned for an image
categorization task. As a consequence, such features are powerful for comparing
semantically related images but not very effective for comparing images that are
visually similar yet semantically unrelated. Inspired by previous work on
adapting neural features to psycho-cognitive representations, we focus here
on the specific task of learning visual image similarities when analogy
matters. We propose to compare different supervised, semi-supervised and
self-supervised networks, pre-trained on datasets of distinct scales and contents
(such as ImageNet-21K, ImageNet-1K or VGGFace2), to determine which model best
approximates the visual cortex, and to learn only an adaptation function,
corresponding to an approximation of the primate IT cortex, through the
metric learning framework. Our experiments conducted on the Totally Looks Like
image dataset highlight the interest of our method, increasing the retrieval@1
score of the best model by a factor of 2.25. This research work was recently accepted
for publication at the ICIP 2021 international conference [1]. In this new
article, we expand on this previous work by using and comparing new pre-trained
feature extractors on other datasets.
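As a concrete illustration of the pipeline the abstract describes, below is a minimal sketch (not the authors' released code): a frozen pre-trained backbone provides features, only a small adaptation function is trained with a metric learning (triplet) objective on "looks-like" image pairs, and retrieval@1 is the evaluation score. The linear projection, embedding size, margin and optimizer settings are illustrative assumptions; any of the compared extractors (ImageNet-21K, ImageNet-1K or VGGFace2 models, etc.) could be swapped in as the backbone.

```python
# Minimal sketch, not the paper's code: frozen backbone + learned adaptation.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

# 1) Frozen feature extractor approximating the visual cortex.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = nn.Identity()              # expose the 2048-d pooled features
backbone.eval()
for p in backbone.parameters():
    p.requires_grad_(False)

# 2) The adaptation function (a single linear projection -- an assumption).
adapt = nn.Linear(2048, 512)
opt = torch.optim.Adam(adapt.parameters(), lr=1e-4)
triplet = nn.TripletMarginLoss(margin=0.2)

def train_step(anchors: torch.Tensor, positives: torch.Tensor) -> float:
    """anchors/positives: batches of images forming 'looks-like' pairs."""
    with torch.no_grad():
        fa, fp = backbone(anchors), backbone(positives)
    za = F.normalize(adapt(fa), dim=1)
    zp = F.normalize(adapt(fp), dim=1)
    zn = zp.roll(1, dims=0)              # in-batch negatives by shifting pairs
    loss = triplet(za, zp, zn)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def retrieval_at_1(left: torch.Tensor, right: torch.Tensor) -> float:
    """left[i] should retrieve right[i]; returns the retrieval@1 score."""
    zl = F.normalize(adapt(backbone(left)), dim=1)
    zr = F.normalize(adapt(backbone(right)), dim=1)
    nearest = (zl @ zr.t()).argmax(dim=1)
    return (nearest == torch.arange(len(left))).float().mean().item()
```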
Related papers
- CorrEmbed: Evaluating Pre-trained Model Image Similarity Efficacy with a
Novel Metric [6.904776368895614]
We evaluate the viability of the image embeddings from pre-trained computer vision models using a novel approach named CorrEmbed.
Our approach computes the correlation between distances in image embeddings and distances in human-generated tag vectors (sketched below).
Our method also identifies deviations from this pattern, providing insights into how different models capture high-level image features.
arXiv Detail & Related papers (2023-08-30T16:23:07Z)
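A hedged sketch of that correlation measure: compare pairwise distances in model embeddings with pairwise distances in human tag vectors. The cosine metric and Spearman correlation are assumptions for illustration, not necessarily the paper's exact choices.

```python
# Illustrative sketch of a CorrEmbed-style score (metric choices assumed).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def corr_embed_score(embeddings: np.ndarray, tag_vectors: np.ndarray) -> float:
    """embeddings: (n, d) model features; tag_vectors: (n, t) human tags."""
    emb_dists = pdist(embeddings, metric="cosine")   # condensed pairwise dists
    tag_dists = pdist(tag_vectors, metric="cosine")
    rho, _ = spearmanr(emb_dists, tag_dists)
    return rho  # higher = embedding distances track human tag distances better
```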
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments (sketched below).
arXiv Detail & Related papers (2022-07-25T17:58:16Z)
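A minimal sketch of zero-shot perceptual scoring with CLIP in the spirit of this summary, using an antonym prompt pair; the prompts and checkpoint are assumptions rather than the paper's exact configuration.

```python
# Hedged sketch: score an image against a "good"/"bad" prompt pair with CLIP.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def perception_score(image, pos="Good photo.", neg="Bad photo.") -> float:
    inputs = processor(text=[pos, neg], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image   # (1, 2) image-text scores
    return logits.softmax(dim=-1)[0, 0].item()      # mass on the "good" prompt
```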
- Mix-up Self-Supervised Learning for Contrast-agnostic Applications [33.807005669824136]
We present the first mix-up self-supervised learning framework for contrast-agnostic applications.
We address the low variance across images via cross-domain mix-up and build the pretext task on image reconstruction and transparency prediction (sketched below).
arXiv Detail & Related papers (2022-04-02T16:58:36Z)
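A minimal sketch of the transparency-prediction half of such a pretext task: blend an image from the target domain with one from another domain and regress the mixing coefficient. The tiny CNN head and the MSE loss are illustrative assumptions (the reconstruction branch is omitted).

```python
# Hedged sketch: cross-domain mix-up with transparency (ratio) prediction.
import torch
import torch.nn as nn

predictor = nn.Sequential(                      # toy head, an assumption
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1), nn.Sigmoid(),
)
mse = nn.MSELoss()

def mixup_step(x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
    """x_a: target-domain batch; x_b: other-domain batch (cross-domain mix)."""
    lam = torch.rand(x_a.size(0), 1, 1, 1)      # per-sample transparency
    mixed = lam * x_a + (1.0 - lam) * x_b       # blended input
    pred = predictor(mixed)                     # predict the coefficient
    return mse(pred, lam.view(-1, 1))
```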
- Contextual Similarity Aggregation with Self-attention for Visual Re-ranking [96.55393026011811]
We propose a visual re-ranking method by contextual similarity aggregation with self-attention.
We conduct comprehensive experiments on four benchmark datasets to demonstrate the generality and effectiveness of our proposed visual re-ranking method (sketched below).
arXiv Detail & Related papers (2021-10-26T06:20:31Z)
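A hedged sketch of the general idea: take the top-k first-round candidates, build their mutual-similarity (affinity) features, refine them with a self-attention encoder and re-rank. The layer sizes, scoring head and k are assumptions, and training of the aggregation module is omitted.

```python
# Illustrative sketch of contextual similarity aggregation for re-ranking.
import torch
import torch.nn as nn

K = 10                                           # candidate count (assumed)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=K, nhead=2, batch_first=True),
    num_layers=2,
)
score_head = nn.Linear(K, 1)

@torch.no_grad()
def rerank(query_feat: torch.Tensor, gallery: torch.Tensor) -> torch.Tensor:
    """query_feat: (d,); gallery: (n, d) L2-normalized features."""
    sims = gallery @ query_feat                  # first-round similarities
    topk = sims.topk(K).indices                  # initial candidate list
    affinity = gallery[topk] @ gallery[topk].t() # (K, K) mutual similarities
    refined = encoder(affinity.unsqueeze(0))     # self-attention aggregation
    new_scores = score_head(refined).squeeze()   # (K,) refined scores
    return topk[new_scores.argsort(descending=True)]
```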
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects (sketched below).
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
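The image-level component mentioned here is commonly realized as an InfoNCE-style objective; a minimal sketch follows (the temperature value is an assumption, and the paper's other multi-level terms are omitted).

```python
# Minimal InfoNCE sketch: positives are two views of the same image.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.07):
    """z1, z2: (n, d) features of two views; positives on the diagonal."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                  # (n, n) similarity matrix
    targets = torch.arange(z1.size(0))          # i-th row matches i-th column
    return F.cross_entropy(logits, targets)
```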
- Kinship Verification Based on Cross-Generation Feature Interaction Learning [53.62256887837659]
Kinship verification from facial images has been recognized as an emerging yet challenging technique in computer vision applications.
We propose a novel cross-generation feature interaction learning (CFIL) framework for robust kinship verification.
arXiv Detail & Related papers (2021-09-07T01:50:50Z)
- Exploiting the relationship between visual and textual features in social networks for image classification with zero-shot deep learning [0.0]
In this work, we propose a classifier ensemble based on the transferable learning capabilities of the CLIP neural network architecture.
Our experiments, based on image classification tasks using the labels of the Places dataset, first consider only the visual part.
Taking into account the texts associated with the images can help to improve accuracy, depending on the goal (the zero-shot visual part is sketched below).
arXiv Detail & Related papers (2021-07-08T10:54:59Z)
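A minimal sketch of the visual-only zero-shot step: classify an image by CLIP similarity to one prompt per Places label. The prompt template and checkpoint are assumptions; the textual branch of the ensemble is omitted.

```python
# Hedged sketch: CLIP zero-shot classification over a list of scene labels.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def zero_shot_classify(image, labels):
    prompts = [f"a photo of a {label}" for label in labels]  # assumed template
    inputs = processor(text=prompts, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image   # (1, len(labels))
    return labels[logits.argmax().item()]
```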
- AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent the image in a low-dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets (sketched below).
arXiv Detail & Related papers (2021-06-11T09:02:30Z)
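A hedged sketch of an AugNet-style objective: embed two randomly augmented views of the same unlabeled image and pull them together, with no external labels. The augmentation set and the cosine-based loss are assumptions.

```python
# Illustrative sketch: agreement between augmented views of unlabeled images.
import torch
import torch.nn.functional as F
from torchvision import transforms

augment = transforms.Compose([                  # assumed augmentation set
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),
    transforms.ColorJitter(0.4, 0.4, 0.4),
    transforms.RandomHorizontalFlip(),
])

def augnet_loss(embed, images: torch.Tensor) -> torch.Tensor:
    """embed: trainable image->vector model; images: (n, 3, h, w) in [0, 1]."""
    v1, v2 = augment(images), augment(images)   # two random views
    z1 = F.normalize(embed(v1), dim=1)
    z2 = F.normalize(embed(v2), dim=1)
    return (1 - (z1 * z2).sum(dim=1)).mean()    # cosine distance between views
```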
- Learning Representations by Predicting Bags of Visual Words [55.332200948110895]
Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data.
Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially dense image descriptions (sketched below).
arXiv Detail & Related papers (2020-02-27T16:45:25Z)
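A hedged sketch of the pretext target: quantize a teacher's dense feature map into "visual words" with a K-means codebook and train a student to predict the resulting bag-of-words histogram from a perturbed view; the codebook handling here is deliberately simplified.

```python
# Illustrative sketch of bag-of-visual-words targets for self-supervision.
import torch
import torch.nn.functional as F

def bow_target(teacher_featmap: torch.Tensor, codebook: torch.Tensor):
    """teacher_featmap: (c, h, w); codebook: (k, c) K-means visual words."""
    feats = teacher_featmap.flatten(1).t()              # (h*w, c) descriptors
    words = torch.cdist(feats, codebook).argmin(dim=1)  # nearest visual word
    hist = torch.bincount(words, minlength=len(codebook)).float()
    return hist / hist.sum()                            # normalized BoW target

def bow_loss(student_logits: torch.Tensor, target_hist: torch.Tensor):
    """Cross-entropy between predicted and target word distributions."""
    return -(target_hist * F.log_softmax(student_logits, dim=-1)).sum()
```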