VASR: Visual Analogies of Situation Recognition
- URL: http://arxiv.org/abs/2212.04542v1
- Date: Thu, 8 Dec 2022 20:08:49 GMT
- Title: VASR: Visual Analogies of Situation Recognition
- Authors: Yonatan Bitton, Ron Yosef, Eli Strugo, Dafna Shahaf, Roy Schwartz,
Gabriel Stanovsky
- Abstract summary: We introduce a novel task, Visual Analogies of Situation Recognition.
We tackle complex analogies requiring understanding of scenes.
Crowdsourced annotations for a sample of the data indicate that humans agree with the dataset label 80% of the time.
Our experiments demonstrate that state-of-the-art models do well when distractors are chosen randomly, but struggle with carefully chosen distractors.
- Score: 21.114629154550364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A core process in human cognition is analogical mapping: the ability to
identify a similar relational structure between different situations. We
introduce a novel task, Visual Analogies of Situation Recognition, adapting the
classical word-analogy task into the visual domain. Given a triplet of images,
the task is to select an image candidate B' that completes the analogy (A to A'
is like B to what?). Unlike previous work on visual analogy that focused on
simple image transformations, we tackle complex analogies requiring
understanding of scenes.
We leverage situation recognition annotations and the CLIP model to generate
a large set of 500k candidate analogies. Crowdsourced annotations for a sample
of the data indicate that humans agree with the dataset label ~80% of the time
(chance level 25%). Furthermore, we use human annotations to create a
gold-standard dataset of 3,820 validated analogies. Our experiments demonstrate
that state-of-the-art models do well when distractors are chosen randomly
(~86%), but struggle with carefully chosen distractors (~53%, compared to 90%
human accuracy). We hope our dataset will encourage the development of new
analogy-making models. Website: https://vasr-dataset.github.io/
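The candidate-generation step adapts the classic word-analogy vector arithmetic (A : A' :: B : ?) to image embeddings. As a minimal illustrative sketch (not the paper's exact pipeline: function names, the raw cosine-arithmetic scoring, and the synthetic vectors below are all assumptions), completing an analogy over precomputed CLIP-style embeddings might look like this:

```python
import numpy as np

def complete_analogy(emb_a, emb_a2, emb_b, candidate_embs):
    """Select the candidate B' whose embedding is most similar to B + (A' - A).

    emb_a, emb_a2, emb_b: 1-D embedding vectors for images A, A', B.
    candidate_embs: 2-D array, one row per candidate image embedding.
    Returns the index of the best candidate and all cosine scores.
    """
    # Vector-arithmetic analogy target, as in the word-analogy task.
    target = emb_b + (emb_a2 - emb_a)
    target = target / np.linalg.norm(target)

    # Cosine similarity of each (normalized) candidate to the target.
    cands = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    scores = cands @ target
    return int(np.argmax(scores)), scores

# Toy 2-D embeddings: the analogy offset (A' - A) points "up",
# so the candidate aligned with B shifted "up" should win.
a, a2, b = np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([0.0, 1.0])
candidates = np.array([[0.0, 1.0], [1.0, 0.0]])
best, _ = complete_analogy(a, a2, b, candidates)  # best == 0
```

In practice the paper pairs this kind of scoring with situation recognition annotations (agent, verb, etc.) to construct hard distractors, rather than relying on embedding arithmetic alone.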
Related papers
- Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
arXiv Detail & Related papers (2024-09-09T17:59:13Z)
- Image Similarity using An Ensemble of Context-Sensitive Models [2.9490616593440317]
We present a more intuitive approach to building and comparing image similarity models based on labelled data.
We address the challenges of sparse sampling in the image space (R, A, B) and biases in the models trained with context-based data.
Our testing results show that the constructed ensemble model performs 5% better than the best individual context-sensitive model.
arXiv Detail & Related papers (2024-01-15T20:23:05Z)
- SynCDR : Training Cross Domain Retrieval Models with Synthetic Data [69.26882668598587]
In cross-domain retrieval, a model is required to identify images from the same semantic category across two visual domains.
We show how to generate synthetic data to fill in these missing category examples across domains.
Our best SynCDR model can outperform prior art by up to 15%.
arXiv Detail & Related papers (2023-12-31T08:06:53Z)
- FAME: Flexible, Scalable Analogy Mappings Engine [22.464249291871937]
In this work, we relax the input requirements, requiring only names of entities to be mapped.
We automatically extract commonsense representations and use them to identify a mapping between the entities.
Our framework can handle partial analogies and suggest new entities to be added.
arXiv Detail & Related papers (2023-11-03T12:08:02Z)
- StoryAnalogy: Deriving Story-level Analogies from Large Language Models to Unlock Analogical Understanding [72.38872974837462]
We evaluate the ability to identify and generate analogies by constructing a first-of-its-kind large-scale story-level analogy corpus.
StoryAnalogy contains 24K story pairs from diverse domains with human annotations on two similarities from the extended Structure-Mapping Theory.
We observe that the data in StoryAnalogy can improve the quality of analogy generation in large language models.
arXiv Detail & Related papers (2023-10-19T16:29:23Z)
- Correlational Image Modeling for Self-Supervised Visual Pre-Training [81.82907503764775]
Correlational Image Modeling is a novel and surprisingly effective approach to self-supervised visual pre-training.
Three key designs enable correlational image modeling as a nontrivial and meaningful self-supervisory task.
arXiv Detail & Related papers (2023-03-22T15:48:23Z)
- Life is a Circus and We are the Clowns: Automatically Finding Analogies between Situations and Processes [12.8252101640812]
Much research has suggested that analogies are key to non-brittle systems that can adapt to new domains.
Despite their importance, analogies have received little attention in the NLP community.
arXiv Detail & Related papers (2022-10-21T18:54:17Z)
- Learning an Adaptation Function to Assess Image Visual Similarities [0.0]
We focus here on the specific task of learning visual image similarities when analogy matters.
We propose to compare different supervised, semi-supervised and self-supervised networks, pre-trained on distinct scales and contents datasets.
Our experiments conducted on the Totally Looks Like image dataset highlight the interest of our method, improving the best model's retrieval score @1 by a factor of 2.25.
arXiv Detail & Related papers (2022-06-03T07:15:00Z)
- A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes [58.633364000258645]
We call this dataset RIVAL10, consisting of roughly 26k instances over 10 classes.
We evaluate the sensitivity of a broad set of models to noise corruptions in foregrounds, backgrounds and attributes.
In our analysis, we consider diverse state-of-the-art architectures (ResNets, Transformers) and training procedures (CLIP, SimCLR, DeiT, Adversarial Training).
arXiv Detail & Related papers (2022-01-26T06:31:28Z)
- BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? [35.381345454627]
We analyze the capabilities of transformer-based language models on an unsupervised task of identifying analogies.
Off-the-shelf language models can identify analogies to a certain extent, but struggle with abstract and complex relations.
Our results raise important questions for future work about how, and to what extent, pre-trained language models capture knowledge about abstract semantic relations.
arXiv Detail & Related papers (2021-05-11T11:38:49Z)
- Few-shot Visual Reasoning with Meta-analogical Contrastive Learning [141.2562447971]
We propose to solve a few-shot (or low-shot) visual reasoning problem, by resorting to analogical reasoning.
We extract structural relationships between elements in both domains, and enforce them to be as similar as possible with analogical learning.
We validate our method on the RAVEN dataset, on which it outperforms the state-of-the-art method, with larger gains when training data is scarce.
arXiv Detail & Related papers (2020-07-23T14:00:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.