Relational Visual Similarity
- URL: http://arxiv.org/abs/2512.07833v1
- Date: Mon, 08 Dec 2025 18:59:56 GMT
- Title: Relational Visual Similarity
- Authors: Thao Nguyen, Sicheng Mo, Krishna Kumar Singh, Yilin Wang, Jing Shi, Nicholas Kolkin, Eli Shechtman, Yong Jae Lee, Yuheng Li
- Abstract summary: Relational similarity is argued by cognitive scientists to be what distinguishes humans from other species. All widely used visual similarity metrics today focus solely on perceptual attribute similarity. Our study shows that while relational similarity has many real-world applications, existing image similarity models fail to capture it.
- Score: 75.39827145344957
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans do not just see attribute similarity -- we also see relational similarity. An apple is like a peach because both are reddish fruit, but the Earth is also like a peach: its crust, mantle, and core correspond to the peach's skin, flesh, and pit. This ability to perceive and recognize relational similarity is argued by cognitive scientists to be what distinguishes humans from other species. Yet, all widely used visual similarity metrics today (e.g., LPIPS, CLIP, DINO) focus solely on perceptual attribute similarity and fail to capture the rich, often surprising relational similarities that humans perceive. How can we go beyond the visible content of an image to capture its relational properties? How can we bring images with the same relational logic closer together in representation space? To answer these questions, we first formulate relational image similarity as a measurable problem: two images are relationally similar when their internal relations or functions among visual elements correspond, even if their visual attributes differ. We then curate a 114k image-caption dataset in which the captions are anonymized -- describing the underlying relational logic of the scene rather than its surface content. Using this dataset, we finetune a Vision-Language model to measure the relational similarity between images. This model serves as a first step toward connecting images by their underlying relational structure rather than their visible appearance. Our study shows that while relational similarity has many real-world applications, existing image similarity models fail to capture it -- revealing a critical gap in visual computing.
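The formulation can be illustrated with a minimal sketch: compare images through relational rather than surface captions. Everything below is an assumption standing in for the paper's pipeline -- the hand-written captions substitute for the 114k anonymized dataset, and a generic sentence encoder (all-MiniLM-L6-v2) substitutes for the finetuned Vision-Language model.

```python
# Minimal sketch of relational vs. attribute similarity via captions.
# The captions and encoder are illustrative stand-ins, not the paper's
# dataset or finetuned VLM.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Surface captions describe visible attributes; "anonymized" relational
# captions describe only the relational logic among unnamed elements.
surface = {
    "earth": "a blue and green planet photographed from space",
    "peach": "a fuzzy reddish-orange fruit on a white table",
    "apple": "a shiny red fruit with a short stem",
}
relational = {
    "earth": "a layered sphere: a thin outer shell over a thick middle layer around a dense core",
    "peach": "a layered sphere: a thin skin over thick flesh around a hard pit",
    "apple": "a round body whose flesh surrounds a small central cluster of seeds",
}

def sim(captions, a, b):
    ea, eb = encoder.encode([captions[a], captions[b]], normalize_embeddings=True)
    return float(util.cos_sim(ea, eb))

# Attribute similarity pairs apple with peach; relational similarity
# pairs Earth with peach, matching the paper's motivating example.
print("surface    earth~peach:", sim(surface, "earth", "peach"))
print("surface    apple~peach:", sim(surface, "apple", "peach"))
print("relational earth~peach:", sim(relational, "earth", "peach"))
```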
Related papers
- Representations in vision and language converge in a shared, multidimensional space of perceived similarities [0.0]
We show that visual and linguistic similarity judgements converge at the behavioural level. We also predict a remarkably similar network of fMRI brain responses evoked by viewing natural scene images. These findings demonstrate that human visual and linguistic similarity judgements are grounded in a shared, modality-agnostic representational structure.
arXiv Detail & Related papers (2025-07-29T14:42:31Z)
- Mutual Information calculation on different appearances [0.0]
We apply the mutual information formula to image matching, where image A is the moving object and image B is the target object.
For comparison, we also used entropy and information-gain methods to test the dependency of the images.
arXiv Detail & Related papers (2024-07-10T07:12:50Z)
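For reference, mutual information between two grayscale images can be computed from their joint intensity histogram. This is a self-contained sketch of the standard formula, not the paper's exact pipeline; the bin count and test images are arbitrary choices.

```python
# MI(A;B) = sum over bins of p(a,b) * log( p(a,b) / (p(a) * p(b)) ),
# estimated from the joint histogram of pixel intensities.
import numpy as np

def mutual_information(img_a: np.ndarray, img_b: np.ndarray, bins: int = 32) -> float:
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(), bins=bins)
    pxy = joint / joint.sum()              # joint distribution p(a, b)
    px = pxy.sum(axis=1, keepdims=True)    # marginal p(a)
    py = pxy.sum(axis=0, keepdims=True)    # marginal p(b)
    nz = pxy > 0                           # skip empty bins to avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
a = rng.random((64, 64))
print(mutual_information(a, a))                      # high: image matches itself
print(mutual_information(a, rng.random((64, 64))))   # near zero: independent images
```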
- Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding [112.0878081944858]
Quantifying the degree of similarity between images is a key copyright issue for image-based machine learning.
We seek to define and compute a notion of "conceptual similarity" among images that captures high-level relations.
Two highly dissimilar images can be discriminated early in their description, whereas conceptually similar ones will need more detail to be distinguished.
arXiv Detail & Related papers (2024-02-14T03:31:17Z)
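The description-length intuition can be sketched as follows: similarity is read off as the level of descriptive detail needed before two images can be told apart. The hand-written caption levels and the depth measure below are illustrative assumptions, not the paper's complexity-constrained auto-encoder.

```python
# Each image gets progressively more detailed descriptions; conceptual
# similarity is the depth at which the two description streams diverge.
levels_a = ["an animal", "a bird", "a small songbird", "a robin on a branch"]
levels_b = ["an animal", "a bird", "a small songbird", "a sparrow on a fence"]
levels_c = ["a vehicle", "a car", "a red sedan", "a red sedan in a garage"]

def divergence_depth(desc_x: list[str], desc_y: list[str]) -> int:
    """Number of description levels shared before the two images differ."""
    depth = 0
    for dx, dy in zip(desc_x, desc_y):
        if dx != dy:
            break
        depth += 1
    return depth

# Conceptually similar images need more detail to be told apart.
print(divergence_depth(levels_a, levels_b))  # 3: distinguished only at fine detail
print(divergence_depth(levels_a, levels_c))  # 0: discriminated immediately
```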
- Learning an Adaptation Function to Assess Image Visual Similarities [0.0]
We focus here on the specific task of learning visual image similarities when analogy matters.
We propose to compare different supervised, semi-supervised and self-supervised networks, pre-trained on datasets of distinct scales and content.
Our experiments on the Totally Looks Like image dataset highlight the benefit of our method, increasing the retrieval score @1 of the best model by 2.25x.
arXiv Detail & Related papers (2022-06-03T07:15:00Z)
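A minimal sketch of the adaptation idea, assuming a linear adapter trained with a cosine embedding loss on top of frozen backbone features; the synthetic features, loss choice, and training loop are stand-ins, not the paper's architecture.

```python
# Learn a linear adaptation over frozen features so that cosine
# similarity of adapted embeddings is high for analogous image pairs.
import torch
import torch.nn as nn

torch.manual_seed(0)
dim = 128
adapter = nn.Linear(dim, dim, bias=False)        # the learned adaptation
opt = torch.optim.Adam(adapter.parameters(), lr=1e-3)
loss_fn = nn.CosineEmbeddingLoss(margin=0.2)

# Synthetic stand-ins for frozen backbone features of image pairs:
# label +1 = "totally looks like" pair, -1 = unrelated pair.
feats_a = torch.randn(256, dim)
feats_b = torch.randn(256, dim)
labels = (torch.rand(256) > 0.5).float() * 2 - 1

for step in range(200):
    opt.zero_grad()
    loss = loss_fn(adapter(feats_a), adapter(feats_b), labels)
    loss.backward()
    opt.step()

# At retrieval time, rank gallery images by cosine similarity of the
# adapted features; "@1" accuracy checks the top-ranked match.
```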
- Attributable Visual Similarity Learning [90.69718495533144]
This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images.
Motivated by the human semantic similarity cognition, we propose a generalized similarity learning paradigm to represent the similarity between two images with a graph.
Experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods.
arXiv Detail & Related papers (2022-03-28T17:35:31Z)
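A toy version of graph-style, attributable similarity, assuming each image is reduced to a set of part embeddings; the best-match aggregation below is a simplification chosen for clarity, not the AVSL construction.

```python
# Each image is a set of node embeddings (e.g., one per region or part).
# The overall score is an average of best-matching node similarities, so
# every node's contribution can be inspected -- hence "attributable".
import numpy as np

def node_similarity_matrix(nodes_a: np.ndarray, nodes_b: np.ndarray) -> np.ndarray:
    a = nodes_a / np.linalg.norm(nodes_a, axis=1, keepdims=True)
    b = nodes_b / np.linalg.norm(nodes_b, axis=1, keepdims=True)
    return a @ b.T                        # cosine similarity per node pair

def attributable_similarity(nodes_a, nodes_b):
    sims = node_similarity_matrix(nodes_a, nodes_b)
    per_node = sims.max(axis=1)           # best match for each node of A
    return per_node.mean(), per_node      # overall score + attribution

rng = np.random.default_rng(0)
img_a = rng.standard_normal((8, 64))      # 8 part embeddings per image
img_b = rng.standard_normal((8, 64))
score, attribution = attributable_similarity(img_a, img_b)
print(score)        # scalar similarity
print(attribution)  # which parts of A drive (or hurt) the score
```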
- Kinship Verification Based on Cross-Generation Feature Interaction Learning [53.62256887837659]
Kinship verification from facial images has been recognized as an emerging yet challenging technique in computer vision applications.
We propose a novel cross-generation feature interaction learning (CFIL) framework for robust kinship verification.
arXiv Detail & Related papers (2021-09-07T01:50:50Z)
- Effectively Leveraging Attributes for Visual Similarity [52.2646549020835]
We propose the Pairwise Attribute-informed similarity Network (PAN), which breaks similarity learning into capturing similarity conditions and relevance scores from a joint representation of two images.
PAN obtains a 4-9% improvement on compatibility prediction between clothing items on Polyvore Outfits, a 5% gain on few-shot classification of images using Caltech-UCSD Birds (CUB), and a boost of over 1% to Recall@1 on In-Shop Clothes Retrieval.
arXiv Detail & Related papers (2021-05-04T18:28:35Z)
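A hedged sketch of the two-head decomposition PAN describes, with assumed layer shapes and wiring: one head weights similarity conditions, the other scores relevance under each condition, and the final similarity is their weighted sum.

```python
# Sketch of condition-weighted similarity from a joint representation.
# Layer sizes and wiring are assumptions, not the paper's model.
import torch
import torch.nn as nn

class PairwiseAttributeSimilarity(nn.Module):
    def __init__(self, feat_dim: int = 128, n_conditions: int = 4):
        super().__init__()
        self.condition_head = nn.Linear(2 * feat_dim, n_conditions)
        self.relevance_head = nn.Linear(2 * feat_dim, n_conditions)

    def forward(self, emb_a, emb_b):
        joint = torch.cat([emb_a, emb_b], dim=-1)          # joint representation
        weights = self.condition_head(joint).softmax(-1)   # which conditions matter
        relevance = self.relevance_head(joint)             # score per condition
        return (weights * relevance).sum(-1)               # final similarity

model = PairwiseAttributeSimilarity()
a, b = torch.randn(2, 128), torch.randn(2, 128)            # stand-in image features
print(model(a, b))  # one similarity score per pair in the batch
```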
- Few-shot Visual Reasoning with Meta-analogical Contrastive Learning [141.2562447971]
We propose to solve a few-shot (or low-shot) visual reasoning problem by resorting to analogical reasoning.
We extract structural relationships between elements in both domains, and enforce them to be as similar as possible with analogical learning.
We validate our method on the RAVEN dataset, on which it outperforms state-of-the-art methods, with larger gains when the training data is scarce.
arXiv Detail & Related papers (2020-07-23T14:00:34Z)
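The structural-relation idea can be sketched with an assumed difference-vector relation and a two-way InfoNCE-style contrastive loss; neither choice comes from the paper's code.

```python
# Analogous relations (e.g., the same row rule in two RAVEN panels)
# should embed close together; non-analogous ones far apart.
import torch
import torch.nn.functional as F

def relation_embedding(elem_x: torch.Tensor, elem_y: torch.Tensor) -> torch.Tensor:
    # Simplest structural relation: normalized difference between elements.
    return F.normalize(elem_y - elem_x, dim=-1)

def analogical_contrastive_loss(rel_a, rel_b, rel_neg, temperature: float = 0.1):
    pos = (rel_a * rel_b).sum(-1) / temperature      # analogous pair
    neg = (rel_a * rel_neg).sum(-1) / temperature    # non-analogous pair
    logits = torch.stack([pos, neg], dim=-1)
    target = torch.zeros(rel_a.shape[0], dtype=torch.long)  # positive is index 0
    return F.cross_entropy(logits, target)

x1, y1 = torch.randn(16, 64), torch.randn(16, 64)    # elements in domain 1
x2 = torch.randn(16, 64)
y2 = x2 + (y1 - x1)                                  # same relation, new domain
loss = analogical_contrastive_loss(
    relation_embedding(x1, y1),
    relation_embedding(x2, y2),
    relation_embedding(torch.randn(16, 64), torch.randn(16, 64)),
)
print(loss)  # low, because the analogous relations align exactly
```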