Interpretable Measures of Conceptual Similarity by
Complexity-Constrained Descriptive Auto-Encoding
- URL: http://arxiv.org/abs/2402.08919v1
- Date: Wed, 14 Feb 2024 03:31:17 GMT
- Authors: Alessandro Achille, Greg Ver Steeg, Tian Yu Liu, Matthew Trager,
Carson Klingenberg, Stefano Soatto
- Abstract summary: Quantifying the degree of similarity between images is a key copyright issue for image-based machine learning.
We seek to define and compute a notion of "conceptual similarity" among images that captures high-level relations.
Two highly dissimilar images can be discriminated early in their description, whereas conceptually similar ones will need more detail to be distinguished.
- Score: 112.0878081944858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quantifying the degree of similarity between images is a key copyright issue
for image-based machine learning. In legal doctrine, however, determining the
degree of similarity between works requires subjective analysis, and
fact-finders (judges and juries) can demonstrate considerable variability in
these subjective judgement calls. Images that are structurally similar can be
deemed dissimilar, whereas images of completely different scenes can be deemed
similar enough to support a claim of copying. We seek to define and compute a
notion of "conceptual similarity" among images that captures high-level
relations even among images that do not share repeated elements or visually
similar components. The idea is to use a base multi-modal model to generate
"explanations" (captions) of visual data at increasing levels of complexity.
Then, similarity can be measured by the length of the caption needed to
discriminate between the two images: two highly dissimilar images can be
discriminated early in their description, whereas conceptually similar ones
will need more detail to be distinguished. We operationalize this definition
and show that it correlates with subjective (averaged human evaluation)
assessment, and beats existing baselines on both image-to-image and
text-to-text similarity benchmarks. Beyond just providing a number, our method
also offers interpretability by pointing to the specific level of granularity
of the description where the source data are differentiated.
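The caption-length idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each image already comes with a hypothetical list of captions ordered from coarse to detailed (in the paper these would come from a multi-modal model), and it treats two captions as "discriminating" as soon as their normalized text differs. The function names and the scoring rule (fraction of levels shared before divergence) are illustrative assumptions only.

```python
from typing import Optional, Sequence


def first_discriminating_level(caps_a: Sequence[str], caps_b: Sequence[str]) -> Optional[int]:
    """Return the index of the first complexity level at which the captions differ.

    Returns None if the captions agree at every level that both lists share.
    """
    for level, (a, b) in enumerate(zip(caps_a, caps_b)):
        if a.strip().lower() != b.strip().lower():
            return level
    return None


def conceptual_similarity(caps_a: Sequence[str], caps_b: Sequence[str]) -> float:
    """Toy score in [0, 1]: the fraction of levels passed before the captions diverge.

    Assumption: a later divergence means more detail was needed to tell the
    images apart, which is read here as higher conceptual similarity.
    """
    n = min(len(caps_a), len(caps_b))
    if n == 0:
        raise ValueError("each image needs at least one caption")
    level = first_discriminating_level(caps_a, caps_b)
    return 1.0 if level is None else level / n


# Hypothetical captions at three increasing levels of complexity.
beach_dog = ["an animal", "a dog outdoors", "a dog running on a beach"]
park_dog = ["an animal", "a dog outdoors", "a dog sitting in a park"]
car = ["a vehicle", "a car on a road", "a red car on a highway"]

print(conceptual_similarity(beach_dog, park_dog))  # diverges only at the last level
print(conceptual_similarity(beach_dog, car))       # diverges at the first level
```

The returned level also gives the interpretability the abstract mentions: it points to the granularity of description at which the two inputs first come apart.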
Related papers
- Introspective Deep Metric Learning for Image Retrieval [80.29866561553483]
We argue that a good similarity model should consider the semantic discrepancies with caution to better deal with ambiguous images for more robust training.
We propose to represent an image using not only a semantic embedding but also an accompanying uncertainty embedding, which describes the semantic characteristics and ambiguity of an image, respectively.
The proposed IDML framework improves the performance of deep metric learning through uncertainty modeling and attains state-of-the-art results on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets.
arXiv Detail & Related papers (2022-05-09T17:51:44Z)
- Attributable Visual Similarity Learning [90.69718495533144]
This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images.
Motivated by the human semantic similarity cognition, we propose a generalized similarity learning paradigm to represent the similarity between two images with a graph.
Experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods.
arXiv Detail & Related papers (2022-03-28T17:35:31Z)
- Two-stream Hierarchical Similarity Reasoning for Image-text Matching [66.43071159630006]
A hierarchical similarity reasoning module is proposed to automatically extract context information.
Previous approaches only consider learning single-stream similarity alignment.
A two-stream architecture is developed to decompose image-text matching into image-to-text level and text-to-image level similarity computation.
arXiv Detail & Related papers (2022-03-10T12:56:10Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Hierarchical Similarity Learning for Language-based Product Image Retrieval [40.83290730640458]
This paper focuses on the cross-modal similarity measurement, and proposes a novel Hierarchical Similarity Learning network.
Experiments on a large-scale product retrieval dataset demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2021-02-18T14:23:16Z)
- Consensus-Aware Visual-Semantic Embedding for Image-Text Matching [69.34076386926984]
Image-text matching plays a central role in bridging vision and language.
Most existing approaches only rely on the image-text instance pair to learn their representations.
We propose a Consensus-aware Visual-Semantic Embedding model to incorporate the consensus information.
arXiv Detail & Related papers (2020-07-17T10:22:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.