Measuring similarity between embedding spaces using induced neighborhood graphs
- URL: http://arxiv.org/abs/2411.08687v1
- Date: Wed, 13 Nov 2024 15:22:33 GMT
- Title: Measuring similarity between embedding spaces using induced neighborhood graphs
- Authors: Tiago F. Tavares, Fabio Ayres, Paris Smaragdis
- Abstract summary: We propose a metric to evaluate the similarity between paired item representations.
Our results show that accuracy in both analogy and zero-shot classification tasks correlates with the embedding similarity.
- Score: 10.056989400384772
- Abstract: Deep Learning techniques have excelled at generating embedding spaces that capture semantic similarities between items. Often these representations are paired, enabling experiments with analogies (pairs within the same domain) and cross-modality (pairs across domains). These experiments are based on specific assumptions about the geometry of embedding spaces, which allow finding paired items by extrapolating the positional relationships between embedding pairs in the training dataset, allowing for tasks such as finding new analogies, and multimodal zero-shot classification. In this work, we propose a metric to evaluate the similarity between paired item representations. Our proposal is built from the structural similarity between the nearest-neighbors induced graphs of each representation, and can be configured to compare spaces based on different distance metrics and on different neighborhood sizes. We demonstrate that our proposal can be used to identify similar structures at different scales, which is hard to achieve with kernel methods such as Centered Kernel Alignment (CKA). We further illustrate our method with two case studies: an analogy task using GloVe embeddings, and zero-shot classification in the CIFAR-100 dataset using CLIP embeddings. Our results show that accuracy in both analogy and zero-shot classification tasks correlates with the embedding similarity. These findings can help explain performance differences in these tasks, and may lead to improved design of paired-embedding models in the future.
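The idea of comparing two paired embedding spaces through their nearest-neighbor graphs can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`knn_indices`, `neighborhood_similarity`), the use of Jaccard overlap between neighbor sets, and the averaging over items are all assumptions made for illustration. The abstract only states that the metric is built from the structural similarity of the induced nearest-neighbor graphs and is configurable in distance metric and neighborhood size.

```python
import numpy as np


def knn_indices(X, k, metric="euclidean"):
    """Return an (n, k) array with the k nearest neighbours of each row of X.

    Each item is excluded from its own neighbourhood. Only "euclidean" and
    "cosine" are supported in this sketch; cosine assumes no all-zero rows.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of items")
    if metric == "euclidean":
        sq = np.sum(X ** 2, axis=1)
        # Squared distances preserve the neighbour ordering.
        D = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    elif metric == "cosine":
        Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
        D = 1.0 - Xn @ Xn.T
    else:
        raise ValueError(f"unsupported metric: {metric}")
    np.fill_diagonal(D, np.inf)  # never pick an item as its own neighbour
    return np.argsort(D, axis=1)[:, :k]


def neighborhood_similarity(A, B, k=5, metric_a="euclidean", metric_b="euclidean"):
    """Mean Jaccard overlap between the k-NN sets of paired items.

    Row i of A and row i of B are assumed to represent the same item in two
    different embedding spaces. The result lies in [0, 1]; 1 means every item
    has exactly the same neighbours in both spaces.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.shape[0] != B.shape[0]:
        raise ValueError("A and B must contain the same number of paired items")
    neigh_a = knn_indices(A, k, metric_a)
    neigh_b = knn_indices(B, k, metric_b)
    scores = []
    for row_a, row_b in zip(neigh_a, neigh_b):
        set_a, set_b = set(row_a.tolist()), set(row_b.tolist())
        scores.append(len(set_a & set_b) / len(set_a | set_b))
    return float(np.mean(scores))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    words = rng.normal(size=(100, 16))   # hypothetical embeddings, space 1
    images = rng.normal(size=(100, 32))  # hypothetical paired embeddings, space 2
    print(neighborhood_similarity(words, images, k=10, metric_b="cosine"))
```

Because the score depends only on which items are neighbors, it is unchanged by transformations that preserve neighbor ordering (for example, uniform scaling under Euclidean distance), and varying `k` probes structure at different scales.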
Related papers
- Supervised Pattern Recognition Involving Skewed Feature Densities [49.48516314472825]
The classification potential of the Euclidean distance and a dissimilarity index based on the coincidence similarity index are compared.
The accuracy of classifying the intersection point between the densities of two adjacent groups is taken into account.
arXiv Detail & Related papers (2024-09-02T12:45:18Z)
- Cluster-Aware Similarity Diffusion for Instance Retrieval [64.40171728912702]
Diffusion-based re-ranking is a common method used for retrieving instances by performing similarity propagation in a nearest neighbor graph.
We propose a novel Cluster-Aware Similarity (CAS) diffusion for instance retrieval.
arXiv Detail & Related papers (2024-06-04T14:19:50Z)
- A general framework for distributed approximate similarity search with arbitrary distances [0.5030361857850012]
Similarity search is a central problem in domains such as information management and retrieval or data analysis.
Many similarity search algorithms are designed or specifically adapted to metric distances.
This paper presents GDASC, a general framework for distributed approximate similarity search that accepts arbitrary distances.
arXiv Detail & Related papers (2024-05-22T16:19:52Z)
- Attributable Visual Similarity Learning [90.69718495533144]
This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images.
Motivated by human semantic similarity cognition, we propose a generalized similarity learning paradigm to represent the similarity between two images with a graph.
Experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods.
arXiv Detail & Related papers (2022-03-28T17:35:31Z)
- Towards Similarity-Aware Time-Series Classification [51.2400839966489]
We study time-series classification (TSC), a fundamental task of time-series data mining.
We propose Similarity-Aware Time-Series Classification (SimTSC), a framework that models similarity information with graph neural networks (GNNs).
arXiv Detail & Related papers (2022-01-05T02:14:57Z)
- Modelling Neighbor Relation in Joint Space-Time Graph for Video Correspondence Learning [53.74240452117145]
This paper presents a self-supervised method for learning reliable visual correspondence from unlabeled videos.
We formulate the correspondence as finding paths in a joint space-time graph, where nodes are grid patches sampled from frames, and are linked by two types of edges.
Our learned representation outperforms the state-of-the-art self-supervised methods on a variety of visual tasks.
arXiv Detail & Related papers (2021-09-28T05:40:01Z)
- Cycle Registration in Persistent Homology with Applications in Topological Bootstrap [0.0]
We propose a novel approach for comparing the persistent homology representations of two spaces (filtrations).
We do so by defining a correspondence relation between individual persistent cycles of two different spaces.
Our matching of cycles is based on both the persistence intervals and the spatial placement of each feature.
arXiv Detail & Related papers (2021-01-03T20:12:00Z)
- Similarity Based Stratified Splitting: an approach to train better classifiers [0.0]
We propose a Similarity-Based Stratified Splitting technique, which uses both the output and input space information to split the data.
We evaluate our proposal on twenty-two benchmark datasets with classifiers such as Multi-Layer Perceptron, Support Vector Machine, Random Forest and K-Nearest Neighbors.
arXiv Detail & Related papers (2020-10-13T01:07:48Z)
- Few-shot Visual Reasoning with Meta-analogical Contrastive Learning [141.2562447971]
We propose to solve a few-shot (or low-shot) visual reasoning problem by resorting to analogical reasoning.
We extract structural relationships between elements in both domains, and enforce them to be as similar as possible with analogical learning.
We validate our method on the RAVEN dataset, on which it outperforms state-of-the-art methods, with larger gains when the training data is scarce.
arXiv Detail & Related papers (2020-07-23T14:00:34Z)
- Building and Interpreting Deep Similarity Models [0.0]
We propose to make similarities interpretable by augmenting them with an explanation in terms of input features.
We develop BiLRP, a scalable and theoretically founded method to systematically decompose similarity scores on pairs of input features.
arXiv Detail & Related papers (2020-03-11T17:46:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.