Is Cosine-Similarity of Embeddings Really About Similarity?
- URL: http://arxiv.org/abs/2403.05440v1
- Date: Fri, 8 Mar 2024 16:48:20 GMT
- Title: Is Cosine-Similarity of Embeddings Really About Similarity?
- Authors: Harald Steck, Chaitanya Ekanadham, Nathan Kallus
- Abstract summary: Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations.
We study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights.
We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless `similarities.'
- Score: 46.75365717794515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cosine-similarity is the cosine of the angle between two vectors, or
equivalently the dot product between their normalizations. A popular
application is to quantify semantic similarity between high-dimensional objects
by applying cosine-similarity to a learned low-dimensional feature embedding.
In practice, this can work better, but sometimes also worse, than the unnormalized
dot-product between embedded vectors. To gain insight into this empirical
observation, we study embeddings derived from regularized linear models, where
closed-form solutions facilitate analytical insights. We derive analytically
how cosine-similarity can yield arbitrary and therefore meaningless
`similarities.' For some linear models the similarities are not even unique,
while for others they are implicitly controlled by the regularization. We
discuss implications beyond linear models: a combination of different
regularizations is employed when learning deep models; these have implicit and
unintended effects when taking cosine-similarities of the resulting embeddings,
rendering results opaque and possibly arbitrary. Based on these insights, we
caution against blindly using cosine-similarity and outline alternatives.
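To make the non-uniqueness point concrete, here is a minimal numpy sketch (an illustration of the rescaling freedom described in the abstract, not the paper's exact derivation or experimental setup): in a matrix-factorization-style linear model with predictions A @ B.T, the latent dimensions can be rescaled by any positive diagonal matrix without changing the predictions, yet the cosine similarities between embeddings change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy matrix-factorization setup: predictions are X_hat = A @ B.T, with
# "user" embeddings A (n x k) and "item" embeddings B (m x k).
n, m, k = 5, 4, 3
A = rng.normal(size=(n, k))
B = rng.normal(size=(m, k))

def cosine(u, v):
    """Cosine similarity: the dot product of the two normalized vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Reparameterize with an arbitrary positive diagonal rescaling D:
# A -> A @ D and B -> B @ inv(D) leave the predictions A @ B.T unchanged.
D = np.diag(rng.uniform(0.1, 10.0, size=k))
A2, B2 = A @ D, B @ np.linalg.inv(D)

print(np.allclose(A @ B.T, A2 @ B2.T))            # True: identical predictions
print(cosine(B[0], B[1]), cosine(B2[0], B2[1]))   # different cosine "similarities"
```

Since the training objective only constrains the product A @ B.T, nothing in this toy model pins down which rescaling is the "right" one, which is why the resulting cosine similarities can be arbitrary unless the regularization implicitly fixes a scale.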
Related papers
- The Double-Ellipsoid Geometry of CLIP [4.013156524547072]
Contrastive Language-Image Pre-Training (CLIP) is highly instrumental in machine learning applications.
We show that text and image embeddings reside on linearly separable ellipsoid shells that are not centered at the origin.
A new notion of conformity is introduced, which measures the average cosine similarity of an instance to every other instance (see the sketch after this list).
arXiv Detail & Related papers (2024-11-21T16:27:22Z)
- Differentiable Optimization of Similarity Scores Between Models and Brains [1.5391321019692434]
Similarity measures such as linear regression, Centered Kernel Alignment (CKA), Normalized Bures Similarity (NBS), and angular Procrustes distance are often used to quantify the similarity between model representations and neural data.
Here, we introduce a novel tool to investigate what drives high similarity scores and what constitutes a "good" score.
Surprisingly, we find that high similarity scores do not guarantee encoding task-relevant information in a manner consistent with neural data.
arXiv Detail & Related papers (2024-07-09T17:31:47Z)
- Why bother with geometry? On the relevance of linear decompositions of Transformer embeddings [5.151529346168568]
We study representations from machine-translation decoders using two such embedding decomposition methods.
Our results indicate that, while decomposition-derived indicators effectively correlate with model performance, variation across different runs suggests a more nuanced take on this question.
arXiv Detail & Related papers (2023-10-10T19:56:10Z)
- Beyond Instance Discrimination: Relation-aware Contrastive Self-supervised Learning [75.46664770669949]
We present relation-aware contrastive self-supervised learning (ReCo) to integrate instance relations.
ReCo consistently yields remarkable performance improvements.
arXiv Detail & Related papers (2022-11-02T03:25:28Z)
- Duality-Induced Regularizer for Semantic Matching Knowledge Graph Embeddings [70.390286614242]
We propose a novel regularizer -- namely, DUality-induced RegulArizer (DURA) -- which effectively encourages the entities with similar semantics to have similar embeddings.
Experiments demonstrate that DURA consistently and significantly improves the performance of state-of-the-art semantic matching models.
arXiv Detail & Related papers (2022-03-24T09:24:39Z)
- Generalized quantum similarity learning [0.0]
We propose using quantum networks (GQSim) for learning task-dependent (a)symmetric similarity between data that need not have the same dimensionality.
We demonstrate that the similarity measure derived using this technique is $(\epsilon,\gamma,\tau)$-good, resulting in theoretically guaranteed performance.
arXiv Detail & Related papers (2022-01-07T03:28:19Z)
- Sublinear Time Approximation of Text Similarity Matrices [50.73398637380375]
We introduce a generalization of the popular Nyström method to the indefinite setting.
Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix.
We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices.
arXiv Detail & Related papers (2021-12-17T17:04:34Z)
- A Differential Geometry Perspective on Orthogonal Recurrent Models [56.09491978954866]
We employ tools and insights from differential geometry to offer a novel perspective on orthogonal RNNs.
We show that orthogonal RNNs may be viewed as optimizing in the space of divergence-free vector fields.
Motivated by this observation, we study a new recurrent model, which spans the entire space of vector fields.
arXiv Detail & Related papers (2021-02-18T19:39:22Z)
- Pairwise Supervision Can Provably Elicit a Decision Boundary [84.58020117487898]
Similarity learning is the problem of eliciting useful representations by predicting the relationship between a pair of patterns.
We show that similarity learning is capable of solving binary classification by directly eliciting a decision boundary.
arXiv Detail & Related papers (2020-06-11T05:35:16Z)
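As a side note on the "conformity" measure mentioned in the Double-Ellipsoid CLIP entry above, here is a minimal sketch, assuming conformity is simply the mean cosine similarity of one embedding to all other embeddings in a collection (as the one-line summary suggests; the paper's exact definition may differ).

```python
import numpy as np

def conformity(E: np.ndarray) -> np.ndarray:
    """Mean cosine similarity of each embedding (row of E) to every other row."""
    U = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize rows
    S = U @ U.T                                       # pairwise cosine similarities
    n = E.shape[0]
    return (S.sum(axis=1) - 1.0) / (n - 1)            # drop the self-similarity of 1

embeddings = np.random.default_rng(1).normal(size=(8, 16))
print(conformity(embeddings))                         # one conformity score per row
```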