Is Cosine-Similarity of Embeddings Really About Similarity?
- URL: http://arxiv.org/abs/2403.05440v1
- Date: Fri, 8 Mar 2024 16:48:20 GMT
- Title: Is Cosine-Similarity of Embeddings Really About Similarity?
- Authors: Harald Steck, Chaitanya Ekanadham, Nathan Kallus
- Abstract summary: Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations.
We study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights.
We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless `similarities.'
- Score: 46.75365717794515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cosine-similarity is the cosine of the angle between two vectors, or
equivalently the dot product between their normalizations. A popular
application is to quantify semantic similarity between high-dimensional objects
by applying cosine-similarity to a learned low-dimensional feature embedding.
In practice, this can work better, but sometimes also worse, than the unnormalized
dot-product between embedded vectors. To gain insight into this empirical
observation, we study embeddings derived from regularized linear models, where
closed-form solutions facilitate analytical insights. We derive analytically
how cosine-similarity can yield arbitrary and therefore meaningless
`similarities.' For some linear models the similarities are not even unique,
while for others they are implicitly controlled by the regularization. We
discuss implications beyond linear models: a combination of different
regularizations is employed when learning deep models; these have implicit and
unintended effects when taking cosine-similarities of the resulting embeddings,
rendering results opaque and possibly arbitrary. Based on these insights, we
caution against blindly using cosine-similarity and outline alternatives.
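To make the non-uniqueness point concrete, here is a minimal numpy sketch (an illustration of the rescaling freedom described in the abstract, not the paper's exact derivation or experimental setup): in a matrix-factorization-style linear model with predictions A @ B.T, the latent dimensions can be rescaled by any positive diagonal matrix without changing the predictions, yet the cosine similarities between embeddings change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy matrix-factorization setup: predictions are X_hat = A @ B.T, with
# "user" embeddings A (n x k) and "item" embeddings B (m x k).
n, m, k = 5, 4, 3
A = rng.normal(size=(n, k))
B = rng.normal(size=(m, k))

def cosine(u, v):
    """Cosine similarity: the dot product of the two normalized vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Reparameterize with an arbitrary positive diagonal rescaling D:
# A -> A @ D and B -> B @ inv(D) leave the predictions A @ B.T unchanged.
D = np.diag(rng.uniform(0.1, 10.0, size=k))
A2, B2 = A @ D, B @ np.linalg.inv(D)

print(np.allclose(A @ B.T, A2 @ B2.T))            # True: identical predictions
print(cosine(B[0], B[1]), cosine(B2[0], B2[1]))   # different cosine "similarities"
```

Since the training objective only constrains the product A @ B.T, nothing in this toy model pins down which rescaling is the "right" one, which is why the resulting cosine similarities can be arbitrary unless the regularization implicitly fixes a scale.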
Related papers
- The Double-Ellipsoid Geometry of CLIP [4.013156524547072]
Contrastive Language-Image Pre-Training (CLIP) is highly instrumental in machine learning applications.
We show that text and image embeddings reside on linearly separable ellipsoid shells that are not centered at the origin.
A new notion of conformity is introduced, which measures the average cosine similarity of an instance to every other instance (see the sketch after this list).
arXiv Detail & Related papers (2024-11-21T16:27:22Z)
- Differentiable Optimization of Similarity Scores Between Models and Brains [1.5391321019692434]
Similarity measures such as linear regression, Centered Kernel Alignment (CKA), Normalized Bures Similarity (NBS), and angular Procrustes distance are often used to quantify the similarity between model representations and neural data.
Here, we introduce a novel tool to investigate what drives high similarity scores and what constitutes a "good" score.
Surprisingly, we find that high similarity scores do not guarantee encoding task-relevant information in a manner consistent with neural data.
arXiv Detail & Related papers (2024-07-09T17:31:47Z)
- Why bother with geometry? On the relevance of linear decompositions of Transformer embeddings [5.151529346168568]
We study representations from machine-translation decoders using two such embedding decomposition methods.
Our results indicate that, while decomposition-derived indicators effectively correlate with model performance, variation across different runs suggests a more nuanced take on this question.
arXiv Detail & Related papers (2023-10-10T19:56:10Z)
- Beyond Instance Discrimination: Relation-aware Contrastive Self-supervised Learning [75.46664770669949]
We present relation-aware contrastive self-supervised learning (ReCo) to integrate instance relations.
ReCo consistently yields remarkable performance improvements.
arXiv Detail & Related papers (2022-11-02T03:25:28Z)
- Duality-Induced Regularizer for Semantic Matching Knowledge Graph Embeddings [70.390286614242]
We propose a novel regularizer -- namely, DUality-induced RegulArizer (DURA) -- which effectively encourages the entities with similar semantics to have similar embeddings.
Experiments demonstrate that DURA consistently and significantly improves the performance of state-of-the-art semantic matching models.
arXiv Detail & Related papers (2022-03-24T09:24:39Z)
- Generalized quantum similarity learning [0.0]
We propose using quantum networks (GQSim) for learning task-dependent (a)symmetric similarity between data that need not have the same dimensionality.
We demonstrate that the similarity measure derived using this technique is $(\epsilon,\gamma,\tau)$-good, resulting in theoretically guaranteed performance.
arXiv Detail & Related papers (2022-01-07T03:28:19Z)
- Sublinear Time Approximation of Text Similarity Matrices [50.73398637380375]
We introduce a generalization of the popular Nyström method to the indefinite setting.
Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix.
We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices.
arXiv Detail & Related papers (2021-12-17T17:04:34Z)
- A Differential Geometry Perspective on Orthogonal Recurrent Models [56.09491978954866]
We employ tools and insights from differential geometry to offer a novel perspective on orthogonal RNNs.
We show that orthogonal RNNs may be viewed as optimizing in the space of divergence-free vector fields.
Motivated by this observation, we study a new recurrent model, which spans the entire space of vector fields.
arXiv Detail & Related papers (2021-02-18T19:39:22Z)
- Pairwise Supervision Can Provably Elicit a Decision Boundary [84.58020117487898]
Similarity learning is the problem of eliciting useful representations by predicting the relationship between a pair of patterns.
We show that similarity learning is capable of solving binary classification by directly eliciting a decision boundary.
arXiv Detail & Related papers (2020-06-11T05:35:16Z)
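As a side note on the "conformity" measure mentioned in the Double-Ellipsoid CLIP entry above, here is a minimal sketch, assuming conformity is simply the mean cosine similarity of one embedding to all other embeddings in a collection (as the one-line summary suggests; the paper's exact definition may differ).

```python
import numpy as np

def conformity(E: np.ndarray) -> np.ndarray:
    """Mean cosine similarity of each embedding (row of E) to every other row."""
    U = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize rows
    S = U @ U.T                                       # pairwise cosine similarities
    n = E.shape[0]
    return (S.sum(axis=1) - 1.0) / (n - 1)            # drop the self-similarity of 1

embeddings = np.random.default_rng(1).normal(size=(8, 16))
print(conformity(embeddings))                         # one conformity score per row
```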