In Defense of Cosine Similarity: Normalization Eliminates the Gauge Freedom
- URL: http://arxiv.org/abs/2602.19393v1
- Date: Mon, 23 Feb 2026 00:00:57 GMT
- Title: In Defense of Cosine Similarity: Normalization Eliminates the Gauge Freedom
- Authors: Taha Bouhsine
- Abstract summary: Steck, Ekanadham, and Kallus demonstrate that cosine similarity of learned embeddings can be rendered arbitrary by a diagonal ``gauge'' matrix $D$. We argue that their conclusion conflates the pathology of an incompatible training objective with the geometric validity of cosine distance on the unit sphere.
- Score: 0.42303492200814446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Steck, Ekanadham, and Kallus [arXiv:2403.05440] demonstrate that cosine similarity of learned embeddings from matrix factorization models can be rendered arbitrary by a diagonal ``gauge'' matrix $D$. Their result is correct and important for practitioners who compute cosine similarity on embeddings trained with dot-product objectives. However, we argue that their conclusion, cautioning against cosine similarity in general, conflates the pathology of an incompatible training objective with the geometric validity of cosine distance on the unit sphere. We prove that when embeddings are constrained to the unit sphere $\mathbb{S}^{d-1}$ (either during or after training with an appropriate objective), the $D$-matrix ambiguity vanishes identically, and cosine distance reduces to exactly half the squared Euclidean distance. This monotonic equivalence implies that cosine-based and Euclidean-based neighbor rankings are identical on normalized embeddings. The ``problem'' with cosine similarity is not cosine similarity; it is the failure to normalize.
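The monotonic equivalence claimed in the abstract is easy to check numerically: for unit vectors, $\|u - v\|^2 = \|u\|^2 + \|v\|^2 - 2\langle u, v\rangle = 2 - 2\langle u, v\rangle$, so cosine distance $1 - \langle u, v\rangle$ is exactly half the squared Euclidean distance. Below is a minimal sketch (ours, not from the paper; numpy-based, with hypothetical embeddings) verifying the identity and the identical neighbor rankings it implies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings: 100 items in d=32 dimensions, projected
# onto the unit sphere S^{d-1} by row-wise L2 normalization.
E = rng.normal(size=(100, 32))
E /= np.linalg.norm(E, axis=1, keepdims=True)

q = E[0]          # query embedding (already unit norm)
others = E[1:]

# Cosine distance: 1 - <q, v> for unit vectors q, v.
cos_dist = 1.0 - others @ q

# Half the squared Euclidean distance: ||q - v||^2 / 2.
half_sq_euc = 0.5 * np.sum((others - q) ** 2, axis=1)

# Identity on the unit sphere: 1 - <q, v> = ||q - v||^2 / 2.
assert np.allclose(cos_dist, half_sq_euc)

# Monotonic equivalence => identical nearest-neighbor rankings.
assert np.array_equal(np.argsort(cos_dist), np.argsort(half_sq_euc))
print("Cosine and Euclidean rankings agree on normalized embeddings.")
```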
Related papers
- Beyond Cosine Similarity [5.076419064097734]
Cosine similarity, the standard metric for measuring semantic similarity in vector spaces, is mathematically grounded in the Cauchy-Schwarz inequality. We advance this theoretical underpinning by deriving a tighter upper bound for the dot product than the classical Cauchy-Schwarz bound. Our work establishes recos as a mathematically principled and empirically superior alternative, offering enhanced accuracy for semantic analysis in complex embedding spaces.
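For reference, the classical Cauchy-Schwarz bound mentioned above is what confines cosine similarity to $[-1, 1]$ (the paper's tighter bound and the recos metric itself are not reproduced here):
$$
|\langle u, v \rangle| \;\le\; \|u\|\,\|v\| \quad\Longrightarrow\quad -1 \;\le\; \frac{\langle u, v \rangle}{\|u\|\,\|v\|} \;\le\; 1\,.
$$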
arXiv Detail & Related papers (2026-02-05T03:46:21Z)
- Riemannian Zeroth-Order Gradient Estimation with Structure-Preserving Metrics for Geodesically Incomplete Manifolds [57.179679246370114]
We construct metrics that are geodesically complete while ensuring that every stationary point under the new metric remains stationary under the original one. An $\epsilon$-stationary point under the constructed metric $g'$ also corresponds to an $\epsilon$-stationary point under the original metric $g$. Experiments on a practical mesh optimization task demonstrate that our framework maintains stable convergence even in the absence of geodesic completeness.
arXiv Detail & Related papers (2026-01-12T22:08:03Z)
- Variance-Adjusted Cosine Distance as Similarity Metric [3.776817669946595]
This study demonstrates limitations in the application of cosine similarity. The traditional cosine similarity metric is valid only in Euclidean space. When there is variance and correlation in the data, cosine distance is not a completely accurate measure of similarity.
arXiv Detail & Related papers (2025-02-04T11:20:57Z)
- The Double-Ellipsoid Geometry of CLIP [4.013156524547072]
Contrastive Language-Image Pre-Training (CLIP) is highly instrumental in machine learning applications. We show that text and image embeddings reside on linearly separable ellipsoid shells, not centered at the origin. A new notion of conformity is introduced, which measures the average cosine similarity of an instance to any other instance.
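As a reading aid, the conformity notion mentioned above can be written (our notation; the paper's exact normalization may differ) as the average cosine similarity of an instance $x_i$ to the other instances:
$$
\mathrm{conformity}(x_i) \;=\; \frac{1}{N-1} \sum_{j \neq i} \frac{\langle x_i, x_j \rangle}{\|x_i\|\,\|x_j\|}\,.
$$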
arXiv Detail & Related papers (2024-11-21T16:27:22Z)
- On Affine Homotopy between Language Encoders [127.55969928213248]
We study the properties of affine alignment of language encoders. We find that while affine alignment is fundamentally an asymmetric notion of similarity, it is still informative of extrinsic similarity.
arXiv Detail & Related papers (2024-06-04T13:58:28Z)
- Is Cosine-Similarity of Embeddings Really About Similarity? [46.75365717794515]
Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations.
We study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights.
We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless ``similarities''.
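To make the gauge freedom concrete (a sketch following the main abstract above; notation ours, not this paper's): a matrix factorization model trained with a dot-product objective fits $X \approx A B^\top$, and for any invertible diagonal $D$,
$$
A B^\top \;=\; (A D)\,(B D^{-1})^\top ,
$$
so the rescaled factors $A D$ and $B D^{-1}$ attain the same training loss while the cosine similarities between rows of $A D$ vary with the arbitrary choice of $D$. On the unit sphere this freedom disappears, since rescaling by any nontrivial $D$ changes row norms and violates the constraint.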
arXiv Detail & Related papers (2024-03-08T16:48:20Z)
- Attributable Visual Similarity Learning [90.69718495533144]
This paper proposes an attributable visual similarity learning (AVSL) framework for a more accurate and explainable similarity measure between images.
Motivated by human cognition of semantic similarity, we propose a generalized similarity learning paradigm that represents the similarity between two images with a graph.
Experiments on the CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate significant improvements over existing deep similarity learning methods.
arXiv Detail & Related papers (2022-03-28T17:35:31Z)
- Sublinear Time Approximation of Text Similarity Matrices [50.73398637380375]
We introduce a generalization of the popular Nyström method to the indefinite setting.
Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix.
We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices.
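For context, the classical Nyström method that the paper generalizes approximates a positive semidefinite similarity matrix $K$ from a subset $S$ of sampled landmark columns (the indefinite-setting generalization itself is not reproduced here):
$$
K \;\approx\; K_{:,S}\, K_{S,S}^{+}\, K_{S,:}\,,
$$
where $K_{S,S}^{+}$ is the pseudoinverse of the landmark submatrix; only the $O(n\,|S|)$ entries in $K_{:,S}$ are ever evaluated, which is sublinear in the $n^2$ size of the matrix.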
arXiv Detail & Related papers (2021-12-17T17:04:34Z)
- A Triangle Inequality for Cosine Similarity [0.0]
Similarity search is a fundamental problem for many data analysis techniques.
In this paper, we derive a triangle inequality for Cosine similarity that is suitable for efficient similarity search with many standard search structures.
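A sketch of why such an inequality exists (from spherical geometry; the paper's exact bound may be stated differently): writing $\theta(u,v) = \arccos \mathrm{sim}(u,v)$, angular distance satisfies the triangle inequality $\theta(u,w) \le \theta(u,v) + \theta(v,w)$, which yields
$$
\mathrm{sim}(u,w) \;\ge\; \mathrm{sim}(u,v)\,\mathrm{sim}(v,w) \;-\; \sqrt{\bigl(1-\mathrm{sim}(u,v)^2\bigr)\bigl(1-\mathrm{sim}(v,w)^2\bigr)}\,,
$$
i.e. $\cos\theta(u,w) \ge \cos\bigl(\theta(u,v) + \theta(v,w)\bigr)$, a bound that standard search structures can use to prune candidates.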
arXiv Detail & Related papers (2021-07-08T19:13:34Z)
- Word Rotator's Distance [50.67809662270474]
A key principle in assessing textual similarity is measuring the degree of semantic overlap between two texts by considering word alignment.
We show that the norm of word vectors is a good proxy for word importance, and their angle is a good proxy for word similarity.
We propose a method that first decouples word vectors into their norm and direction, and then computes alignment-based similarity.
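The decoupling step described above is the polar decomposition of a nonzero word vector (notation ours):
$$
w \;=\; \|w\| \cdot \frac{w}{\|w\|}\,,
$$
with the norm $\|w\|$ serving as the word's alignment weight (importance) and the direction $w/\|w\|$ entering the cosine-based similarity.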
arXiv Detail & Related papers (2020-04-30T17:48:42Z)