Beyond Cosine Similarity
- URL: http://arxiv.org/abs/2602.05266v1
- Date: Thu, 05 Feb 2026 03:46:21 GMT
- Title: Beyond Cosine Similarity
- Authors: Xinbo Ai
- Abstract summary: Cosine similarity, the standard metric for measuring semantic similarity in vector spaces, is mathematically grounded in the Cauchy-Schwarz inequality. We advance this theoretical underpinning by deriving a tighter upper bound for the dot product than the classical Cauchy-Schwarz bound. Our work establishes recos as a mathematically principled and empirically superior alternative, offering enhanced accuracy for semantic analysis in complex embedding spaces.
- Score: 5.076419064097734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cosine similarity, the standard metric for measuring semantic similarity in vector spaces, is mathematically grounded in the Cauchy-Schwarz inequality, which inherently limits it to capturing linear relationships--a constraint that fails to model the complex, nonlinear structures of real-world semantic spaces. We advance this theoretical underpinning by deriving a tighter upper bound for the dot product than the classical Cauchy-Schwarz bound. This new bound leads directly to recos, a similarity metric that normalizes the dot product by the sorted vector components. recos relaxes the condition for perfect similarity from strict linear dependence to ordinal concordance, thereby capturing a broader class of relationships. Extensive experiments across 11 embedding models--spanning static, contextualized, and universal types--demonstrate that recos consistently outperforms traditional cosine similarity, achieving higher correlation with human judgments on standard Semantic Textual Similarity (STS) benchmarks. Our work establishes recos as a mathematically principled and empirically superior alternative, offering enhanced accuracy for semantic analysis in complex embedding spaces.
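The abstract does not give the exact formula, but its description ("normalizes the dot product by the sorted vector components", with perfect similarity relaxed to ordinal concordance) matches a construction based on the rearrangement inequality, which bounds the dot product by the dot product of the sorted components and, because norms are permutation-invariant, is never looser than Cauchy-Schwarz. The sketch below follows that reading of the abstract only; the function names and the handling of edge cases are illustrative assumptions, not the paper's code.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Classical normalization via the Cauchy-Schwarz bound ||a|| * ||b||.
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot(a, b) / (na * nb) if na and nb else 0.0

def recos(a, b):
    # Rearrangement inequality: dot(a, b) <= dot(sorted(a), sorted(b)),
    # with equality when a and b are ordinally concordant (same component
    # ordering). Assumes a meaningful positive denominator; the abstract
    # does not specify how mixed-sign or zero cases are handled.
    denom = dot(sorted(a), sorted(b))
    return dot(a, b) / denom if denom else 0.0

# [1, 2, 3] and [2, 5, 9] are ordinally concordant but not collinear:
# under this sketch, recos reaches 1.0 while cosine stays strictly below it.
```

Under this construction, perfect similarity requires only that the two vectors rank their components in the same order, rather than strict linear dependence, which is the relaxation the abstract describes.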
Related papers
- In Defense of Cosine Similarity: Normalization Eliminates the Gauge Freedom [0.42303492200814446]
Steck, Ekanadham, and Kallus demonstrate that cosine similarity of learned embeddings can be rendered arbitrary by a diagonal "gauge" matrix $D$.
We argue that their conclusion conflates the pathology of an incompatible training objective with the geometric validity of cosine distance on the unit sphere.
arXiv Detail & Related papers (2026-02-23T00:00:57Z)
- Calibrated Similarity for Reliable Geometric Analysis of Embedding Spaces [0.0]
We construct an isotonic transformation that achieves near-perfect calibration while preserving rank correlation and local stability.
Our contribution is not to replace cosine similarity, but to restore interpretability of its absolute values through monotone calibration.
arXiv Detail & Related papers (2026-01-23T17:14:44Z)
- Unifying Information-Theoretic and Pair-Counting Clustering Similarity [51.660331450043806]
Clustering similarity measures are typically organized into two principal families, pair-counting and information-theoretic.
Here, we develop an analytical framework that unifies these families through two complementary perspectives.
arXiv Detail & Related papers (2025-11-04T21:13:32Z)
- A unifying separability criterion based on extended correlation tensor [0.0]
Entanglement is fundamental in that it reframes the quest for the classical-quantum demarcation line.
We introduce and formulate a practicable criterion for separability based on the correlation tensor.
arXiv Detail & Related papers (2024-06-25T02:36:28Z)
- Revisiting Cosine Similarity via Normalized ICA-transformed Embeddings [2.8402080392117757]
Cosine similarity is widely used to measure the similarity between two embeddings.
We propose a novel interpretation of cosine similarity as the sum of semantic similarities over axes.
arXiv Detail & Related papers (2024-06-16T15:44:37Z)
- Is Cosine-Similarity of Embeddings Really About Similarity? [46.75365717794515]
Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations.
We study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights.
We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless "similarities."
arXiv Detail & Related papers (2024-03-08T16:48:20Z)
- Beyond Instance Discrimination: Relation-aware Contrastive Self-supervised Learning [75.46664770669949]
We present relation-aware contrastive self-supervised learning (ReCo) to integrate instance relations.
Our ReCo consistently gains remarkable performance improvements.
arXiv Detail & Related papers (2022-11-02T03:25:28Z)
- On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z)
- Evaluation of taxonomic and neural embedding methods for calculating semantic similarity [0.0]
We study the mechanisms between taxonomic and distributional similarity measures.
We find that taxonomic similarity measures can depend on the shortest path length as a prime factor to predict semantic similarity.
The synergy of retrofitting neural embeddings with concept relations in similarity prediction may indicate a new trend to leverage knowledge bases on transfer learning.
arXiv Detail & Related papers (2022-09-30T02:54:21Z)
- Duality-Induced Regularizer for Semantic Matching Knowledge Graph Embeddings [70.390286614242]
We propose a novel regularizer -- namely, DUality-induced RegulArizer (DURA) -- which effectively encourages the entities with similar semantics to have similar embeddings.
Experiments demonstrate that DURA consistently and significantly improves the performance of state-of-the-art semantic matching models.
arXiv Detail & Related papers (2022-03-24T09:24:39Z)
- Attentive Normalization for Conditional Image Generation [126.08247355367043]
We characterize long-range dependence with attentive normalization (AN), which is an extension to traditional instance normalization.
Compared with self-attention GAN, our attentive normalization does not need to measure the correlation of all locations.
Experiments on class-conditional image generation and semantic inpainting verify the efficacy of our proposed module.
arXiv Detail & Related papers (2020-04-08T06:12:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.