Beyond Cosine Similarity
- URL: http://arxiv.org/abs/2602.05266v1
- Date: Thu, 05 Feb 2026 03:46:21 GMT
- Title: Beyond Cosine Similarity
- Authors: Xinbo Ai
- Abstract summary: Cosine similarity, the standard metric for measuring semantic similarity in vector spaces, is mathematically grounded in the Cauchy-Schwarz inequality. We advance this theoretical underpinning by deriving a tighter upper bound for the dot product than the classical Cauchy-Schwarz bound. Our work establishes recos as a mathematically principled and empirically superior alternative, offering enhanced accuracy for semantic analysis in complex embedding spaces.
- Score: 5.076419064097734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cosine similarity, the standard metric for measuring semantic similarity in vector spaces, is mathematically grounded in the Cauchy-Schwarz inequality, which inherently limits it to capturing linear relationships--a constraint that fails to model the complex, nonlinear structures of real-world semantic spaces. We advance this theoretical underpinning by deriving a tighter upper bound for the dot product than the classical Cauchy-Schwarz bound. This new bound leads directly to recos, a similarity metric that normalizes the dot product by the sorted vector components. recos relaxes the condition for perfect similarity from strict linear dependence to ordinal concordance, thereby capturing a broader class of relationships. Extensive experiments across 11 embedding models--spanning static, contextualized, and universal types--demonstrate that recos consistently outperforms traditional cosine similarity, achieving higher correlation with human judgments on standard Semantic Textual Similarity (STS) benchmarks. Our work establishes recos as a mathematically principled and empirically superior alternative, offering enhanced accuracy for semantic analysis in complex embedding spaces.
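The abstract does not give the exact formula, but its description ("normalizes the dot product by the sorted vector components", with perfect similarity relaxed to ordinal concordance) matches a construction based on the rearrangement inequality, which bounds the dot product by the dot product of the sorted components and, because norms are permutation-invariant, is never looser than Cauchy-Schwarz. The sketch below follows that reading of the abstract only; the function names and the handling of edge cases are illustrative assumptions, not the paper's code.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Classical normalization via the Cauchy-Schwarz bound ||a|| * ||b||.
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot(a, b) / (na * nb) if na and nb else 0.0

def recos(a, b):
    # Rearrangement inequality: dot(a, b) <= dot(sorted(a), sorted(b)),
    # with equality when a and b are ordinally concordant (same component
    # ordering). Assumes a meaningful positive denominator; the abstract
    # does not specify how mixed-sign or zero cases are handled.
    denom = dot(sorted(a), sorted(b))
    return dot(a, b) / denom if denom else 0.0

# [1, 2, 3] and [2, 5, 9] are ordinally concordant but not collinear:
# under this sketch, recos reaches 1.0 while cosine stays strictly below it.
```

Under this construction, perfect similarity requires only that the two vectors rank their components in the same order, rather than strict linear dependence, which is the relaxation the abstract describes.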
Related papers
- In Defense of Cosine Similarity: Normalization Eliminates the Gauge Freedom [0.42303492200814446]
Steck, Ekanadham, and Kallus demonstrate that cosine similarity of learned embeddings can be rendered arbitrary by a diagonal "gauge" matrix $D$.
We argue that their conclusion conflates the pathology of an incompatible training objective with the geometric validity of cosine distance on the unit sphere.
arXiv Detail & Related papers (2026-02-23T00:00:57Z)
- Calibrated Similarity for Reliable Geometric Analysis of Embedding Spaces [0.0]
We construct an isotonic transformation that achieves near-perfect calibration while preserving rank correlation and local stability.
Our contribution is not to replace cosine similarity, but to restore interpretability of its absolute values through monotone calibration.
arXiv Detail & Related papers (2026-01-23T17:14:44Z)
- Unifying Information-Theoretic and Pair-Counting Clustering Similarity [51.660331450043806]
Clustering similarity measures are typically organized into two principal families, pair-counting and information-theoretic.
Here, we develop an analytical framework that unifies these families through two complementary perspectives.
arXiv Detail & Related papers (2025-11-04T21:13:32Z)
- A unifying separability criterion based on extended correlation tensor [0.0]
Entanglement is fundamental in that it reframes the quest for the classical-quantum demarcation line.
We introduce and formulate a practicable criterion for separability based on the correlation tensor.
arXiv Detail & Related papers (2024-06-25T02:36:28Z)
- Revisiting Cosine Similarity via Normalized ICA-transformed Embeddings [2.8402080392117757]
Cosine similarity is widely used to measure the similarity between two embeddings.
We propose a novel interpretation of cosine similarity as the sum of semantic similarities over axes.
arXiv Detail & Related papers (2024-06-16T15:44:37Z)
- Is Cosine-Similarity of Embeddings Really About Similarity? [46.75365717794515]
Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations.
We study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights.
We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless "similarities."
arXiv Detail & Related papers (2024-03-08T16:48:20Z)
- Beyond Instance Discrimination: Relation-aware Contrastive Self-supervised Learning [75.46664770669949]
We present relation-aware contrastive self-supervised learning (ReCo) to integrate instance relations.
Our ReCo consistently gains remarkable performance improvements.
arXiv Detail & Related papers (2022-11-02T03:25:28Z)
- On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z)
- Evaluation of taxonomic and neural embedding methods for calculating semantic similarity [0.0]
We study the mechanisms between taxonomic and distributional similarity measures.
We find that taxonomic similarity measures can depend on the shortest path length as a prime factor to predict semantic similarity.
The synergy of retrofitting neural embeddings with concept relations in similarity prediction may indicate a new trend to leverage knowledge bases on transfer learning.
arXiv Detail & Related papers (2022-09-30T02:54:21Z)
- Duality-Induced Regularizer for Semantic Matching Knowledge Graph Embeddings [70.390286614242]
We propose a novel regularizer -- namely, DUality-induced RegulArizer (DURA) -- which effectively encourages the entities with similar semantics to have similar embeddings.
Experiments demonstrate that DURA consistently and significantly improves the performance of state-of-the-art semantic matching models.
arXiv Detail & Related papers (2022-03-24T09:24:39Z)
- Attentive Normalization for Conditional Image Generation [126.08247355367043]
We characterize long-range dependence with attentive normalization (AN), which is an extension to traditional instance normalization.
Compared with self-attention GAN, our attentive normalization does not need to measure the correlation of all locations.
Experiments on class-conditional image generation and semantic inpainting verify the efficacy of our proposed module.
arXiv Detail & Related papers (2020-04-08T06:12:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.