Semantics at an Angle: When Cosine Similarity Works Until It Doesn't
- URL: http://arxiv.org/abs/2504.16318v1
- Date: Tue, 22 Apr 2025 23:31:32 GMT
- Title: Semantics at an Angle: When Cosine Similarity Works Until It Doesn't
- Authors: Kisung You
- Abstract summary: Cosine similarity has become a standard metric for comparing embeddings in machine learning. Recent studies have revealed important limitations, particularly when embedding norms carry meaningful semantic information. This article offers a reflective and selective examination of the evolution, strengths, and limitations of cosine similarity.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cosine similarity has become a standard metric for comparing embeddings in modern machine learning. Its scale-invariance and alignment with model training objectives have contributed to its widespread adoption. However, recent studies have revealed important limitations, particularly when embedding norms carry meaningful semantic information. This informal article offers a reflective and selective examination of the evolution, strengths, and limitations of cosine similarity. We highlight why it performs well in many settings, where it tends to break down, and how emerging alternatives are beginning to address its blind spots. We hope to offer a mix of conceptual clarity and practical perspective, especially for quantitative scientists who think about embeddings not just as vectors, but as geometric and philosophical objects.
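As a minimal sketch (illustrative only, not code from the paper), the following NumPy snippet shows the scale-invariance the abstract refers to: rescaling an embedding leaves its cosine similarity unchanged, so any semantics carried by the embedding's norm is invisible to the metric, while the raw dot product retains it. The example vectors are hypothetical.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between u and v: the dot product of their normalizations."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Two hypothetical embeddings pointing in the same direction, with very different norms.
u = np.array([1.0, 2.0, 3.0])
v = 100.0 * u  # same direction, 100x the magnitude

print(cosine_similarity(u, v))  # ~1.0: cosine depends only on direction
print(float(np.dot(u, v)))      # 1400.0: the dot product retains magnitude information
```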
Related papers
- Machine Unlearning in Hyperbolic vs. Euclidean Multimodal Contrastive Learning: Adapting Alignment Calibration to MERU [50.9588132578029]
This paper investigates machine unlearning in hyperbolic contrastive learning. We adapt Alignment Calibration to MERU, a model that embeds images and text in hyperbolic space to better capture semantic hierarchies. Our approach introduces hyperbolic-specific components, including entailment calibration and norm regularization, that leverage the unique properties of hyperbolic space.
arXiv Detail & Related papers (2025-03-19T12:47:37Z)
- Modelling Commonsense Commonalities with Multi-Facet Concept Embeddings [25.52752452574944]
Concept embeddings identify concepts which share some property of interest.
Standard embeddings reflect basic taxonomic categories, making them unsuitable for finding commonalities that refer to more specific aspects.
We show that modelling multiple facets of concepts leads to embeddings which capture a more diverse range of commonsense properties, and consistently improves results in downstream tasks.
arXiv Detail & Related papers (2024-03-25T17:44:45Z)
- Is Cosine-Similarity of Embeddings Really About Similarity? [46.75365717794515]
Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations.
We study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights.
We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless similarities.
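For reference, the definition restated in this summary can be written compactly (a standard identity, not specific to this paper's analysis):

```latex
\cos(u, v) \;=\; \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}
           \;=\; \frac{u}{\lVert u \rVert} \cdot \frac{v}{\lVert v \rVert}
```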
arXiv Detail & Related papers (2024-03-08T16:48:20Z)
- How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding [56.222097640468306]
We provide a mechanistic understanding of how transformers learn "semantic structure".
We show, through a combination of mathematical analysis and experiments on Wikipedia data, that the embedding layer and the self-attention layer encode the topical structure.
arXiv Detail & Related papers (2023-03-07T21:42:17Z)
- Deflectometry for specular surfaces: an overview [0.0]
Deflectometry as a technical approach to assessing reflective surfaces has now existed for almost 40 years.
Different aspects and variations of the method have been studied in multiple theses and research articles, and reviews are also becoming available for certain subtopics.
arXiv Detail & Related papers (2022-04-10T22:17:47Z)
- Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the support of different intra-class samples will become more overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z)
- On Quantitative Evaluations of Counterfactuals [88.42660013773647]
This paper consolidates work on evaluating visual counterfactual examples through analysis and experiments.
We find that while most metrics behave as intended for sufficiently simple datasets, some fail to tell the difference between good and bad counterfactuals when the complexity increases.
We propose two new metrics, the Label Variation Score and the Oracle score, which are both less vulnerable to these failure modes.
arXiv Detail & Related papers (2021-10-30T05:00:36Z)
- CLEVA-Compass: A Continual Learning EValuation Assessment Compass to Promote Research Transparency and Comparability [15.342039156426843]
We argue that the goal of a precise formulation of desiderata is an ill-posed one, as diverse applications may always warrant distinct scenarios.
In addition to promoting compact specification in the spirit of recent replication trends, the CLEVA-Compass provides an intuitive chart to understand the priorities of individual systems.
arXiv Detail & Related papers (2021-10-07T10:53:26Z)
- CURI: A Benchmark for Productive Concept Learning Under Uncertainty [33.83721664338612]
We introduce a new few-shot, meta-learning benchmark: Compositional Reasoning Under Uncertainty (CURI).
CURI evaluates different aspects of productive and systematic generalization, including abstract understanding of disentangling, productive generalization, learning operations, and variable binding.
It also defines a model-independent "compositionality gap" to evaluate the difficulty of generalizing out-of-distribution along each of these axes.
arXiv Detail & Related papers (2020-10-06T16:23:17Z)
- Rethinking Class Relations: Absolute-relative Supervised and Unsupervised Few-shot Learning [157.62595449130973]
We study the fundamental problem of simplistic class modeling in current few-shot learning methods.
We propose a novel Absolute-relative Learning paradigm to fully take advantage of label information to refine the image representations.
arXiv Detail & Related papers (2020-01-12T12:25:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.