Surpassing Cosine Similarity for Multidimensional Comparisons: Dimension Insensitive Euclidean Metric
- URL: http://arxiv.org/abs/2407.08623v4
- Date: Mon, 10 Mar 2025 16:17:30 GMT
- Title: Surpassing Cosine Similarity for Multidimensional Comparisons: Dimension Insensitive Euclidean Metric
- Authors: Federico Tessari, Kunpeng Yao, Neville Hogan
- Abstract summary: We introduce a Dimension Insensitive Euclidean Metric (DIEM), which demonstrates superior robustness and generalizability across dimensions. DIEM maintains consistent variability and eliminates the biases observed in traditional metrics, making it a reliable tool for high-dimensional comparisons. This novel metric has the potential to replace cosine similarity, providing a more accurate and insightful method to analyze multidimensional data in fields ranging from neuromotor control to machine learning.
- Score: 4.415977307120617
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Advances in computational power and hardware efficiency have enabled tackling increasingly complex, high-dimensional problems. While artificial intelligence (AI) achieves remarkable results, the interpretability of high-dimensional solutions remains challenging. A critical issue is the comparison of multidimensional quantities, essential in techniques like Principal Component Analysis. Metrics such as cosine similarity are often used, for example in the development of natural language processing algorithms or recommender systems. However, the interpretability of such metrics diminishes as dimensions increase. This paper analyzes the effects of dimensionality, revealing significant limitations of cosine similarity, particularly its dependency on the dimension of vectors, which leads to biased and poorly interpretable outcomes. To address this, we introduce a Dimension Insensitive Euclidean Metric (DIEM), which demonstrates superior robustness and generalizability across dimensions. DIEM maintains consistent variability and eliminates the biases observed in traditional metrics, making it a reliable tool for high-dimensional comparisons. An example of the advantages of DIEM over cosine similarity is reported for a large language model application. This novel metric has the potential to replace cosine similarity, providing a more accurate and insightful method to analyze multidimensional data in fields ranging from neuromotor control to machine learning.
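The dimension dependence described in the abstract can be reproduced with a short Monte Carlo demo. The sketch below is illustrative, not the paper's code: its centering-and-scaling normalization is a simplified stand-in for the exact DIEM definition, but it captures the core idea of referencing the Euclidean distance to its dimension-specific statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

for dim in (2, 10, 100, 1000):
    # 1,000 pairs of random vectors with i.i.d. uniform components in [0, 1].
    a = rng.uniform(size=(1000, dim))
    b = rng.uniform(size=(1000, dim))
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    dist = np.linalg.norm(a - b, axis=1)
    # DIEM-style normalization (simplified): center the Euclidean distance
    # by its empirical mean and scale by its empirical spread for this
    # dimension, so scores stay comparable across dimensions.
    diem_like = (dist - dist.mean()) / dist.std()
    # Cosine similarity concentrates around a distribution-dependent value
    # with vanishing spread as the dimension grows; the normalized distance
    # keeps unit spread at every dimension by construction.
    print(f"dim={dim:5d}  cos mean={cos.mean():+.3f}  cos std={cos.std():.3f}"
          f"  diem-like std={diem_like.std():.3f}")
```

The shrinking spread of cosine similarity is precisely the bias the paper attributes to dimensionality: unrelated random vectors look increasingly similar to the same degree as the dimension grows.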
Related papers
- Geometry-Informed Neural Operator Transformer [0.8906214436849201]
This work introduces the Geometry-Informed Neural Operator Transformer (GINOT), which integrates the transformer architecture with the neural operator framework to enable forward predictions for arbitrary geometries.
The performance of GINOT is validated on multiple challenging datasets, showcasing its high accuracy and strong generalization capabilities for complex and arbitrary 2D and 3D geometries.
arXiv Detail & Related papers (2025-04-28T03:39:27Z) - PCA-RAG: Principal Component Analysis for Efficient Retrieval-Augmented Generation [0.0]
High-dimensional language model embeddings can present scalability challenges in terms of storage and latency.
This paper investigates the use of Principal Component Analysis (PCA) to reduce embedding dimensionality.
We show that PCA-based compression offers a viable balance between retrieval fidelity and resource efficiency.
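A hedged sketch of this approach using scikit-learn's PCA on hypothetical embeddings (the 384-dimensional input and 64-component target are assumed for illustration, not taken from the paper):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical corpus embeddings: 10,000 documents x 384 dimensions.
embeddings = rng.normal(size=(10_000, 384))

# Fit PCA on the corpus and keep the top 64 components, shrinking
# storage and distance-computation cost by a factor of six.
pca = PCA(n_components=64).fit(embeddings)
compressed = pca.transform(embeddings)

# At query time, project the query into the same reduced space before
# nearest-neighbor retrieval.
query = rng.normal(size=(1, 384))
query_reduced = pca.transform(query)

# Retained variance is one simple proxy for retrieval fidelity.
print(f"variance retained: {pca.explained_variance_ratio_.sum():.2%}")
```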
arXiv Detail & Related papers (2025-04-11T09:38:12Z) - A Novel Approach for Intrinsic Dimension Estimation [0.0]
Real-life data typically have a complex and non-linear structure.
Finding the nearly optimal representation of the dataset in a lower-dimensional space offers an applicable mechanism for improving the success of machine learning tasks.
We propose a highly efficient and robust intrinsic dimension estimation approach.
arXiv Detail & Related papers (2025-03-12T15:42:39Z) - Evaluating Representational Similarity Measures from the Lens of Functional Correspondence [1.7811840395202345]
Neuroscience and artificial intelligence (AI) both face the challenge of interpreting high-dimensional neural data.
Despite the widespread use of representational comparisons, a critical question remains: which metrics are most suitable for these comparisons?
arXiv Detail & Related papers (2024-11-21T23:53:58Z) - Complexity Matters: Effective Dimensionality as a Measure for Adversarial Robustness [0.7366405857677227]
In this work, we investigate the relationship between a model's effective dimensionality and its robustness properties.
We run experiments on commercial-scale models, such as YOLO and ResNet, that are widely used in real-world environments.
We reveal a near-linear inverse relationship between effective dimensionality and adversarial robustness: models with lower effective dimensionality exhibit better robustness.
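Effective dimensionality is commonly computed from the eigenvalue spectrum of the loss Hessian (or a parameter covariance) as d_eff(alpha) = sum_i lambda_i / (lambda_i + alpha). The sketch below uses this common definition on assumed spectra and may differ from the paper's exact variant:

```python
import numpy as np

def effective_dimensionality(eigenvalues, alpha=1.0):
    # Common definition: eigenvalues much larger than alpha count as full
    # dimensions; eigenvalues much smaller than alpha barely contribute.
    eigenvalues = np.asarray(eigenvalues, dtype=float)
    return float(np.sum(eigenvalues / (eigenvalues + alpha)))

# Two assumed spectra with the same trace but different concentration.
flat = np.full(100, 1.0)        # variance spread over many directions
peaked = np.zeros(100)
peaked[:5] = 20.0               # variance concentrated in 5 directions

print(effective_dimensionality(flat))    # ~50.0: many directions matter
print(effective_dimensionality(peaked))  # ~4.8: few directions matter
```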
arXiv Detail & Related papers (2024-10-24T09:01:34Z) - SIMformer: Single-Layer Vanilla Transformer Can Learn Free-Space Trajectory Similarity [11.354974227479355]
We propose a simple yet accurate, fast, and scalable model that uses only a single-layer vanilla transformer encoder as the feature extractor.
Our model significantly mitigates the curse of dimensionality and outperforms the state of the art in effectiveness, efficiency, and scalability.
arXiv Detail & Related papers (2024-10-18T17:30:17Z) - Separable DeepONet: Breaking the Curse of Dimensionality in Physics-Informed Machine Learning [0.0]
In the absence of labeled datasets, we utilize the PDE residual loss to learn the physical system, an approach known as physics-informed DeepONet.
This method faces significant computational challenges, primarily due to the curse of dimensionality, as the computational cost increases exponentially with finer discretization.
We introduce the Separable DeepONet framework to address these challenges and improve scalability for high-dimensional PDEs.
arXiv Detail & Related papers (2024-07-21T16:33:56Z) - Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection [133.66006666465447]
Current metrics are size-sensitive: larger objects receive more attention while smaller ones tend to be ignored.
We argue that the evaluation should be size-invariant because bias based on size is unjustified without additional semantic information.
We develop an optimization framework tailored to this goal, achieving considerable improvements in detecting objects of different sizes.
arXiv Detail & Related papers (2024-05-16T03:01:06Z) - Interpreting the Curse of Dimensionality from Distance Concentration and Manifold Effect [0.6906005491572401]
We first summarize five challenges associated with manipulating high-dimensional data.
We then delve into two major causes of the curse of dimensionality: distance concentration and the manifold effect.
By interpreting the causes of the curse of dimensionality, we can better understand the limitations of current models and algorithms.
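Distance concentration, the first cause named above, is straightforward to reproduce: as dimensionality grows, the farthest neighbor of a query point is barely farther than the nearest one. A minimal sketch with i.i.d. uniform data (an illustration, not the paper's experiment):

```python
import numpy as np

rng = np.random.default_rng(0)

for dim in (2, 10, 100, 1000):
    # 1,000 uniform random points plus one uniform query point.
    points = rng.uniform(size=(1000, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(points - query, axis=1)
    # Relative contrast: how much farther the farthest point is than the
    # nearest. It collapses toward 0 as the dimension grows, so
    # nearest-neighbor distinctions lose meaning in high dimensions.
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"dim={dim:5d}  relative contrast={contrast:.3f}")
```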
arXiv Detail & Related papers (2023-12-31T08:22:51Z) - A Geometrical Approach to Evaluate the Adversarial Robustness of Deep Neural Networks [52.09243852066406]
The Adversarial Converging Time Score (ACTS) measures convergence time as an adversarial robustness metric.
We validate the effectiveness and generalization of the proposed ACTS metric against different adversarial attacks on the large-scale ImageNet dataset.
arXiv Detail & Related papers (2023-10-10T09:39:38Z) - Simultaneous Dimensionality Reduction: A Data Efficient Approach for Multimodal Representations Learning [0.0]
We explore two primary classes of approaches to dimensionality reduction (DR): Independent Dimensionality Reduction (IDR) and Simultaneous Dimensionality Reduction (SDR).
In IDR, each modality is compressed independently, striving to retain as much variation within each modality as possible.
In SDR, one simultaneously compresses the modalities to maximize the covariation between the reduced descriptions while paying less attention to how much individual variation is preserved.
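The contrast can be made concrete with standard tools: PCA compresses each modality independently (IDR), while CCA compresses two modalities jointly to maximize correlation between the reduced descriptions (one common SDR instance). The sketch below uses synthetic paired data as an assumed stand-in for two modalities; it illustrates the two strategies rather than the paper's implementation.

```python
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Two hypothetical modalities sharing a 3-dimensional latent signal.
latent = rng.normal(size=(500, 3))
x = latent @ rng.normal(size=(3, 50)) + 0.5 * rng.normal(size=(500, 50))
y = latent @ rng.normal(size=(3, 40)) + 0.5 * rng.normal(size=(500, 40))

# IDR: compress each modality on its own, maximizing within-modality variance.
x_idr = PCA(n_components=3).fit_transform(x)
y_idr = PCA(n_components=3).fit_transform(y)

# SDR (via CCA): compress both at once, maximizing cross-modality correlation.
x_sdr, y_sdr = CCA(n_components=3).fit(x, y).transform(x, y)

# How well does each reduction preserve the shared signal? Compare the
# correlation between paired reduced components.
for name, (a, b) in {"IDR": (x_idr, y_idr), "SDR": (x_sdr, y_sdr)}.items():
    corrs = [abs(np.corrcoef(a[:, i], b[:, i])[0, 1]) for i in range(3)]
    print(name, [f"{c:.2f}" for c in corrs])
```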
arXiv Detail & Related papers (2023-10-05T04:26:24Z) - An evaluation framework for dimensionality reduction through sectional curvature [59.40521061783166]
In this work, we introduce the first fully unsupervised dimensionality reduction performance metric.
To test its feasibility, this metric has been used to evaluate the performance of the most commonly used dimension reduction algorithms.
A new parameterized problem instance generator has been constructed in the form of a function generator.
arXiv Detail & Related papers (2023-03-17T11:59:33Z) - An Experimental Study of Dimension Reduction Methods on Machine Learning Algorithms with Applications to Psychometrics [77.34726150561087]
We show that dimension reduction can decrease accuracy, increase it, or leave it unchanged relative to using the full set of variables.
Our tentative results find that dimension reduction tends to lead to better performance when used for classification tasks.
arXiv Detail & Related papers (2022-10-19T22:07:13Z) - Exploring Dimensionality Reduction Techniques in Multilingual Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensional reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers.
It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
arXiv Detail & Related papers (2022-04-18T17:20:55Z) - Adaptive Hierarchical Similarity Metric Learning with Noisy Labels [138.41576366096137]
We propose an Adaptive Hierarchical Similarity Metric Learning method.
It considers two types of noise-insensitive information, i.e., class-wise divergence and sample-wise consistency.
Our method achieves state-of-the-art performance compared with current deep metric learning approaches.
arXiv Detail & Related papers (2021-10-29T02:12:18Z) - Effective Data-aware Covariance Estimator from Compressed Data [63.16042585506435]
We propose a data-aware weighted sampling based covariance matrix estimator, namely DACE, which provides an unbiased estimate of the covariance matrix.
We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the superior performance of our DACE.
arXiv Detail & Related papers (2020-10-10T10:10:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.