Related papers: Surpassing Cosine Similarity for Multidimensional Comparisons: Dimension Insensitive Euclidean Metric (DIEM)

Surpassing Cosine Similarity for Multidimensional Comparisons: Dimension Insensitive Euclidean Metric (DIEM)

URL: http://arxiv.org/abs/2407.08623v1
Date: Thu, 11 Jul 2024 16:00:22 GMT
Title: Surpassing Cosine Similarity for Multidimensional Comparisons: Dimension Insensitive Euclidean Metric (DIEM)
Authors: Federico Tessari, Neville Hogan,
Abstract summary: We introduce the Dimension Insensitive Euclidean Metric (DIEM), derived from the Euclidean distance. DIEM maintains consistent variability and eliminates the biases observed in traditional metrics. This novel metric has the potential to replace cosine similarity, providing a more accurate and insightful method to analyze multidimensional data.
Score: 3.812115031347965
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The advancement in computational power and hardware efficiency has enabled the tackling of increasingly complex and high-dimensional problems. While artificial intelligence (AI) has achieved remarkable results in various scientific and technological fields, the interpretability of these high-dimensional solutions remains challenging. A critical issue in this context is the comparison of multidimensional quantities, which is essential in techniques like Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and k-means clustering. Common metrics such as cosine similarity, Euclidean distance, and Manhattan distance are often used for such comparisons - for example in muscular synergies of the human motor control system. However, their applicability and interpretability diminish as dimensionality increases. This paper provides a comprehensive analysis of the effects of dimensionality on these three widely used metrics. Our results reveal significant limitations of cosine similarity, particularly its dependency on the dimensionality of the vectors, leading to biased and less interpretable outcomes. To address this, we introduce the Dimension Insensitive Euclidean Metric (DIEM), derived from the Euclidean distance, which demonstrates superior robustness and generalizability across varying dimensions. DIEM maintains consistent variability and eliminates the biases observed in traditional metrics, making it a more reliable tool for high-dimensional comparisons. This novel metric has the potential to replace cosine similarity, providing a more accurate and insightful method to analyze multidimensional data in fields ranging from neuromotor control to machine learning and deep learning.

Related papers

Geometry-Informed Neural Operator Transformer [0.8906214436849201]
This work introduces the Geometry-Informed Neural Operator Transformer (GINOT), which integrates the transformer architecture with the neural operator framework to enable forward predictions for arbitrary geometries. The performance of GINOT is validated on multiple challenging datasets, showcasing its high accuracy and strong generalization capabilities for complex and arbitrary 2D and 3D geometries.
arXiv Detail & Related papers (2025-04-28T03:39:27Z)
PCA-RAG: Principal Component Analysis for Efficient Retrieval-Augmented Generation [0.0]
High-dimensional language model embeddings can present scalability challenges in terms of storage and latency. This paper investigates the use of Principal Component Analysis (PCA) to reduce embedding dimensionality. We show that PCA-based compression offers a viable balance between retrieval fidelity and resource efficiency.
arXiv Detail & Related papers (2025-04-11T09:38:12Z)
A Novel Approach for Intrinsic Dimension Estimation [0.0]
The real-life data have a complex and non-linear structure due to their nature. Finding the nearly optimal representation of the dataset in a lower-dimensional space offers an applicable mechanism for improving the success of machine learning tasks. We propose a highly efficient and robust intrinsic dimension estimation approach.
arXiv Detail & Related papers (2025-03-12T15:42:39Z)
Evaluating Representational Similarity Measures from the Lens of Functional Correspondence [1.7811840395202345]
Neuroscience and artificial intelligence (AI) both face the challenge of interpreting high-dimensional neural data. Despite the widespread use of representational comparisons, a critical question remains: which metrics are most suitable for these comparisons?
arXiv Detail & Related papers (2024-11-21T23:53:58Z)
Complexity Matters: Effective Dimensionality as a Measure for Adversarial Robustness [0.7366405857677227]
In this work, we investigate the relationship between a model's effective dimensionality and its robustness properties. We run experiments on commercial-scale models that are often used in real-world environments such as YOLO and ResNet. We reveal a near-linear inverse relationship between effective dimensionality and adversarial robustness, that is models with a lower dimensionality exhibit better robustness.
arXiv Detail & Related papers (2024-10-24T09:01:34Z)
SIMformer: Single-Layer Vanilla Transformer Can Learn Free-Space Trajectory Similarity [11.354974227479355]
We propose a simple, yet accurate, fast, scalable model that only uses a single-layer vanilla transformer encoder as the feature extractor. Our model significantly mitigates the curse of dimensionality issue and outperforms the state-of-the-arts in effectiveness, efficiency, and scalability.
arXiv Detail & Related papers (2024-10-18T17:30:17Z)
Separable DeepONet: Breaking the Curse of Dimensionality in Physics-Informed Machine Learning [0.0]
In the absence of labeled datasets, we utilize the PDE residual loss to learn the physical system, an approach known as physics-informed DeepONet. This method faces significant computational challenges, primarily due to the curse of dimensionality, as the computational cost increases exponentially with finer discretization. We introduce the Separable DeepONet framework to address these challenges and improve scalability for high-dimensional PDEs.
arXiv Detail & Related papers (2024-07-21T16:33:56Z)
Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection [133.66006666465447]
Current metrics are size-sensitive, where larger objects are focused, and smaller ones tend to be ignored. We argue that the evaluation should be size-invariant because bias based on size is unjustified without additional semantic information. We develop an optimization framework tailored to this goal, achieving considerable improvements in detecting objects of different sizes.
arXiv Detail & Related papers (2024-05-16T03:01:06Z)
Interpreting the Curse of Dimensionality from Distance Concentration and Manifold Effect [0.6906005491572401]
We first summarize five challenges associated with manipulating high-dimensional data. We then delve into two major causes of the curse of dimensionality, distance concentration and manifold effect. By interpreting the causes of the curse of dimensionality, we can better understand the limitations of current models and algorithms.
arXiv Detail & Related papers (2023-12-31T08:22:51Z)
A Geometrical Approach to Evaluate the Adversarial Robustness of Deep Neural Networks [52.09243852066406]
Adversarial Converging Time Score (ACTS) measures the converging time as an adversarial robustness metric. We validate the effectiveness and generalization of the proposed ACTS metric against different adversarial attacks on the large-scale ImageNet dataset.
arXiv Detail & Related papers (2023-10-10T09:39:38Z)
Simultaneous Dimensionality Reduction: A Data Efficient Approach for Multimodal Representations Learning [0.0]
We explore two primary classes of approaches to dimensionality reduction (DR): Independent Dimensionality Reduction (IDR) and Simultaneous Dimensionality Reduction (SDR) In IDR, each modality is compressed independently, striving to retain as much variation within each modality as possible. In SDR, one simultaneously compresses the modalities to maximize the covariation between the reduced descriptions while paying less attention to how much individual variation is preserved.
arXiv Detail & Related papers (2023-10-05T04:26:24Z)
Comparison of Single- and Multi- Objective Optimization Quality for Evolutionary Equation Discovery [77.34726150561087]
Evolutionary differential equation discovery proved to be a tool to obtain equations with less a priori assumptions. The proposed comparison approach is shown on classical model examples -- Burgers equation, wave equation, and Korteweg - de Vries equation.
arXiv Detail & Related papers (2023-06-29T15:37:19Z)
An evaluation framework for dimensionality reduction through sectional curvature [59.40521061783166]
In this work, we aim to introduce the first highly non-supervised dimensionality reduction performance metric. To test its feasibility, this metric has been used to evaluate the performance of the most commonly used dimension reduction algorithms. A new parameterized problem instance generator has been constructed in the form of a function generator.
arXiv Detail & Related papers (2023-03-17T11:59:33Z)
An Experimental Study of Dimension Reduction Methods on Machine Learning Algorithms with Applications to Psychometrics [77.34726150561087]
We show that dimension reduction can decrease, increase, or provide the same accuracy as no reduction of variables. Our tentative results find that dimension reduction tends to lead to better performance when used for classification tasks.
arXiv Detail & Related papers (2022-10-19T22:07:13Z)
Exploring Dimensionality Reduction Techniques in Multilingual Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensional reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers. It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58% pm 2.59%$ and $54.65% pm 32.20%$, respectively.
arXiv Detail & Related papers (2022-04-18T17:20:55Z)
Dimensionless machine learning: Imposing exact units equivariance [7.9926585627926166]
We provide a two stage learning procedure for units-equivariant machine learning. We first construct a dimensionless version of its inputs using classic results from dimensional analysis. We then perform inference in the dimensionless space.
arXiv Detail & Related papers (2022-04-02T15:46:20Z)
Adaptive Hierarchical Similarity Metric Learning with Noisy Labels [138.41576366096137]
We propose an Adaptive Hierarchical Similarity Metric Learning method. It considers two noise-insensitive information, textiti.e., class-wise divergence and sample-wise consistency. Our method achieves state-of-the-art performance compared with current deep metric learning approaches.
arXiv Detail & Related papers (2021-10-29T02:12:18Z)
Effective Data-aware Covariance Estimator from Compressed Data [63.16042585506435]
We propose a data-aware weighted sampling based covariance matrix estimator, namely DACE, which can provide an unbiased covariance matrix estimation. We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the superior performance of our DACE.
arXiv Detail & Related papers (2020-10-10T10:10:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.