Surpassing Cosine Similarity for Multidimensional Comparisons: Dimension Insensitive Euclidean Metric (DIEM)
- URL: http://arxiv.org/abs/2407.08623v1
- Date: Thu, 11 Jul 2024 16:00:22 GMT
- Title: Surpassing Cosine Similarity for Multidimensional Comparisons: Dimension Insensitive Euclidean Metric (DIEM)
- Authors: Federico Tessari, Neville Hogan
- Abstract summary: We introduce the Dimension Insensitive Euclidean Metric (DIEM), derived from the Euclidean distance.
DIEM maintains consistent variability and eliminates the biases observed in traditional metrics.
This novel metric has the potential to replace cosine similarity, providing a more accurate and insightful method to analyze multidimensional data.
- Score: 3.812115031347965
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The advancement in computational power and hardware efficiency has enabled the tackling of increasingly complex and high-dimensional problems. While artificial intelligence (AI) has achieved remarkable results in various scientific and technological fields, the interpretability of these high-dimensional solutions remains challenging. A critical issue in this context is the comparison of multidimensional quantities, which is essential in techniques like Principal Component Analysis (PCA), Singular Value Decomposition (SVD), and k-means clustering. Common metrics such as cosine similarity, Euclidean distance, and Manhattan distance are often used for such comparisons - for example in muscular synergies of the human motor control system. However, their applicability and interpretability diminish as dimensionality increases. This paper provides a comprehensive analysis of the effects of dimensionality on these three widely used metrics. Our results reveal significant limitations of cosine similarity, particularly its dependency on the dimensionality of the vectors, leading to biased and less interpretable outcomes. To address this, we introduce the Dimension Insensitive Euclidean Metric (DIEM), derived from the Euclidean distance, which demonstrates superior robustness and generalizability across varying dimensions. DIEM maintains consistent variability and eliminates the biases observed in traditional metrics, making it a more reliable tool for high-dimensional comparisons. This novel metric has the potential to replace cosine similarity, providing a more accurate and insightful method to analyze multidimensional data in fields ranging from neuromotor control to machine learning and deep learning.
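The dimensionality dependence described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's experiment and does not implement DIEM itself. It only estimates the mean and standard deviation of cosine similarity, Euclidean distance, and Manhattan distance between pairs of random vectors as the dimension grows. The function name `metric_stats`, the uniform sampling range [0, 1), and the sample sizes are assumptions chosen for illustration.

```python
import math
import random
import statistics


def metric_stats(dim, n_pairs=500, seed=0):
    """Estimate (mean, std) of three metrics over random vector pairs.

    Assumption for illustration: components are drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)
    cosine, euclidean, manhattan = [], [], []
    for _ in range(n_pairs):
        a = [rng.random() for _ in range(dim)]
        b = [rng.random() for _ in range(dim)]
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        cosine.append(dot / (norm_a * norm_b))
        euclidean.append(math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))))
        manhattan.append(sum(abs(x - y) for x, y in zip(a, b)))
    samples = {"cosine": cosine, "euclidean": euclidean, "manhattan": manhattan}
    # Population standard deviation of the sampled metric values.
    return {
        name: (statistics.mean(values), statistics.pstdev(values))
        for name, values in samples.items()
    }


if __name__ == "__main__":
    for dim in (2, 10, 100):
        stats = metric_stats(dim)
        summary = ", ".join(
            f"{name}: mean={mean:.3f} std={std:.3f}"
            for name, (mean, std) in stats.items()
        )
        print(f"dim={dim:>3}  {summary}")
```

Under these assumptions, the spread of cosine similarity narrows as the dimension increases, while the mean Euclidean and Manhattan distances grow, so a given value of any of these metrics means something different at different dimensions.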
Related papers
- Separable DeepONet: Breaking the Curse of Dimensionality in Physics-Informed Machine Learning [0.0]
In the absence of labeled datasets, we utilize the PDE residual loss to learn the physical system, an approach known as physics-informed DeepONet.
This method faces significant computational challenges, primarily due to the curse of dimensionality, as the computational cost increases exponentially with finer discretization.
We introduce the Separable DeepONet framework to address these challenges and improve scalability for high-dimensional PDEs.
arXiv Detail & Related papers (2024-07-21T16:33:56Z) - Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection [133.66006666465447]
Current metrics are size-sensitive, where larger objects are focused, and smaller ones tend to be ignored.
We argue that the evaluation should be size-invariant because bias based on size is unjustified without additional semantic information.
We develop an optimization framework tailored to this goal, achieving considerable improvements in detecting objects of different sizes.
arXiv Detail & Related papers (2024-05-16T03:01:06Z) - Interpreting the Curse of Dimensionality from Distance Concentration and
Manifold Effect [0.6906005491572401]
We first summarize five challenges associated with manipulating high-dimensional data.
We then delve into two major causes of the curse of dimensionality, distance concentration and manifold effect.
By interpreting the causes of the curse of dimensionality, we can better understand the limitations of current models and algorithms.
arXiv Detail & Related papers (2023-12-31T08:22:51Z) - An evaluation framework for dimensionality reduction through sectional
curvature [59.40521061783166]
In this work, we aim to introduce the first highly non-supervised dimensionality reduction performance metric.
To test its feasibility, this metric has been used to evaluate the performance of the most commonly used dimension reduction algorithms.
A new parameterized problem instance generator has been constructed in the form of a function generator.
arXiv Detail & Related papers (2023-03-17T11:59:33Z) - An Experimental Study of Dimension Reduction Methods on Machine Learning
Algorithms with Applications to Psychometrics [77.34726150561087]
We show that dimension reduction can decrease, increase, or provide the same accuracy as no reduction of variables.
Our tentative results find that dimension reduction tends to lead to better performance when used for classification tasks.
arXiv Detail & Related papers (2022-10-19T22:07:13Z) - Synergy and Symmetry in Deep Learning: Interactions between the Data,
Model, and Inference Algorithm [33.59320315666675]
We study the triplet (D, M, I) as an integrated system and identify important synergies that help mitigate the curse of dimensionality.
We find that learning is most efficient when these symmetries are compatible with those of the data distribution.
arXiv Detail & Related papers (2022-07-11T04:08:21Z) - Exploring Dimensionality Reduction Techniques in Multilingual
Transformers [64.78260098263489]
This paper gives a comprehensive account of the impact of dimensional reduction techniques on the performance of state-of-the-art multilingual Siamese Transformers.
It shows that it is possible to achieve an average reduction in the number of dimensions of $91.58\% \pm 2.59\%$ and $54.65\% \pm 32.20\%$, respectively.
arXiv Detail & Related papers (2022-04-18T17:20:55Z) - Effective Data-aware Covariance Estimator from Compressed Data [63.16042585506435]
We propose a data-aware weighted sampling based covariance matrix estimator, namely DACE, which can provide an unbiased covariance matrix estimation.
We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the superior performance of our DACE.
arXiv Detail & Related papers (2020-10-10T10:10:28Z) - Information Theory Measures via Multidimensional Gaussianization [7.788961560607993]
Information theory is an outstanding framework to measure uncertainty, dependence and relevance in data and systems.
It has several desirable properties for real world applications.
However, obtaining information from multidimensional data is a challenging problem due to the curse of dimensionality.
arXiv Detail & Related papers (2020-10-08T07:22:16Z) - Comparison of Distal Teacher Learning with Numerical and Analytical Methods to Solve Inverse Kinematics for Rigid-Body Mechanisms [67.80123919697971]
We argue that one of the first proposed machine learning (ML) solutions to inverse kinematics -- distal teaching (DT) -- is actually good enough when combined with differentiable programming libraries.
We analyze solve rate, accuracy, sample efficiency and scalability.
With enough training data and relaxed precision requirements, DT has a better solve rate and is faster than state-of-the-art numerical solvers for a 15-DoF mechanism.
arXiv Detail & Related papers (2020-02-29T09:55:45Z) - Learning Flat Latent Manifolds with VAEs [16.725880610265378]
We propose an extension to the framework of variational auto-encoders, where the Euclidean metric is a proxy for the similarity between data points.
We replace the compact prior typically used in variational auto-encoders with a recently presented, more expressive hierarchical one.
We evaluate our method on a range of data-sets, including a video-tracking benchmark.
arXiv Detail & Related papers (2020-02-12T09:54:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.