Revisiting Cosine Similarity via Normalized ICA-transformed Embeddings
- URL: http://arxiv.org/abs/2406.10984v3
- Date: Tue, 17 Dec 2024 08:03:38 GMT
- Title: Revisiting Cosine Similarity via Normalized ICA-transformed Embeddings
- Authors: Hiroaki Yamagiwa, Momose Oyama, Hidetoshi Shimodaira
- Abstract summary: Cosine similarity is widely used to measure the similarity between two embeddings. We propose a novel interpretation of cosine similarity as the sum of semantic similarities over axes.
- Score: 2.8402080392117757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cosine similarity is widely used to measure the similarity between two embeddings, while interpretations based on angle and correlation coefficient are common. In this study, we focus on the interpretable axes of embeddings transformed by Independent Component Analysis (ICA), and propose a novel interpretation of cosine similarity as the sum of semantic similarities over axes. The normalized ICA-transformed embeddings exhibit sparsity, enhancing the interpretability of each axis, and the semantic similarity defined by the product of the components represents the shared meaning between the two embeddings along each axis. The effectiveness of this approach is demonstrated through intuitive numerical examples and thorough numerical experiments. By deriving the probability distributions that govern each component and the product of components, we propose a method for selecting statistically significant axes.
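As a rough illustration of the decomposition described in the abstract, here is a minimal Python sketch (not the authors' code): it applies FastICA to placeholder random embeddings, normalizes each embedding to unit norm, and checks that the cosine similarity of two embeddings equals the sum of their axis-wise component products. The data, dimensions, and the top-axis heuristic at the end are assumptions for illustration; the paper's actual axis selection is a statistical test derived from the component distributions.

```python
# Minimal sketch: cosine similarity as a sum of per-axis semantic
# similarities after ICA (illustrative data, not the paper's embeddings).
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))  # placeholder embeddings (n_words x dim)

ica = FastICA(n_components=50, whiten="unit-variance", random_state=0)
S = ica.fit_transform(X)                       # ICA-transformed embeddings
S /= np.linalg.norm(S, axis=1, keepdims=True)  # normalize each row to unit norm

a, b = S[0], S[1]
per_axis = a * b                           # "semantic similarity" along each axis
print(np.allclose(per_axis.sum(), a @ b))  # True: cosine = sum over axes

# Crude stand-in for axis selection: keep the axes with the largest products.
top_axes = np.argsort(-per_axis)[:5]
print(top_axes, per_axis[top_axes])
```

With real word embeddings, the abstract notes that the normalized ICA-transformed components are sparse, so only a few axes contribute meaningfully to each cosine similarity; that sparsity is what makes the per-axis terms interpretable.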
Related papers
- Is Cosine-Similarity of Embeddings Really About Similarity? [46.75365717794515]
Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations.
We study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights.
We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless similarities (a quick numerical check of the normalized-dot-product identity follows this entry).
arXiv Detail & Related papers (2024-03-08T16:48:20Z)
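As that quick numerical check of the identity quoted in this entry (an illustrative sketch, not code from the paper):

```python
# Cosine similarity equals the dot product of the L2-normalized vectors.
import numpy as np

rng = np.random.default_rng(1)
u, v = rng.standard_normal(300), rng.standard_normal(300)

cos_uv = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
dot_normed = (u / np.linalg.norm(u)) @ (v / np.linalg.norm(v))
print(np.allclose(cos_uv, dot_normed))  # True
```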
- Nonparametric Partial Disentanglement via Mechanism Sparsity: Sparse Actions, Interventions and Sparse Temporal Dependencies [58.179981892921056]
This work introduces a novel principle for disentanglement we call mechanism sparsity regularization.
We propose a representation learning method that induces disentanglement by simultaneously learning the latent factors.
We show that the latent factors can be recovered by regularizing the learned causal graph to be sparse.
arXiv Detail & Related papers (2024-01-10T02:38:21Z)
- Duality of Bures and Shape Distances with Implications for Comparing Neural Representations [6.698235069945606]
A multitude of (dis)similarity measures between neural network representations have been proposed, resulting in a fragmented research landscape.
First, measures such as linear regression, canonical correlation analysis (CCA), and shape distances learn explicit mappings between neural units to quantify similarity.
Second, measures such as representational similarity analysis (RSA), centered kernel alignment (CKA), and normalized Bures similarity (NBS) all quantify similarity in summary statistics (a minimal sketch of linear CKA follows this entry).
arXiv Detail & Related papers (2023-11-19T22:17:09Z)
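Of the summary-statistic measures named in this entry, linear CKA has a compact closed form; below is a minimal sketch with assumed random data (the function and shapes are illustrative, not taken from the paper):

```python
# Linear centered kernel alignment (CKA) between two representation
# matrices (rows = stimuli, columns = neurons/units).
import numpy as np

def linear_cka(X, Y):
    X = X - X.mean(axis=0)  # center each unit's responses
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 20))
Q, _ = np.linalg.qr(rng.standard_normal((20, 20)))  # random orthogonal matrix
print(linear_cka(X, X))      # 1.0 for identical representations
print(linear_cka(X, X @ Q))  # ~1.0: invariant to orthogonal transforms
```
- Enriching Disentanglement: From Logical Definitions to Quantitative Metrics [59.12308034729482]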
Disentangling the explanatory factors in complex data is a promising approach for data-efficient representation learning.
We establish relationships between logical definitions and quantitative metrics to derive theoretically grounded disentanglement metrics.
We empirically demonstrate the effectiveness of the proposed metrics by isolating different aspects of disentangled representations.
arXiv Detail & Related papers (2023-05-19T08:22:23Z)
- On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find that a model's accuracy and invariance are linearly correlated across different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
- Relating an entanglement measure with statistical correlators for two-qudit mixed states using only a pair of complementary observables [0.0]
We focus on characterizing entanglement of high dimensional bipartite states using various statistical correlators for two-qudit mixed states.
Relations linking Negativity with the statistical correlators are derived for Horodecki states in the domain of distillable entanglement.
arXiv Detail & Related papers (2022-01-17T02:58:36Z)
- Image Synthesis via Semantic Composition [74.68191130898805]
We present a novel approach to synthesize realistic images based on their semantic layouts.
It hypothesizes that objects with similar appearance share similar representations.
Our method establishes dependencies between regions according to their appearance correlation, yielding both spatially variant and associated representations.
arXiv Detail & Related papers (2021-09-15T02:26:07Z)
- Disentanglement Analysis with Partial Information Decomposition [31.56299813238937]
Disentangled representations aim at reversing the process by mapping data to multiple random variables that individually capture distinct generative factors.
Current disentanglement metrics are designed to measure the concentration, e.g., absolute deviation, variance, or entropy, of each variable conditioned on each generative factor.
In this work, we use the Partial Information Decomposition framework to evaluate information sharing among more than two variables, and build on it a framework that includes a new disentanglement metric.
arXiv Detail & Related papers (2021-08-31T11:09:40Z)
- Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation [58.80806716024701]
We study the global structure of attention scores computed using dot-product based self-attention.
We find that most of the variation among attention scores lies in a low-dimensional eigenspace.
We propose to compute scores only for a partial subset of token pairs, and use them to estimate scores for the remaining pairs (a toy illustration of the low-rank structure follows this entry).
arXiv Detail & Related papers (2021-06-16T14:38:42Z)
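That toy illustration of the low-rank structure (illustrative only: here low rank follows trivially from the head dimension d, while the paper analyzes the eigenstructure of actual attention scores):

```python
# Raw dot-product attention scores A = Q K^T / sqrt(d) have rank <= d,
# so for n >> d tokens the score matrix lies in a low-dimensional space.
import numpy as np

rng = np.random.default_rng(2)
n, d = 128, 16  # number of tokens, head dimension
Q, K = rng.standard_normal((n, d)), rng.standard_normal((n, d))
A = Q @ K.T / np.sqrt(d)

s = np.linalg.svd(A, compute_uv=False)
print((s[:d] ** 2).sum() / (s ** 2).sum())  # ~1.0: all variation in <= d dims
```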
- Learning Disentangled Representations with Latent Variation Predictability [102.4163768995288]
This paper defines the variation predictability of latent disentangled representations.
Within an adversarial generation process, we encourage variation predictability by maximizing the mutual information between latent variations and corresponding image pairs.
We develop an evaluation metric that does not rely on the ground-truth generative factors to measure the disentanglement of latent representations.
arXiv Detail & Related papers (2020-07-25T08:54:26Z)
- Bias-corrected estimator for intrinsic dimension and differential entropy--a visual multiscale approach [0.0]
Intrinsic dimension and differential entropy estimators are studied in this paper, including their systematic bias.
A pragmatic approach for joint estimation and bias correction of these two fundamental measures is proposed.
It is shown that both estimators can be complementary parts of a single approach, and that estimating differential entropy and intrinsic dimension simultaneously gives each measure meaning in terms of the other.
arXiv Detail & Related papers (2020-04-30T00:29:28Z)