Log-Euclidean Signatures for Intrinsic Distances Between Unaligned Datasets
- URL: http://arxiv.org/abs/2202.01671v1
- Date: Thu, 3 Feb 2022 16:37:23 GMT
- Title: Log-Euclidean Signatures for Intrinsic Distances Between Unaligned Datasets
- Authors: Tal Shnitzer, Mikhail Yurochkin, Kristjan Greenewald and Justin Solomon
- Abstract summary: We use manifold learning to compare the intrinsic geometric structures of different datasets.
We define a new theoretically-motivated distance based on a lower bound of the log-Euclidean metric.
- Score: 47.20862716252927
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The need for efficiently comparing and representing datasets with unknown
alignment spans various fields, from model analysis and comparison in machine
learning to trend discovery in collections of medical datasets. We use manifold
learning to compare the intrinsic geometric structures of different datasets by
comparing their diffusion operators, symmetric positive-definite (SPD) matrices
that relate to approximations of the continuous Laplace-Beltrami operator from
discrete samples. Existing methods typically compare such operators in a
pointwise manner or assume known data alignment. Instead, we exploit the
Riemannian geometry of SPD matrices to compare these operators and define a new
theoretically-motivated distance based on a lower bound of the log-Euclidean
metric. Our framework facilitates comparison of data manifolds expressed in
datasets with different sizes, numbers of features, and measurement modalities.
Our log-Euclidean signature (LES) distance recovers meaningful structural
differences, outperforming competing methods in various application domains.
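The abstract names the ingredients but not the exact construction. The following is a minimal NumPy/SciPy sketch of those ingredients under our own assumptions, not the authors' LES implementation. The assumptions are a Gaussian-kernel diffusion operator with a median bandwidth, a small eigenvalue regularizer `reg`, and a lower bound taken from sorted eigenvalues via the Hoffman-Wielandt inequality: ||log A - log B||_F >= sqrt(sum_i (log lambda_i(A) - log lambda_i(B))^2). All function names (`diffusion_operator`, `spd_log`, `log_euclidean_distance`, `spectral_lower_bound`) are hypothetical. The sketch illustrates why a spectral lower bound needs no alignment and accepts datasets of different sizes and feature counts.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist


def diffusion_operator(X, eps=None):
    """Symmetric diffusion operator D^{-1/2} K D^{-1/2} from samples X (n x d).

    K is a Gaussian kernel; it is strictly positive definite for distinct points,
    so the result is SPD in exact arithmetic (though typically ill-conditioned).
    The median-distance bandwidth is a common heuristic, assumed here.
    """
    sq = cdist(X, X, "sqeuclidean")
    if eps is None:
        eps = np.median(sq[sq > 0])
    K = np.exp(-sq / eps)
    d = K.sum(axis=1)
    return K / np.sqrt(np.outer(d, d))  # entry (i, j) is K_ij / sqrt(d_i d_j)


def _reg_log(w, reg):
    # log(max(w, 0) + reg): 'reg' (assumed, not from the paper) keeps round-off
    # in near-zero eigenvalues from producing log(0) or log of a negative number.
    return np.log(np.clip(w, 0.0, None) + reg)


def spd_log(A, reg=1e-6):
    """Regularized matrix logarithm V diag(log w) V^T of a symmetric PSD matrix."""
    w, V = eigh(A)
    return (V * _reg_log(w, reg)) @ V.T


def log_euclidean_distance(A, B, reg=1e-6):
    """||log A - log B||_F; requires equal sizes and aligned rows/columns."""
    return float(np.linalg.norm(spd_log(A, reg) - spd_log(B, reg), "fro"))


def spectral_lower_bound(A, B, k=None, reg=1e-6):
    """Alignment-free lower bound of log_euclidean_distance (same reg).

    Hoffman-Wielandt: for symmetric X, Y with eigenvalues sorted the same way,
    sum_i (lambda_i(X) - lambda_i(Y))^2 <= ||X - Y||_F^2. Since the regularized
    log is monotone, the sorted eigenvalues of log A are the logs of the sorted
    eigenvalues of A. Keeping only the k leading terms keeps it a lower bound,
    and the value is defined even when A and B differ in size.
    """
    la = _reg_log(eigh(A, eigvals_only=True)[::-1], reg)  # descending order
    lb = _reg_log(eigh(B, eigvals_only=True)[::-1], reg)
    if k is None:
        k = min(len(la), len(lb))
    return float(np.linalg.norm(la[:k] - lb[:k]))


rng = np.random.default_rng(0)

# One circle, sampled once, then the same points in shuffled order.
t = rng.uniform(0, 2 * np.pi, 200)
circle = np.c_[np.cos(t), np.sin(t)]
A = diffusion_operator(circle)
A_shuffled = diffusion_operator(circle[rng.permutation(200)])
print(log_euclidean_distance(A, A_shuffled))  # nonzero: only the row order differs
print(spectral_lower_bound(A, A_shuffled))    # ~0 up to round-off: same spectrum

# Different sizes and feature counts; no point correspondence is needed.
s = rng.uniform(0, 2 * np.pi, 300)
circle_3d = np.c_[np.cos(s), np.sin(s), np.zeros(300)]  # 300 points, 3 features
segment = rng.uniform(-1, 1, size=(250, 1))              # 250 points, 1 feature
print(spectral_lower_bound(A, diffusion_operator(circle_3d), k=10))
print(spectral_lower_bound(A, diffusion_operator(segment), k=10))
```

Shuffling the rows leaves the spectrum unchanged, so the bound stays near zero while the full log-Euclidean distance does not; that permutation invariance is what makes a spectral bound usable for unaligned data. The bandwidth heuristic, `reg`, and `k` all change the numbers and would need tuning in practice.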
Related papers
- Measuring similarity between embedding spaces using induced neighborhood graphs [10.056989400384772] (2024-11-13T15:22:33Z)
  We propose a metric to evaluate the similarity between paired item representations.
  Our results show that accuracy in both analogy and zero-shot classification tasks correlates with the embedding similarity.
- Symmetry Discovery for Different Data Types [52.2614860099811] (2024-10-13T13:39:39Z)
  Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance.
  We propose LieSD, a method for discovering symmetries via trained neural networks which approximate the input-output mappings of the tasks.
  We validate the performance of LieSD on tasks with symmetries such as the two-body problem, the moment of inertia matrix prediction, and top quark tagging.
- Induced Covariance for Causal Discovery in Linear Sparse Structures [55.2480439325792] (2024-10-02T04:01:38Z)
  Causal models seek to unravel the cause-effect relationships among variables from observed data.
  This paper introduces a novel causal discovery algorithm designed for settings in which variables exhibit linearly sparse relationships.
- Entropic Optimal Transport Eigenmaps for Nonlinear Alignment and Joint Embedding of High-Dimensional Datasets [11.105392318582677] (2024-07-01T18:48:55Z)
  We propose a principled approach for aligning and jointly embedding a pair of datasets with theoretical guarantees.
  Our approach leverages the leading singular vectors of the EOT plan matrix between two datasets to extract their shared underlying structure.
  We show that in a high-dimensional regime, the EOT plan recovers the shared manifold structure by approximating a kernel function evaluated at the locations of the latent variables.
- Kernel distance measures for time series, random fields and other structured data [71.61147615789537] (2021-09-29T22:54:17Z)
  kdiff is a novel kernel-based measure for estimating distances between instances of structured data.
  It accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution.
  Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems.
- A Differential Geometry Perspective on Orthogonal Recurrent Models [56.09491978954866] (2021-02-18T19:39:22Z)
  We employ tools and insights from differential geometry to offer a novel perspective on orthogonal RNNs.
  We show that orthogonal RNNs may be viewed as optimizing in the space of divergence-free vector fields.
  Motivated by this observation, we study a new recurrent model, which spans the entire space of vector fields.
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327] (2020-04-17T12:47:04Z)
  In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
  We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
  We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
- Learning Similarity Metrics for Numerical Simulations [29.39625644221578] (2020-02-18T20:11:15Z)
  We propose a neural network-based approach that computes a stable and generalizing metric (LSiM) to compare data from a variety of numerical simulation sources.
  Our method employs a Siamese network architecture that is motivated by the mathematical properties of a metric.
- Learning Flat Latent Manifolds with VAEs [16.725880610265378] (2020-02-12T09:54:52Z)
  We propose an extension to the framework of variational auto-encoders, where the Euclidean metric is a proxy for the similarity between data points.
  We replace the compact prior typically used in variational auto-encoders with a recently presented, more expressive hierarchical one.
  We evaluate our method on a range of data-sets, including a video-tracking benchmark.
- Geometric Dataset Distances via Optimal Transport [15.153110906331733] (2020-02-07T17:51:26Z)
  We propose an alternative notion of distance between datasets that (i) is model-agnostic, (ii) does not involve training, (iii) can compare datasets even if their label sets are completely disjoint and (iv) has solid theoretical footing.
  This distance relies on optimal transport, which provides it with rich geometry awareness, interpretable correspondences and well-understood properties.
  Our results show that this novel distance provides meaningful comparison of datasets, and correlates well with transfer learning hardness across various experimental settings and datasets.