Balancing Geometry and Density: Path Distances on High-Dimensional Data
- URL: http://arxiv.org/abs/2012.09385v1
- Date: Thu, 17 Dec 2020 04:03:15 GMT
- Title: Balancing Geometry and Density: Path Distances on High-Dimensional Data
- Authors: Anna Little, Daniel McKenzie and James Murphy
- Abstract summary: New geometric and computational analyses of power-weighted shortest-path distances (PWSPDs) are presented.
By illuminating the way these metrics balance density and geometry in the underlying data, we discuss how they may be chosen in practice.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: New geometric and computational analyses of power-weighted shortest-path
distances (PWSPDs) are presented. By illuminating the way these metrics balance
density and geometry in the underlying data, we clarify their key parameters
and discuss how they may be chosen in practice. Comparisons are made with
related data-driven metrics, which illustrate the broader role of density in
kernel-based unsupervised and semi-supervised machine learning.
Computationally, we relate PWSPDs on complete weighted graphs to their
analogues on weighted nearest neighbor graphs, providing high probability
guarantees on their equivalence that are near-optimal. Connections with
percolation theory are developed to establish estimates on the bias and
variance of PWSPDs in the finite sample setting. The theoretical results are
bolstered by illustrative experiments, demonstrating the versatility of PWSPDs
for a wide range of data settings. Throughout the paper, our results require
only that the underlying data is sampled from a low-dimensional manifold, and
depend crucially on the intrinsic dimension of this manifold, rather than its
ambient dimension.
Related papers
- A Class of Topological Pseudodistances for Fast Comparison of
Persistence Diagrams [0.3968603035422276]
We introduce a class of pseudodistances called Extended Topological Pseudodistances (ETD)
ETDs have tunable complexity, and can approximate Sliced and classical Wasserstein distances at the high-complexity extreme.
We experimentally verify that ETDs outperform PSs in terms of accuracy and outperform Wasserstein and Sliced Wasserstein distances in terms of computational complexity.
arXiv Detail & Related papers (2024-02-22T12:27:35Z) - Minimizing robust density power-based divergences for general parametric
density models [3.0277213703725767]
We introduce an approach to minimize Density power divergence (DPD) for general parametric densities.
The proposed approach can also be employed to minimize other density power-based $gamma$-divergences.
arXiv Detail & Related papers (2023-07-11T13:33:47Z) - Proximal Symmetric Non-negative Latent Factor Analysis: A Novel Approach
to Highly-Accurate Representation of Undirected Weighted Networks [2.1797442801107056]
Undirected Weighted Network (UWN) is commonly found in big data-related applications.
Existing models fail in either modeling its intrinsic symmetry or low-data density.
Proximal Symmetric Nonnegative Latent-factor-analysis model is proposed.
arXiv Detail & Related papers (2023-06-06T13:03:24Z) - Solving High-Dimensional PDEs with Latent Spectral Models [74.1011309005488]
We present Latent Spectral Models (LSM) toward an efficient and precise solver for high-dimensional PDEs.
Inspired by classical spectral methods in numerical analysis, we design a neural spectral block to solve PDEs in the latent space.
LSM achieves consistent state-of-the-art and yields a relative gain of 11.5% averaged on seven benchmarks.
arXiv Detail & Related papers (2023-01-30T04:58:40Z) - Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimize the metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z) - Eikonal depth: an optimal control approach to statistical depths [0.7614628596146599]
We propose a new type of globally defined statistical depth, based upon control theory and eikonal equations.
This depth is easy to interpret and compute, expressively captures multi-modal behavior, and extends naturally to data that is non-Euclidean.
arXiv Detail & Related papers (2022-01-14T01:57:48Z) - Post-mortem on a deep learning contest: a Simpson's paradox and the
complementary roles of scale metrics versus shape metrics [61.49826776409194]
We analyze a corpus of models made publicly-available for a contest to predict the generalization accuracy of neural network (NN) models.
We identify what amounts to a Simpson's paradox: where "scale" metrics perform well overall but perform poorly on sub partitions of the data.
We present two novel shape metrics, one data-independent, and the other data-dependent, which can predict trends in the test accuracy of a series of NNs.
arXiv Detail & Related papers (2021-06-01T19:19:49Z) - Estimation and Quantization of Expected Persistence Diagrams [0.0]
We study two such summaries, the Persistence Diagram (EPD) and its quantization.
EPD is a measure supported on R 2.
We prove that this estimator is optimal from a minimax standpoint on a large class of models with a parametric rate of convergence.
arXiv Detail & Related papers (2021-05-11T08:12:18Z) - GELATO: Geometrically Enriched Latent Model for Offline Reinforcement
Learning [54.291331971813364]
offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out of distribution samples as well as the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z) - Effective Data-aware Covariance Estimator from Compressed Data [63.16042585506435]
We propose a data-aware weighted sampling based covariance matrix estimator, namely DACE, which can provide an unbiased covariance matrix estimation.
We conduct extensive experiments on both synthetic and real-world datasets to demonstrate the superior performance of our DACE.
arXiv Detail & Related papers (2020-10-10T10:10:28Z) - On Projection Robust Optimal Transport: Sample Complexity and Model
Misspecification [101.0377583883137]
Projection robust (PR) OT seeks to maximize the OT cost between two measures by choosing a $k$-dimensional subspace onto which they can be projected.
Our first contribution is to establish several fundamental statistical properties of PR Wasserstein distances.
Next, we propose the integral PR Wasserstein (IPRW) distance as an alternative to the PRW distance, by averaging rather than optimizing on subspaces.
arXiv Detail & Related papers (2020-06-22T14:35:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.