A metric on directed graphs and Markov chains based on hitting
probabilities
- URL: http://arxiv.org/abs/2006.14482v2
- Date: Mon, 18 Jan 2021 16:54:57 GMT
- Title: A metric on directed graphs and Markov chains based on hitting
probabilities
- Authors: Zachary M. Boyd, Nicolas Fraiman, Jeremy L. Marzuola, Peter J. Mucha,
Braxton Osting, and Jonathan Weare
- Abstract summary: We introduce a metric on the state space of any ergodic, finite-state, time-homogeneous Markov chain.
Our construction is based on hitting probabilities, with nearness in the metric space related to the transfer of random walkers from one node to another at stationarity.
Notably, our metric is insensitive to shortest and average walk distances, thus giving new information compared to existing metrics.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The shortest-path, commute time, and diffusion distances on undirected graphs
have been widely employed in applications such as dimensionality reduction,
link prediction, and trip planning. Increasingly, there is interest in using
asymmetric structure of data derived from Markov chains and directed graphs,
but few metrics are specifically adapted to this task. We introduce a metric on
the state space of any ergodic, finite-state, time-homogeneous Markov chain
and, in particular, on any Markov chain derived from a directed graph. Our
construction is based on hitting probabilities, with nearness in the metric
space related to the transfer of random walkers from one node to another at
stationarity. Notably, our metric is insensitive to shortest and average walk
distances, thus giving new information compared to existing metrics. We use
possible degeneracies in the metric to develop an interesting structural theory
of directed graphs and explore a related quotienting procedure. Our metric can
be computed in $O(n^3)$ time, where $n$ is the number of states, and in
examples we scale up to $n=10,000$ nodes and $\approx 38M$ edges on a desktop
computer. In several examples, we explore the nature of the metric, compare it
to alternative methods, and demonstrate its utility for weak recovery of
community structure in dense graphs, visualization, structure recovering,
dynamics exploration, and multiscale cluster detection.
Related papers
- Just Wing It: Near-Optimal Estimation of Missing Mass in a Markovian Sequence [13.552044856329648]
We develop a linear-runtime estimator called Windowed Good--Turing (WingIt)
We show that its risk decays as $widetildeO(mathsfT_mix/n)$, where $mathsfT_mix$ denotes the mixing time of the chain in total variation distance.
We extend our estimator to approximate the stationary mass placed on elements occurring with small frequency in the trajectory.
arXiv Detail & Related papers (2024-04-08T18:55:07Z) - Information-Theoretic Thresholds for Planted Dense Cycles [52.076657911275525]
We study a random graph model for small-world networks which are ubiquitous in social and biological sciences.
For both detection and recovery of the planted dense cycle, we characterize the information-theoretic thresholds in terms of $n$, $tau$, and an edge-wise signal-to-noise ratio $lambda$.
arXiv Detail & Related papers (2024-02-01T03:39:01Z) - Improving embedding of graphs with missing data by soft manifolds [51.425411400683565]
The reliability of graph embeddings depends on how much the geometry of the continuous space matches the graph structure.
We introduce a new class of manifold, named soft manifold, that can solve this situation.
Using soft manifold for graph embedding, we can provide continuous spaces to pursue any task in data analysis over complex datasets.
arXiv Detail & Related papers (2023-11-29T12:48:33Z) - Graph GOSPA metric: a metric to measure the discrepancy between graphs of different sizes [3.8823562292981393]
This paper proposes a metric to measure the dissimilarity between graphs that may have a different number of nodes.
The proposed graph GOSPA metric includes costs associated with node attribute errors for properly assigned nodes, missed and false nodes and edge mismatches between graphs.
arXiv Detail & Related papers (2023-11-10T11:40:24Z) - Graph Fourier MMD for Signals on Graphs [67.68356461123219]
We propose a novel distance between distributions and signals on graphs.
GFMMD is defined via an optimal witness function that is both smooth on the graph and maximizes difference in expectation.
We showcase it on graph benchmark datasets as well as on single cell RNA-sequencing data analysis.
arXiv Detail & Related papers (2023-06-05T00:01:17Z) - Tight and fast generalization error bound of graph embedding in metric
space [54.279425319381374]
We show that graph embedding in non-Euclidean metric space can outperform that in Euclidean space with much smaller training data than the existing bound has suggested.
Our new upper bound is significantly tighter and faster than the existing one, which can be exponential to $R$ and $O(frac1S)$ at the fastest.
arXiv Detail & Related papers (2023-05-13T17:29:18Z) - Identifying latent distances with Finslerian geometry [6.0188611984807245]
Generative models cause the data space and the geodesics to be at best impractical, and at worst impossible to manipulate.
In this work, we propose another metric whose geodesics explicitly minimise the expected length of the pullback metric.
In high dimensions, we prove that both metrics converge to each other at a rate of $Oleft(frac1Dright)$.
arXiv Detail & Related papers (2022-12-20T05:57:27Z) - Entropic Optimal Transport in Random Graphs [8.7314407902481]
In graph analysis, a classic task consists in computing similarity measures between (groups of) nodes.
We show that it is possible to consistently estimate distances between groups of nodes in the latent space.
arXiv Detail & Related papers (2022-01-11T13:52:34Z) - Embedding Signals on Knowledge Graphs with Unbalanced Diffusion Earth
Mover's Distance [63.203951161394265]
In modern machine learning it is common to encounter large graphs that arise via interactions or similarities between observations in many domains.
We propose to compare and organize such datasets of graph signals by using an earth mover's distance (EMD) with a geodesic cost over the underlying graph.
In each case, we show that UDEMD-based embeddings find accurate distances that are highly efficient compared to other methods.
arXiv Detail & Related papers (2021-07-26T17:19:02Z) - Markov Random Geometric Graph (MRGG): A Growth Model for Temporal
Dynamic Networks [0.0]
We introduce Markov Random Geometric Graphs (MRGGs), a growth model for temporal dynamic networks.
We show how MRGGs can be used to detect dependence structure in growing graphs and to solve link prediction problems.
arXiv Detail & Related papers (2020-06-12T08:35:54Z) - Block-Approximated Exponential Random Graphs [77.4792558024487]
An important challenge in the field of exponential random graphs (ERGs) is the fitting of non-trivial ERGs on large graphs.
We propose an approximative framework to such non-trivial ERGs that result in dyadic independence (i.e., edge independent) distributions.
Our methods are scalable to sparse graphs consisting of millions of nodes.
arXiv Detail & Related papers (2020-02-14T11:42:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.