A distance function for stochastic matrices
- URL: http://arxiv.org/abs/2410.12689v2
- Date: Mon, 05 May 2025 12:21:46 GMT
- Title: A distance function for stochastic matrices
- Authors: Antony R. Lee, Peter Tino, Iain Bruce Styles
- Abstract summary: Starting with sequences of Markov chains, the Bhattacharyya angle is advocated as the natural tool for comparing both short- and long-term Markov chain runs. It is shown that considering either the Bhattacharyya angle on Markov sequences or the new matrix distance leads to the same distance between models.
- Score: 0.9217021281095907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Motivated by information geometry, a distance function on the space of stochastic matrices is advocated. Starting with sequences of Markov chains, the Bhattacharyya angle is proposed as the natural tool for comparing both short- and long-term Markov chain runs. Bounds on the convergence of the distance and mixing times are derived. Guided by the desire to compare different Markov chain models, especially in the setting of healthcare processes, a new distance function on the space of stochastic matrices is presented. It is a true distance measure which has a closed form and is efficient to implement for numerical evaluation. In the case of ergodic Markov chains, it is shown that considering either the Bhattacharyya angle on Markov sequences or the new stochastic matrix distance leads to the same distance between models.
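The paper's closed-form matrix distance is not reproduced in this abstract. As a minimal sketch of the underlying Bhattacharyya angle between two discrete probability distributions (the standard definition, assumed here to match the paper's usage):

```python
import numpy as np

def bhattacharyya_angle(p, q):
    """Bhattacharyya angle between two discrete probability distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    # Bhattacharyya coefficient: sum of elementwise geometric means, in [0, 1]
    bc = np.sum(np.sqrt(p * q))
    # Clip guards against floating-point overshoot before arccos
    return np.arccos(np.clip(bc, 0.0, 1.0))
```

Identical distributions give angle 0; distributions with disjoint supports give the maximal angle π/2.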
Related papers
- Comparing Labeled Markov Chains: A Cantor-Kantorovich Approach [53.66196601631798]
We study the recently introduced Cantor–Kantorovich (or CK) distance. In particular, we show that it can be framed as a discounted sum of finite-horizon Total Variation distances. We also show that exact computation of the CK distance is #P-hard.
arXiv Detail & Related papers (2025-11-22T16:02:56Z) - Attention (as Discrete-Time Markov) Chains [70.46604474584181]
We introduce a new interpretation of the attention matrix as a discrete-time Markov chain. Our main observation is that tokens corresponding to semantically similar regions form a set of metastable states. Using these lightweight tools, we demonstrate state-of-the-art zero-shot segmentation.
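Under this interpretation, a nonnegative attention score matrix becomes a transition matrix after row normalization. A minimal sketch of that step (the helper name `attention_to_markov` is illustrative, not from the paper):

```python
import numpy as np

def attention_to_markov(A):
    # Row-normalize a nonnegative attention score matrix so that each row
    # sums to 1, yielding a row-stochastic transition matrix over tokens.
    A = np.asarray(A, dtype=float)
    return A / A.sum(axis=1, keepdims=True)
```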
arXiv Detail & Related papers (2025-07-23T16:20:47Z) - von Mises Quasi-Processes for Bayesian Circular Regression [57.88921637944379]
We explore a family of expressive and interpretable distributions over circle-valued random functions.
The resulting probability model has connections with continuous spin models in statistical physics.
For posterior inference, we introduce a new Stratonovich-like augmentation that lends itself to fast Markov Chain Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-19T01:57:21Z) - Just Wing It: Near-Optimal Estimation of Missing Mass in a Markovian Sequence [13.552044856329648]
We develop a linear-runtime estimator called Windowed Good–Turing (WingIt).
We show that its risk decays as $\widetilde{O}(\mathsf{T}_{\mathrm{mix}}/n)$, where $\mathsf{T}_{\mathrm{mix}}$ denotes the mixing time of the chain in total variation distance.
We extend our estimator to approximate the stationary mass placed on elements occurring with small frequency in the trajectory.
arXiv Detail & Related papers (2024-04-08T18:55:07Z) - Robust affine point matching via quadratic assignment on Grassmannians [50.366876079978056]
Robust Affine Matching with Grassmannians (RoAM) is a new algorithm to perform affine registration of point clouds.
The algorithm is based on minimizing the Frobenius distance between two elements of the Grassmannian.
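As a minimal sketch of the distance this minimization uses, assuming it is the usual chordal (Frobenius) distance between subspace projectors (the RoAM registration algorithm itself is not shown):

```python
import numpy as np

def grassmann_frobenius_distance(U, V):
    # U, V: (n x k) orthonormal bases of two k-dimensional subspaces.
    # Represent each subspace by its orthogonal projector and measure
    # the Frobenius norm of the difference (the chordal distance).
    P = U @ U.T
    Q = V @ V.T
    return np.linalg.norm(P - Q, "fro")
```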
arXiv Detail & Related papers (2023-03-05T15:27:24Z) - Stochastic Gradient Descent under Markovian Sampling Schemes [3.04585143845864]
We study a variation of vanilla stochastic gradient descent where the optimizer only has access to a Markovian sampling scheme.
We focus on obtaining rates of convergence under the least restrictive assumptions possible on the underlying Markov chain.
arXiv Detail & Related papers (2023-02-28T09:18:00Z) - Markovian Sliced Wasserstein Distances: Beyond Independent Projections [51.80527230603978]
We introduce a new family of SW distances, named Markovian sliced Wasserstein (MSW) distance, which imposes a first-order Markov structure on projecting directions.
We compare MSW with previous SW variants in applications such as flows, color transfer, and deep generative modeling, demonstrating its favorable performance.
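For context, a single slice of the plain (non-Markovian) sliced Wasserstein distance reduces to a 1-D Wasserstein-1 distance between sorted projections; MSW's first-order coupling of the projecting directions is not shown here:

```python
import numpy as np

def one_slice_wasserstein(X, Y, theta):
    # Project both equal-sized point sets onto direction theta; the 1-D
    # Wasserstein-1 distance is the mean absolute gap between sorted projections.
    x = np.sort(np.asarray(X) @ np.asarray(theta))
    y = np.sort(np.asarray(Y) @ np.asarray(theta))
    return np.abs(x - y).mean()
```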
arXiv Detail & Related papers (2023-01-10T01:58:15Z) - Shape And Structure Preserving Differential Privacy [70.08490462870144]
We show how the gradient of the squared distance function offers better control over sensitivity than the Laplace mechanism.
arXiv Detail & Related papers (2022-09-21T18:14:38Z) - cosFormer: Rethinking Softmax in Attention [60.557869510885205]
Kernel methods are often adopted to reduce complexity by approximating the softmax operator.
Due to the approximation errors, their performance varies across tasks and corpora, and can suffer significant drops.
We propose a linear transformer called cosFormer that can achieve comparable or better accuracy to the vanilla transformer.
arXiv Detail & Related papers (2022-02-17T17:53:48Z) - Optimal and instance-dependent guarantees for Markovian linear stochastic approximation [47.912511426974376]
We show a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme.
We derive corollaries of these results for policy evaluation with Markov noise.
arXiv Detail & Related papers (2021-12-23T18:47:50Z) - Kernel distance measures for time series, random fields and other structured data [71.61147615789537]
kdiff is a novel kernel-based measure for estimating distances between instances of structured data.
It accounts for both self and cross similarities across the instances and is defined using a lower quantile of the distance distribution.
Some theoretical results are provided for separability conditions using kdiff as a distance measure for clustering and classification problems.
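kdiff's exact definition involves kernels over structured instances; as an illustrative simplification of the lower-quantile idea on plain vectors (a hypothetical sketch, not the paper's measure):

```python
import numpy as np

def quantile_cross_distance(X, Y, q=0.1):
    # All pairwise Euclidean distances between rows of X and rows of Y,
    # summarized by a lower quantile (a robust alternative to the minimum).
    X, Y = np.asarray(X, dtype=float), np.asarray(Y, dtype=float)
    D = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return np.quantile(D, q)
```

At `q=0` this recovers the minimum cross-distance; a small positive `q` makes the summary less sensitive to a single nearby outlier pair.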
arXiv Detail & Related papers (2021-09-29T22:54:17Z) - On the Stability of Random Matrix Product with Markovian Noise: Application to Linear Stochastic Approximation and TD Learning [33.24726879325671]
This paper studies the exponential stability of random matrix products driven by a general (possibly unbounded) state space Markov chain.
It is a cornerstone in the analysis of algorithms in machine learning.
arXiv Detail & Related papers (2021-01-30T08:39:38Z) - A Matrix Chernoff Bound for Markov Chains and Its Application to Co-occurrence Matrices [30.49132891333353]
We prove a Chernoff-type bound for sums of matrix-valued random variables sampled via a regular (aperiodic and irreducible) finite Markov chain.
Our proof is based on the matrix expander (regular undirected graph) Chernoff bound [Garg et al. STOC '18] and scalar Chernoff-Hoeffding bounds for Markov chains.
Our matrix Chernoff bound for Markov chains can be applied to analyze the behavior of co-occurrence statistics for sequential data.
arXiv Detail & Related papers (2020-08-06T05:44:54Z) - Markov Random Geometric Graph (MRGG): A Growth Model for Temporal Dynamic Networks [0.0]
We introduce Markov Random Geometric Graphs (MRGGs), a growth model for temporal dynamic networks.
We show how MRGGs can be used to detect dependence structure in growing graphs and to solve link prediction problems.
arXiv Detail & Related papers (2020-06-12T08:35:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.