Time-inhomogeneous diffusion geometry and topology
- URL: http://arxiv.org/abs/2203.14860v1
- Date: Mon, 28 Mar 2022 16:06:17 GMT
- Title: Time-inhomogeneous diffusion geometry and topology
- Authors: Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik
Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy
- Abstract summary: Diffusion condensation is a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data.
We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives.
Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.
- Score: 69.55228523791897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion condensation is a dynamic process that yields a sequence of
multiscale data representations that aim to encode meaningful abstractions. It
has proven effective for manifold learning, denoising, clustering, and
visualization of high-dimensional data. Diffusion condensation is constructed
as a time-inhomogeneous process where each step first computes and then applies
a diffusion operator to the data. We theoretically analyze the convergence and
evolution of this process from geometric, spectral, and topological
perspectives. From a geometric perspective, we obtain convergence bounds based
on the smallest transition probability and the radius of the data, whereas from
a spectral perspective, our bounds are based on the eigenspectrum of the
diffusion kernel. Our spectral results are of particular interest since most of
the literature on data diffusion is focused on homogeneous processes. From a
topological perspective, we show diffusion condensation generalizes
centroid-based hierarchical clustering. We use this perspective to obtain a
bound based on the number of data points, independent of their location. To
understand the evolution of the data geometry beyond convergence, we use
topological data analysis. We show that the condensation process itself defines
an intrinsic diffusion homology. We use this intrinsic topology as well as an
ambient topology to study how the data changes over diffusion time. We
demonstrate both homologies in well-understood toy examples. Our work gives
theoretical insights into the convergence of diffusion condensation, and shows
that it provides a link between topological and geometric data analysis.
Related papers
- Manifolds, Random Matrices and Spectral Gaps: The geometric phases of generative diffusion [8.389423957434818]
We analyze the spectrum of eigenvalues of the Jacobian of the score function, whose discontinuities (gaps) reveal the presence and dimensionality of distinct sub-manifolds.
Our analysis reveals the existence of three distinct qualitative phases during the generative process.
arXiv Detail & Related papers (2024-10-08T10:55:40Z) - Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z) - Dynamical Regimes of Diffusion Models [14.797301819675454]
We study generative diffusion models in the regime where the dimension of space and the number of data are large.
Our analysis reveals three distinct dynamical regimes during the backward generative diffusion process.
The dependence of the collapse time on the dimension and number of data provides a thorough characterization of the curse of dimensionality for diffusion models.
arXiv Detail & Related papers (2024-02-28T17:19:26Z) - Datacube segmentation via Deep Spectral Clustering [76.48544221010424]
Extended Vision techniques often pose a challenge in their interpretation.
The huge dimensionality of data cube spectra poses a complex task in its statistical interpretation.
In this paper, we explore the possibility of applying unsupervised clustering methods in encoded space.
A statistical dimensional reduction is performed by an ad hoc trained (Variational) AutoEncoder, while the clustering process is performed by a (learnable) iterative K-Means clustering algorithm.
arXiv Detail & Related papers (2024-01-31T09:31:28Z) - Persistent Homology for High-dimensional Data Based on Spectral Methods [16.58218530585593]
We show that persistent homology becomes very sensitive to noise and fails to detect the correct topology.
We find that spectral distances on the k-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, allow to detect the correct topology even in the presence of high-dimensional noise.
arXiv Detail & Related papers (2023-11-06T13:18:08Z) - A Heat Diffusion Perspective on Geodesic Preserving Dimensionality
Reduction [66.21060114843202]
We propose a more general heat kernel based manifold embedding method that we call heat geodesic embeddings.
Results show that our method outperforms existing state of the art in preserving ground truth manifold distances.
We also showcase our method on single cell RNA-sequencing datasets with both continuum and cluster structure.
arXiv Detail & Related papers (2023-05-30T13:58:50Z) - Score Approximation, Estimation and Distribution Recovery of Diffusion
Models on Low-Dimensional Data [68.62134204367668]
This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace.
We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated.
The generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
arXiv Detail & Related papers (2023-02-14T17:02:35Z) - Convolutional Filtering on Sampled Manifolds [122.06927400759021]
We show that convolutional filtering on a sampled manifold converges to continuous manifold filtering.
Our findings are further demonstrated empirically on a problem of navigation control.
arXiv Detail & Related papers (2022-11-20T19:09:50Z) - Manifold Density Estimation via Generalized Dequantization [9.090451761951101]
Some kinds of data are not well-modeled by supposing that their underlying geometry is Euclidean.
For instance, some kinds of data may be known to lie on the surface of a sphere.
We propose a method, inspired by the literature on "dequantization," which we interpret through a coordinate transformation of an ambient Euclidean space.
arXiv Detail & Related papers (2021-02-14T12:40:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.