Time-inhomogeneous diffusion geometry and topology
- URL: http://arxiv.org/abs/2203.14860v1
- Date: Mon, 28 Mar 2022 16:06:17 GMT
- Title: Time-inhomogeneous diffusion geometry and topology
- Authors: Guillaume Huguet, Alexander Tong, Bastian Rieck, Jessie Huang, Manik
Kuchroo, Matthew Hirn, Guy Wolf, Smita Krishnaswamy
- Abstract summary: Diffusion condensation is a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data.
We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives.
Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.
- Score: 69.55228523791897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion condensation is a dynamic process that yields a sequence of
multiscale data representations that aim to encode meaningful abstractions. It
has proven effective for manifold learning, denoising, clustering, and
visualization of high-dimensional data. Diffusion condensation is constructed
as a time-inhomogeneous process where each step first computes and then applies
a diffusion operator to the data. We theoretically analyze the convergence and
evolution of this process from geometric, spectral, and topological
perspectives. From a geometric perspective, we obtain convergence bounds based
on the smallest transition probability and the radius of the data, whereas from
a spectral perspective, our bounds are based on the eigenspectrum of the
diffusion kernel. Our spectral results are of particular interest since most of
the literature on data diffusion is focused on homogeneous processes. From a
topological perspective, we show diffusion condensation generalizes
centroid-based hierarchical clustering. We use this perspective to obtain a
bound based on the number of data points, independent of their location. To
understand the evolution of the data geometry beyond convergence, we use
topological data analysis. We show that the condensation process itself defines
an intrinsic diffusion homology. We use this intrinsic topology as well as an
ambient topology to study how the data changes over diffusion time. We
demonstrate both homologies in well-understood toy examples. Our work gives
theoretical insights into the convergence of diffusion condensation, and shows
that it provides a link between topological and geometric data analysis.
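The per-step procedure the abstract describes (recompute a diffusion operator from the current data, then apply it) can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the Gaussian kernel, the fixed bandwidth `epsilon`, and the fixed step count are illustrative assumptions.

```python
import numpy as np

def diffusion_condensation(X, epsilon=1.0, n_steps=10):
    """Sketch of a time-inhomogeneous diffusion condensation process.

    At every step the Markov diffusion operator is recomputed from the
    *current* point positions and then applied to them, so the operator
    changes over time (time-inhomogeneous), and points gradually contract
    toward local centers of mass.
    """
    X = np.asarray(X, dtype=float)
    history = [X.copy()]
    for _ in range(n_steps):
        # Gaussian affinities on the current positions.
        sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-sq_dists / epsilon)
        # Row-normalize to obtain a Markov (diffusion) operator.
        P = K / K.sum(axis=1, keepdims=True)
        # Apply the operator: each point moves to a weighted neighbor mean.
        X = P @ X
        history.append(X.copy())
    return history
```

On well-separated clusters this contracts each cluster toward its own centroid, which is consistent with the paper's observation that condensation generalizes centroid-based hierarchical clustering.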
Related papers
- Dynamical Regimes of Diffusion Models [14.797301819675454]
We study generative diffusion models in the regime where the dimension of space and the number of data are large.
Our analysis reveals three distinct dynamical regimes during the backward generative diffusion process.
The dependence of the collapse time on the dimension and number of data provides a thorough characterization of the curse of dimensionality for diffusion models.
arXiv Detail & Related papers (2024-02-28T17:19:26Z)
- Time Series Diffusion in the Frequency Domain [54.60573052311487]
We analyze whether representing time series in the frequency domain is a useful inductive bias for score-based diffusion models.
We show that a dual diffusion process occurs in the frequency domain with an important nuance.
We show how to adapt the denoising score matching approach to implement diffusion models in the frequency domain.
arXiv Detail & Related papers (2024-02-08T18:59:05Z)
- Datacube segmentation via Deep Spectral Clustering [76.48544221010424]
Extended Vision techniques are often challenging to interpret.
The high dimensionality of data cube spectra makes their statistical interpretation a complex task.
In this paper, we explore the possibility of applying unsupervised clustering methods in encoded space.
Statistical dimensionality reduction is performed by an ad hoc trained (variational) autoencoder, while clustering is performed by a (learnable) iterative K-means algorithm.
arXiv Detail & Related papers (2024-01-31T09:31:28Z)
- Lecture Notes in Probabilistic Diffusion Models [0.5361320134021585]
Diffusion models are loosely based on non-equilibrium thermodynamics.
The diffusion model learns the data manifold to which the original and thus the reconstructed data samples belong.
Unlike variational autoencoders and flow models, diffusion models have latent variables with the same dimensionality as the original data.
arXiv Detail & Related papers (2023-12-16T09:36:54Z)
- Persistent Homology for High-dimensional Data Based on Spectral Methods [16.58218530585593]
We show that persistent homology becomes very sensitive to noise and fails to detect the correct topology.
We find that spectral distances on the $k$-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, make it possible to detect the correct topology even in the presence of high-dimensional noise.
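One of the spectral distances mentioned above, diffusion distance on a $k$-nearest-neighbor graph, can be sketched as follows. This is an illustrative construction, not that paper's implementation; the choices of `k`, the diffusion time `t`, and the unweighted k-NN adjacency are assumptions.

```python
import numpy as np

def diffusion_distances(X, k=5, t=2):
    """Sketch: pairwise diffusion distances on a symmetrized k-NN graph.

    Distances are derived from the eigendecomposition of the random-walk
    operator on the graph, so they reflect connectivity along the data
    rather than raw ambient distances.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Unweighted adjacency: connect each point to its k nearest neighbors.
    A = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]  # skip the point itself
        A[i, nbrs] = 1.0
    A = np.maximum(A, A.T)  # symmetrize
    d = A.sum(axis=1)
    # Symmetric conjugate of the random-walk matrix shares its spectrum.
    S = A / np.sqrt(d[:, None] * d[None, :])
    vals, vecs = np.linalg.eigh((S + S.T) / 2)
    # Diffusion-map coordinates, weighted by eigenvalues^t;
    # drop the trivial top eigenvector (eigenvalue 1, last in eigh's order).
    Phi = vecs / np.sqrt(d[:, None])
    coords = Phi[:, :-1] * (vals[:-1] ** t)
    return np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
```

On points along a curve, the resulting distance grows with separation along the graph, which is the property that makes it robust for downstream topology estimation.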
arXiv Detail & Related papers (2023-11-06T13:18:08Z)
- A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction [66.21060114843202]
We propose a more general heat kernel based manifold embedding method that we call heat geodesic embeddings.
Results show that our method outperforms existing state of the art in preserving ground truth manifold distances.
We also showcase our method on single cell RNA-sequencing datasets with both continuum and cluster structure.
arXiv Detail & Related papers (2023-05-30T13:58:50Z)
- Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data [68.62134204367668]
This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace.
We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated.
The generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
arXiv Detail & Related papers (2023-02-14T17:02:35Z)
- Convolutional Filtering on Sampled Manifolds [122.06927400759021]
We show that convolutional filtering on a sampled manifold converges to continuous manifold filtering.
Our findings are further demonstrated empirically on a problem of navigation control.
arXiv Detail & Related papers (2022-11-20T19:09:50Z)
- Manifold Density Estimation via Generalized Dequantization [9.090451761951101]
Some kinds of data are not well modeled by assuming their underlying geometry is Euclidean.
For instance, data may be known to lie on the surface of a sphere.
We propose a method, inspired by the literature on "dequantization," which we interpret through a coordinate transformation of an ambient Euclidean space.
arXiv Detail & Related papers (2021-02-14T12:40:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.