Related papers: Manifold Learning with Sparse Regularised Optimal Transport

URL: http://arxiv.org/abs/2307.09816v2
Date: Mon, 17 Feb 2025 16:24:09 GMT
Title: Manifold Learning with Sparse Regularised Optimal Transport
Authors: Stephen Zhang, Gilles Mordant, Tetsuya Matsumoto, Geoffrey Schiebinger,
Abstract summary: Real-world datasets are subject to noisy observations and sampling, so that distilling information about the underlying manifold is a major challenge. We propose a method for manifold learning that utilises a symmetric version of optimal transport with a quadratic regularisation. We prove that the resulting kernel is consistent with a Laplace-type operator in the continuous limit, establish robustness to heteroskedastic noise and exhibit these results in numerical experiments.
Score: 1.949927790632678
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Manifold learning is a central task in modern statistics and data science. Many datasets (cells, documents, images, molecules) can be represented as point clouds embedded in a high-dimensional ambient space; however, the degrees of freedom intrinsic to the data are usually far fewer than the number of ambient dimensions. The task of detecting a latent manifold along which the data are embedded is a prerequisite for a wide family of downstream analyses. Real-world datasets are subject to noisy observations and sampling, so that distilling information about the underlying manifold is a major challenge. We propose a method for manifold learning that utilises a symmetric version of optimal transport with a quadratic regularisation, which constructs a sparse and adaptive affinity matrix that can be interpreted as a generalisation of the bistochastic kernel normalisation. We prove that the resulting kernel is consistent with a Laplace-type operator in the continuous limit, establish robustness to heteroskedastic noise, and exhibit these results in numerical experiments. We identify a highly efficient computational scheme for computing this optimal transport for discrete data and demonstrate that it outperforms competing methods in a set of examples.
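As a rough illustration of the construction described in the abstract, the sketch below computes a symmetric, quadratically regularised transport plan P for a cost matrix C by solving min <C, P> + (eps/2) ||P||_F^2 subject to P = P^T, P 1 = 1 and P >= 0. It is a minimal sketch under our own assumptions, not the authors' code: the normalisation (rows summing to 1), keeping the diagonal, the hypothetical function name `quadratic_ot_affinity`, and the solver (plain exact coordinate ascent on a dual potential f, with P_ij = max(0, f_i + f_j - C_ij) / eps) are all illustrative choices, whereas the paper identifies its own, more efficient computational scheme. For comparison, replacing the quadratic penalty with an entropic one would give the familiar symmetric Sinkhorn (bistochastic) scaling of a Gaussian kernel.

```python
def quadratic_ot_affinity(cost, eps, tol=1e-9, max_sweeps=10_000):
    """Hypothetical sketch: symmetric quadratically regularised OT plan.

    Solves  min_P <C, P> + (eps / 2) * ||P||_F^2
            s.t. P = P^T, rows of P sum to 1, P >= 0
    by exact coordinate ascent on the dual potentials f, using
    P_ij = max(0, f_i + f_j - C_ij) / eps.
    """
    n = len(cost)
    f = [0.0] * n  # one dual potential per point

    def row_mass(i, t):
        # eps * (row sum of P) when f[i] is replaced by t; the diagonal
        # term uses t twice because P_ii depends on f_i + f_i.
        total = 0.0
        for j in range(n):
            fj = t if j == i else f[j]
            total += max(0.0, t + fj - cost[i][j])
        return total

    plan = []
    for _sweep in range(max_sweeps):
        for i in range(n):
            # row_mass(i, lo) == 0 < eps and row_mass(i, lo + eps) >= eps,
            # so bisection finds the f[i] that makes row i sum to 1.
            lo = min(min((cost[i][j] - f[j] for j in range(n) if j != i),
                         default=cost[i][i] / 2.0),
                     cost[i][i] / 2.0)
            hi = lo + eps
            for _step in range(100):
                mid = 0.5 * (lo + hi)
                if row_mass(i, mid) < eps:
                    lo = mid
                else:
                    hi = mid
            f[i] = 0.5 * (lo + hi)
        plan = [[max(0.0, f[i] + f[j] - cost[i][j]) / eps for j in range(n)]
                for i in range(n)]
        # Stop once every row sums to 1 within tolerance.
        if max(abs(sum(row) - 1.0) for row in plan) < tol:
            break
    return plan


# Usage: six points on a line with squared-distance cost.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
cost = [[(a - b) ** 2 for b in xs] for a in xs]
P = quadratic_ot_affinity(cost, eps=4.0)
```

The max(0, ...) truncation is what makes the affinity sparse in this sketch: a pair whose cost exceeds f_i + f_j receives exactly zero weight, so each point's neighbourhood size is set by the potentials rather than fixed in advance. In the usage example, only self-pairs and immediate neighbours end up with nonzero weight.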

Related papers

Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation. In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model. We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv (2024-04-11T09:23:36Z)
Improving embedding of graphs with missing data by soft manifolds [51.425411400683565]
The reliability of graph embeddings depends on how well the geometry of the continuous space matches the graph structure. We introduce a new class of manifold, named soft manifold, that can address such mismatches. Using soft manifolds for graph embedding, we can provide continuous spaces to pursue any task in data analysis over complex datasets.
arXiv (2023-11-29T12:48:33Z)
Canonical normalizing flows for manifold learning [12.169916344037585]
We propose a canonical manifold learning flow method, in which a novel objective forces the transformation matrix to have few prominent and non-degenerate basis functions. Canonical manifold flow makes more efficient use of the latent space, automatically generating fewer prominent and distinct dimensions to represent data.
arXiv (2023-10-19T13:48:05Z)
A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction [66.21060114843202]
We propose a more general heat-kernel-based manifold embedding method that we call heat geodesic embeddings. Results show that our method outperforms the existing state of the art in preserving ground-truth manifold distances. We also showcase our method on single-cell RNA-sequencing datasets with both continuum and cluster structure.
arXiv (2023-05-30T13:58:50Z)
Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling. We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space. We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv (2023-02-14T23:50:53Z)
Convolutional Filtering on Sampled Manifolds [122.06927400759021]
We show that convolutional filtering on a sampled manifold converges to continuous manifold filtering. Our findings are further demonstrated empirically on a problem of navigation control.
arXiv (2022-11-20T19:09:50Z)
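A common way to make convolutional filtering on a sampled manifold concrete is a polynomial graph filter: build a graph Laplacian L from the sampled points and apply y = sum_k h_k L^k x to a signal x defined on those points. The sketch below does this with a Gaussian-weighted combinatorial Laplacian L = D - W. The bandwidth, the choice of Laplacian, and the function names are our own assumptions; the paper's construction and the scaling under which it proves convergence may differ.

```python
import math


def gaussian_laplacian(points, bandwidth):
    """Combinatorial graph Laplacian L = D - W with Gaussian edge weights."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / bandwidth)
    # Degree on the diagonal, negated weights off the diagonal.
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def matvec(M, x):
    """Dense matrix-vector product for lists of lists."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def polynomial_filter(L, x, taps):
    """y = sum_k taps[k] * L^k x, evaluated with repeated mat-vec products."""
    y = [0.0] * len(x)
    power = list(x)  # holds L^k x, starting at k = 0
    for h in taps:
        y = [yi + h * pi for yi, pi in zip(y, power)]
        power = matvec(L, power)
    return y


# Usage: eight samples from the unit circle (a 1-D manifold in R^2).
pts = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8))
       for k in range(8)]
L = gaussian_laplacian(pts, bandwidth=0.5)
signal = [math.cos(2 * math.pi * k / 8) for k in range(8)]
print(polynomial_filter(L, signal, taps=[1.0, -0.1]))
```

With taps [1.0, -0.1], each Laplacian eigencomponent is scaled by 1 - 0.1 * lambda, so components with larger eigenvalues (higher graph frequencies) are damped more, which is the discrete analogue of a low-pass filter on the manifold.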
Nonlinear Isometric Manifold Learning for Injective Normalizing Flows [58.720142291102135]
We use isometries to separate manifold learning and density estimation. We also employ autoencoders to design embeddings with explicit inverses that do not distort the probability distribution.
arXiv (2022-03-08T08:57:43Z)
Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach [5.975670441166475]
We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction.
arXiv (2022-02-28T22:46:34Z)
Adaptive Cholesky Gaussian Processes [7.684183064816171]
We present a method to fit exact Gaussian process models to large datasets by considering only a subset of the data. Our approach is novel in that the size of the subset is selected on the fly during exact inference with little computational overhead.
arXiv (2022-02-22T09:43:46Z)
Manifold embedding data-driven mechanics [0.0]
This article introduces a new data-driven approach that leverages a manifold embedding generated by an invertible neural network. We achieve this by training a deep neural network to globally map data from the manifold onto a lower-dimensional Euclidean vector space.
arXiv (2021-12-18T04:38:32Z)
Inferring Manifolds From Noisy Data Using Gaussian Processes [17.166283428199634]
Most existing manifold learning algorithms replace the original data with lower-dimensional coordinates. This article proposes a new methodology for addressing the limitations of this approach, allowing the estimated manifold to be interpolated between fitted data points.
arXiv (2021-10-14T15:50:38Z)
Efficient Multidimensional Functional Data Analysis Using Marginal Product Basis Systems [2.4554686192257424]
We propose a framework for learning continuous representations from a sample of multidimensional functional data. We show that the resulting estimation problem can be solved efficiently via tensor decomposition. We conclude with a real data application in neuroimaging.
arXiv (2021-07-30T16:02:15Z)
Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method that combines reconstruction error with $l_{2,p}$-norm regularization. We present an efficient optimization algorithm to solve the proposed unsupervised model, and theoretically analyse the convergence and computational complexity of the algorithm.
arXiv (2020-12-29T04:08:38Z)
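For readers unfamiliar with the regulariser in the Sparse PCA entry: the $l_{2,p}$ norm of a projection matrix W with one row per feature is commonly defined as ||W||_{2,p} = (sum_i ||w^i||_2^p)^(1/p), where w^i is the i-th row; for 0 < p <= 1 it tends to drive whole rows to zero, which is what makes it useful for selecting features. The sketch below computes this norm and ranks features by row norm. It assumes this common row-wise definition, the function names are hypothetical, and it does not reproduce the paper's reconstruction-error objective or its optimisation algorithm.

```python
import math


def row_norms(W):
    """Euclidean norm of each row w^i of W (one row per feature)."""
    return [math.sqrt(sum(v * v for v in row)) for row in W]


def l2p_norm(W, p):
    """||W||_{2,p} = (sum_i ||w^i||_2 ** p) ** (1 / p), for p > 0."""
    return sum(r ** p for r in row_norms(W)) ** (1.0 / p)


def rank_features(W):
    """Feature indices sorted by decreasing row norm (larger = more important)."""
    norms = row_norms(W)
    return sorted(range(len(norms)), key=lambda i: -norms[i])


# Usage: three features projected onto two components; feature 1 is unused.
W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
print(l2p_norm(W, 1.0))   # 6.0
print(rank_features(W))   # [0, 2, 1]
```

Ranking by row norm is the usual way such a row-sparse W is turned into a feature selection: rows driven to zero correspond to features the model can discard.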

This list is automatically generated from the titles and abstracts of the papers on this site.