Manifold Learning with Sparse Regularised Optimal Transport
- URL: http://arxiv.org/abs/2307.09816v1
- Date: Wed, 19 Jul 2023 08:05:46 GMT
- Title: Manifold Learning with Sparse Regularised Optimal Transport
- Authors: Stephen Zhang, Gilles Mordant, Tetsuya Matsumoto, and Geoffrey Schiebinger
- Abstract summary: Real-world datasets are subject to noisy observations and sampling, so that distilling information about the underlying manifold is a major challenge.
We propose a method for manifold learning that utilises a symmetric version of optimal transport with a quadratic regularisation.
We prove that the resulting kernel is consistent with a Laplace-type operator in the continuous limit, establish robustness to heteroskedastic noise and exhibit these results in simulations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Manifold learning is a central task in modern statistics and data science.
Many datasets (cells, documents, images, molecules) can be represented as point
clouds embedded in a high-dimensional ambient space; however, the degrees of
freedom intrinsic to the data are usually far fewer than the number of ambient
dimensions. The task of detecting a latent manifold along which the data are
embedded is a prerequisite for a wide family of downstream analyses. Real-world
datasets are subject to noisy observations and sampling, so that distilling
information about the underlying manifold is a major challenge. We propose a
method for manifold learning that utilises a symmetric version of optimal
transport with a quadratic regularisation that constructs a sparse and adaptive
affinity matrix, which can be interpreted as a generalisation of the
bistochastic kernel normalisation. We prove that the resulting kernel is
consistent with a Laplace-type operator in the continuous limit, establish
robustness to heteroskedastic noise and exhibit these results in simulations.
We identify a highly efficient computational scheme for computing this optimal
transport for discrete data and demonstrate that it outperforms competing
methods in a set of examples.
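The affinity construction described in the abstract can be illustrated concretely: under quadratic regularisation, a symmetric transport plan takes the truncated-linear form P_ij = max(0, f_i + f_j - C_ij)/ε for a single dual potential f, and enforcing uniform marginals yields a sparse, symmetric, bistochastic-like kernel. The coordinate-wise bisection solver below is a minimal sketch of this idea; the function name, default parameters, and solver are illustrative assumptions, not the paper's optimised computational scheme.

```python
# Hedged sketch (not the paper's optimised scheme): a symmetric,
# quadratically regularised OT plan has the form
#   P_ij = max(0, f_i + f_j - C_ij) / eps
# for a single dual potential f; each row constraint sum_j P_ij = 1/n
# is a monotone piecewise-linear equation in f_i, solvable by bisection.
import numpy as np

def symmetric_quadratic_ot_affinity(X, eps=1.0, n_sweeps=100):
    n = X.shape[0]
    C = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared dists
    f = np.zeros(n)          # single dual potential (f = g by symmetry)
    target = eps / n         # row_sum(f_i) = eps * (1/n)
    for _ in range(n_sweeps):            # cyclic coordinate ascent on f
        for i in range(n):
            mask = np.arange(n) != i
            fo, Co = f[mask], C[i, mask]

            def row_sum(fi):
                # off-diagonal terms plus the diagonal term (C_ii = 0)
                return np.maximum(0.0, fi + fo - Co).sum() + max(0.0, 2 * fi)

            lo = min(0.0, (Co - fo).min())          # row_sum(lo) == 0
            hi = Co.max() + np.abs(f).max() + eps   # row_sum(hi) >= target
            for _ in range(50):                     # bisection on f_i
                mid = 0.5 * (lo + hi)
                if row_sum(mid) < target:
                    lo = mid
                else:
                    hi = mid
            f[i] = 0.5 * (lo + hi)
    # sparse, symmetric affinity with (approximately) uniform row sums
    return np.maximum(0.0, f[:, None] + f[None, :] - C) / eps
```

The `max(0, ·)` truncation is what produces exact zeros, hence a sparse affinity matrix, in contrast to the strictly positive entries given by entropic (Sinkhorn) regularisation.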
Related papers
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) approximates the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
- Improving embedding of graphs with missing data by soft manifolds [51.425411400683565]
The reliability of graph embeddings depends on how much the geometry of the continuous space matches the graph structure.
We introduce a new class of manifolds, named soft manifolds, that can address this problem.
Using soft manifolds for graph embedding, we can provide continuous spaces for downstream data analysis on complex datasets.
arXiv Detail & Related papers (2023-11-29T12:48:33Z)
- Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling.
We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z)
- Convolutional Filtering on Sampled Manifolds [122.06927400759021]
We show that convolutional filtering on a sampled manifold converges to continuous manifold filtering.
Our findings are further demonstrated empirically on a problem of navigation control.
arXiv Detail & Related papers (2022-11-20T19:09:50Z)
- Nonlinear Isometric Manifold Learning for Injective Normalizing Flows [58.720142291102135]
We use isometries to separate manifold learning and density estimation.
We also employ autoencoders to design embeddings with explicit inverses that do not distort the probability distribution.
arXiv Detail & Related papers (2022-03-08T08:57:43Z)
- Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach [5.975670441166475]
We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations.
The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold.
The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction.
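As an illustration of an adaptive-bandwidth kernel-spectral embedding, the sketch below uses a self-tuning kernel in which each point's bandwidth is set by the distance to its k-th nearest neighbour, followed by a symmetrically normalised eigendecomposition. This is a generic stand-in for the idea, not this paper's algorithm; the function name and defaults are assumptions.

```python
# Hedged sketch: self-tuning (per-point bandwidth) kernel embedding,
# a generic adaptive-bandwidth scheme, not the paper's algorithm.
import numpy as np

def adaptive_kernel_embedding(X, k=7, n_components=2):
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    sigma = np.sort(D, axis=1)[:, k]          # distance to k-th neighbour
    K = np.exp(-D ** 2 / (sigma[:, None] * sigma[None, :]))
    d = K.sum(axis=1)
    A = K / np.sqrt(d[:, None] * d[None, :])  # symmetric normalisation
    vals, vecs = np.linalg.eigh(A)            # eigenvalues ascending
    # skip the trivial top eigenvector, keep the next n_components
    return vecs[:, -2:-2 - n_components:-1]
```

Because the bandwidth adapts to local sampling density, no global kernel scale needs to be chosen by hand.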
arXiv Detail & Related papers (2022-02-28T22:46:34Z)
- Adaptive Cholesky Gaussian Processes [7.684183064816171]
We present a method to fit exact Gaussian process models to large datasets by considering only a subset of the data.
Our approach is novel in that the size of the subset is selected on the fly during exact inference with little computational overhead.
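The idea of exact GP inference on an on-the-fly subset can be sketched by extending the Cholesky factor one data point at a time and stopping once the test predictions stabilise. The stopping rule below is purely illustrative, not the paper's criterion; the kernel choice, function name, and tolerances are assumptions.

```python
# Hedged sketch: grow a data subset point by point, updating the
# Cholesky factor incrementally; stop when predictions stabilise.
# Illustrative stopping rule, not the paper's criterion.
import numpy as np

def adaptive_subset_gp(X, y, Xtest, tol=1e-3, noise=1e-2):
    def kern(A, B):  # RBF kernel, unit lengthscale
        return np.exp(-0.5 * np.sum((A[:, None, :] - B[None, :, :]) ** 2, -1))

    n = len(X)
    L = np.zeros((n, n))                 # growing lower-triangular factor
    prev_mean = np.zeros(len(Xtest))
    for m in range(n):
        # rank-1 extension of the Cholesky factor with point m
        kni = kern(X[:m], X[m:m + 1]).ravel()
        l = np.linalg.solve(L[:m, :m], kni) if m else kni
        L[m, :m] = l
        L[m, m] = np.sqrt(1.0 + noise - l @ l)   # kern(x, x) = 1
        Lm = L[:m + 1, :m + 1]
        # exact predictive mean from the current subset
        alpha = np.linalg.solve(Lm.T, np.linalg.solve(Lm, y[:m + 1]))
        mean = kern(Xtest, X[:m + 1]) @ alpha
        if m > 0 and np.max(np.abs(mean - prev_mean)) < tol:
            return mean, m + 1           # predictions stabilised early
        prev_mean = mean
    return mean, n
```

Each extension costs O(m^2) rather than re-factorising from scratch, which is what keeps the overhead of on-the-fly subset selection small.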
arXiv Detail & Related papers (2022-02-22T09:43:46Z)
- Manifold embedding data-driven mechanics [0.0]
This article introduces a new data-driven approach that leverages a manifold embedding generated by an invertible neural network.
We achieve this by training a deep neural network to globally map data from the manifold onto a lower-dimensional Euclidean vector space.
arXiv Detail & Related papers (2021-12-18T04:38:32Z)
- Inferring Manifolds From Noisy Data Using Gaussian Processes [17.166283428199634]
Most existing manifold learning algorithms replace the original data with lower dimensional coordinates.
This article proposes a new methodology that addresses these problems, allowing the manifold to be estimated between fitted data points.
arXiv Detail & Related papers (2021-10-14T15:50:38Z)
- Efficient Multidimensional Functional Data Analysis Using Marginal Product Basis Systems [2.4554686192257424]
We propose a framework for learning continuous representations from a sample of multidimensional functional data.
We show that the resulting estimation problem can be solved efficiently via tensor decomposition.
We conclude with a real data application in neuroimaging.
arXiv Detail & Related papers (2021-07-30T16:02:15Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
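A common way to handle a non-smooth $l_{2,p}$ row-sparsity term is iteratively reweighted least squares alternated with an orthogonal Procrustes step; the sketch below follows that generic pattern, scoring features by the row norms of the learned projection. It is illustrative only, not the paper's exact optimisation algorithm; the function name and defaults are assumptions.

```python
# Hedged IRLS sketch of reconstruction-error + l_{2,p} feature selection:
#   min_{W, Q: Q^T Q = I}  ||X - X W Q^T||_F^2 + lam * sum_i ||w_i||_2^p
# Alternates a Procrustes step for Q with a reweighted ridge step for W.
import numpy as np

def l2p_feature_selection(X, k=2, lam=0.1, p=1.0, n_iter=50, eps=1e-8):
    S = X.T @ X
    W = np.linalg.svd(S)[0][:, :k]       # init: top principal directions
    for _ in range(n_iter):
        # orthogonal Procrustes step: Q maximises tr(Q^T S W)
        U, _, Vt = np.linalg.svd(S @ W, full_matrices=False)
        Q = U @ Vt
        # reweighted ridge step: D approximates the l_{2,p} subgradient
        row = np.maximum(np.linalg.norm(W, axis=1), eps)
        D = np.diag(lam * (p / 2.0) * row ** (p - 2.0))
        W = np.linalg.solve(S + D, S @ Q)
    return np.linalg.norm(W, axis=1)     # per-feature importance scores
```

Features with large row norms in `W` are retained; the reweighting drives uninformative rows toward exact zero as `p` decreases toward 1.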
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.