Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach
- URL: http://arxiv.org/abs/2203.00126v2
- Date: Thu, 6 Jul 2023 06:34:13 GMT
- Title: Learning Low-Dimensional Nonlinear Structures from High-Dimensional Noisy Data: An Integral Operator Approach
- Authors: Xiucai Ding and Rong Ma
- Abstract summary: We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from high-dimensional and noisy observations.
The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold.
The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction.
- Score: 5.975670441166475
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a kernel-spectral embedding algorithm for learning low-dimensional
nonlinear structures from high-dimensional and noisy observations, where the
datasets are assumed to be sampled from an intrinsically low-dimensional
manifold and corrupted by high-dimensional noise. The algorithm employs an
adaptive bandwidth selection procedure which does not rely on prior knowledge
of the underlying manifold. The obtained low-dimensional embeddings can be
further utilized for downstream purposes such as data visualization, clustering
and prediction. Our method is theoretically justified and practically
interpretable. Specifically, we establish the convergence of the final
embeddings to their noiseless counterparts when the dimension and size of the
samples are comparably large, and characterize the effect of the
signal-to-noise ratio on the rate of convergence and phase transition. We also
prove convergence of the embeddings to the eigenfunctions of an integral
operator defined by the kernel map of some reproducing kernel Hilbert space
capturing the underlying nonlinear structures. Numerical simulations and
analysis of three real datasets show the superior empirical performance of the
proposed method, compared to many existing methods, on learning various
manifolds in diverse applications.
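As a rough, self-contained illustration of the pipeline the abstract describes, the sketch below builds a Gaussian kernel whose bandwidth is chosen adaptively from k-nearest-neighbour distances, symmetrically normalises it, and uses the top non-trivial eigenvectors as the embedding. The median-kNN bandwidth rule, the function name, and the toy noisy-circle data are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def kernel_spectral_embedding(X, embed_dim=2, k=15):
    """Sketch of a kernel-spectral embedding for noisy data.

    The bandwidth is picked adaptively from the data (here: the median
    distance to the k-th nearest neighbour), standing in for the
    paper's bandwidth-selection procedure.
    """
    D = cdist(X, X)                         # pairwise Euclidean distances
    knn_dist = np.sort(D, axis=1)[:, k]     # distance to k-th neighbour
    h = np.median(knn_dist)                 # adaptive bandwidth (illustrative rule)
    W = np.exp(-D**2 / (2 * h**2))          # Gaussian kernel affinities
    d = W.sum(axis=1)
    A = W / np.sqrt(np.outer(d, d))         # symmetrically normalised kernel
    vals, vecs = eigh(A)                    # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][1:embed_dim + 1]   # skip the trivial top one
    return vecs[:, idx] * vals[idx]

# Example: a noisy circle observed in 100 ambient dimensions.
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 400)
signal = np.stack([np.cos(t), np.sin(t)], axis=1)
X = np.concatenate([signal, np.zeros((400, 98))], axis=1)
X += 0.1 * rng.standard_normal(X.shape)     # high-dimensional noise
Y = kernel_spectral_embedding(X)
```

On data like this, the two recovered coordinates should trace out the underlying circle despite the high-dimensional ambient noise.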
Related papers
- Diffusion-based Semi-supervised Spectral Algorithm for Regression on Manifolds [2.0649432688817444]
We introduce a novel diffusion-based spectral algorithm to tackle regression analysis on high-dimensional data.
Our method uses the local estimation property of the heat kernel, offering an adaptive, data-driven approach to overcome the challenges posed by high-dimensional data.
Our algorithm performs in an entirely data-driven manner, operating directly within the intrinsic manifold structure of the data.
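A minimal sketch of the general idea (not the paper's exact estimator): use the smoothest eigenvectors of a heat-kernel graph Laplacian as a regression basis, fit them on the labelled points, and extend the fit to the unlabelled ones. The bandwidth rule and all parameter names are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.linalg import eigh

def heat_kernel_regression(X, y_labeled, labeled_idx, t=1.0, n_eig=20, k=10):
    """Semi-supervised regression with heat-kernel spectral features (sketch)."""
    D = cdist(X, X)
    h = np.median(np.sort(D, axis=1)[:, k])           # data-driven bandwidth
    W = np.exp(-D**2 / (4 * t * h**2))                # heat-kernel style affinities
    d = W.sum(axis=1)
    L = np.eye(len(X)) - W / np.sqrt(np.outer(d, d))  # normalised graph Laplacian
    vals, vecs = eigh(L)
    Phi = vecs[:, :n_eig]                             # smooth eigenvectors as basis
    # Least-squares fit on the labelled subset, predict everywhere.
    coef, *_ = np.linalg.lstsq(Phi[labeled_idx], y_labeled, rcond=None)
    return Phi @ coef
```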
arXiv Detail & Related papers (2024-10-18T15:29:04Z) - Kernel spectral joint embeddings for high-dimensional noisy datasets using duo-landmark integral operators [9.782959684053631]
We propose a novel kernel spectral method that achieves joint embeddings of two independently observed high-dimensional noisy datasets.
The obtained low-dimensional embeddings can be utilized for many downstream tasks such as simultaneous clustering, data visualization, and denoising.
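As a loose illustration only, one way to obtain joint embeddings of two datasets is to take the singular vectors of a cross kernel between them; the paper's duo-landmark integral operators are considerably more refined than this stand-in, and every name below is hypothetical.

```python
import numpy as np
from scipy.spatial.distance import cdist

def joint_kernel_embedding(X1, X2, dim=2):
    """Joint embedding of two datasets via SVD of a cross kernel (sketch)."""
    D = cdist(X1, X2)
    h = np.median(D)                        # crude bandwidth choice
    K = np.exp(-D**2 / (2 * h**2))          # n1 x n2 cross-kernel matrix
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    emb1 = U[:, :dim] * np.sqrt(s[:dim])    # embedding of dataset 1
    emb2 = Vt[:dim].T * np.sqrt(s[:dim])    # embedding of dataset 2
    return emb1, emb2
```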
arXiv Detail & Related papers (2024-05-20T18:29:36Z) - Hodge-Aware Contrastive Learning [101.56637264703058]
Simplicial complexes prove effective in modeling data with multiway dependencies.
We develop a contrastive self-supervised learning approach for processing simplicial data.
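For context, the sketch below constructs the Hodge 1-Laplacian of a toy simplicial complex from its incidence matrices; this is the standard operator that simplicial (Hodge-aware) methods act on, while the paper's contrastive objective itself is omitted here.

```python
import numpy as np

# Toy simplicial complex: a filled triangle (0,1,2) plus a dangling edge
# (2,3). B1 maps edges to nodes, B2 maps triangles to edges, and
# L1 = B1^T B1 + B2 B2^T is the Hodge 1-Laplacian on edge signals.
B1 = np.array([   # nodes x edges; edges: (0,1), (1,2), (0,2), (2,3)
    [-1,  0, -1,  0],
    [ 1, -1,  0,  0],
    [ 0,  1,  1, -1],
    [ 0,  0,  0,  1],
])
B2 = np.array([   # edges x triangles; boundary of (0,1,2) = (0,1)+(1,2)-(0,2)
    [ 1],
    [ 1],
    [-1],
    [ 0],
])
assert not (B1 @ B2).any()        # boundary-of-boundary is zero
L1 = B1.T @ B1 + B2 @ B2.T        # Hodge 1-Laplacian
```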
arXiv Detail & Related papers (2023-09-14T00:40:07Z) - Manifold Learning with Sparse Regularised Optimal Transport [0.17205106391379024]
Real-world datasets are subject to noisy observations and sampling, so that distilling information about the underlying manifold is a major challenge.
We propose a method for manifold learning that utilises a symmetric version of optimal transport with a quadratic regularisation.
We prove that the resulting kernel is consistent with a Laplace-type operator in the continuous limit, establish robustness to heteroskedastic noise and exhibit these results in simulations.
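A hedged sketch of the flavour of this construction, with entropic (Sinkhorn) regularisation swapped in as a simple stand-in for the paper's quadratic regularisation (which, unlike this version, yields sparse plans): symmetric scaling of a Gaussian kernel toward a doubly stochastic affinity matrix.

```python
import numpy as np
from scipy.spatial.distance import cdist

def symmetric_ot_affinity(X, eps=0.5, n_iter=200):
    """Bistochastic affinity via symmetric Sinkhorn-style scaling (sketch)."""
    C = cdist(X, X)**2
    K = np.exp(-C / eps)                  # entropic stand-in kernel
    u = np.ones(len(X))
    for _ in range(n_iter):               # symmetric fixed-point iteration
        u = np.sqrt(u / (K @ u))
    W = u[:, None] * K * u[None, :]       # symmetric, approx. doubly stochastic
    return W
```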
arXiv Detail & Related papers (2023-07-19T08:05:46Z) - Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling.
We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z) - Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observed decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z) - On the Benefits of Large Learning Rates for Kernel Methods [110.03020563291788]
We show that the benefit of large learning rates can be precisely characterized in the context of kernel methods.
We consider the minimization of a quadratic objective in a separable Hilbert space, and show that with early stopping, the choice of learning rate influences the spectral decomposition of the obtained solution.
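A small numeric illustration of this effect using the standard gradient-descent (Landweber) spectral filter for quadratic objectives, not code from the paper: at a fixed stopping time, the learning rate controls how strongly each eigendirection has been fitted.

```python
import numpy as np

# Gradient descent on a quadratic (kernel least-squares) objective acts as
# a spectral filter: after t steps, the component with eigenvalue lam is
# fitted with the factor below. With early stopping, a larger learning
# rate eta lets small-eigenvalue directions be fitted in far fewer steps.
def gd_spectral_filter(lam, eta, t):
    return (1.0 - (1.0 - eta * lam)**t) / lam   # stable while eta * lam < 2

lams = np.array([1.0, 0.1, 0.01])               # kernel eigenvalues
for eta in [0.1, 1.0, 1.9]:
    print(eta, gd_spectral_filter(lams, eta, t=50))
```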
arXiv Detail & Related papers (2022-02-28T13:01:04Z) - Graph Embedding via High Dimensional Model Representation for
Hyperspectral Images [9.228929858529678]
Learning the manifold structure of remote sensing images is of paramount relevance for modeling and understanding processes.
Manifold learning methods have shown excellent performance in hyperspectral image (HSI) analysis.
A common simplifying assumption is that the transformation between the high-dimensional input space and the (typically lower-dimensional) latent space is linear.
The proposed method is compared to manifold learning methods along with its linear counterparts and achieves promising performance in terms of classification accuracy of a representative set of hyperspectral images.
arXiv Detail & Related papers (2021-11-29T16:42:15Z) - Tensor Laplacian Regularized Low-Rank Representation for Non-uniformly
Distributed Data Subspace Clustering [2.578242050187029]
Low-Rank Representation (LRR) suffers from discarding the locality information of data points in subspace clustering.
We propose a hypergraph model that allows a variable number of adjacent nodes and incorporates the locality information of the data.
Experiments on artificial and real datasets demonstrate the higher accuracy and precision of the proposed method.
arXiv Detail & Related papers (2021-03-06T08:22:24Z) - Manifold Learning via Manifold Deflation [105.7418091051558]
Dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
Many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method built on a novel incremental tangent space estimator that incorporates global structure as coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z) - Deep Dimension Reduction for Supervised Representation Learning [51.10448064423656]
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
arXiv Detail & Related papers (2020-06-10T14:47:43Z)