Large Data Limits of Laplace Learning for Gaussian Measure Data in Infinite Dimensions
- URL: http://arxiv.org/abs/2601.14515v1
- Date: Tue, 20 Jan 2026 22:14:05 GMT
- Title: Large Data Limits of Laplace Learning for Gaussian Measure Data in Infinite Dimensions
- Authors: Zhengang Zhong, Yury Korolev, Matthew Thorpe
- Abstract summary: Laplace learning is a method for finding missing labels in a partially labeled dataset. The lack of the Lebesgue measure on infinite-dimensional spaces requires rethinking the analysis when the data are not finite-dimensional.
- Score: 2.020917258669917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Laplace learning is a semi-supervised method for finding missing labels in a partially labeled dataset by utilizing the geometry given by the unlabeled data points. The method minimizes a Dirichlet energy defined on a (discrete) graph constructed from the full dataset. In finite dimensions the asymptotics in the large (unlabeled) data limit are well understood, with convergence from the graph setting to a continuum Sobolev semi-norm weighted by the Lebesgue density of the data-generating measure. The lack of a Lebesgue measure on infinite-dimensional spaces requires rethinking the analysis when the data are not finite-dimensional. In this paper we take a first step in this direction by analyzing the setting in which the data are generated by a Gaussian measure on a Hilbert space, and we prove pointwise convergence of the graph Dirichlet energy.
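As a rough illustration of the discrete object studied here, the sketch below minimizes the graph Dirichlet energy by harmonic extension: fix the labeled values and solve for the unlabeled ones. The Gaussian kernel, the bandwidth eps, and the dense linear solve are illustrative assumptions; the paper analyzes the large-data limit of this energy, not any particular solver.

```python
import numpy as np

def laplace_learning(X, labeled_idx, labels, eps=0.5):
    """Minimal sketch of Laplace learning: minimize the graph Dirichlet
    energy E(u) = sum_ij w_ij (u_i - u_j)^2 subject to u agreeing with
    the given labels on the labeled nodes. The Gaussian kernel and
    bandwidth eps are illustrative choices, not the paper's exact setup."""
    n = X.shape[0]
    # Pairwise squared distances and Gaussian edge weights.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / eps**2)
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(1)) - W          # unnormalized graph Laplacian
    unlabeled = np.setdiff1d(np.arange(n), labeled_idx)
    # Harmonic extension: solve L_uu u_u = -L_ul u_l for the unlabeled nodes.
    u = np.zeros(n)
    u[labeled_idx] = labels
    A = L[np.ix_(unlabeled, unlabeled)]
    b = -L[np.ix_(unlabeled, labeled_idx)] @ np.asarray(labels, dtype=float)
    u[unlabeled] = np.linalg.solve(A, b)
    return u
```

For multi-class problems the same solve is typically run per class with one-hot label vectors, taking an argmax at the end; that extension is standard but not specific to this paper.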
Related papers
- Laplace Learning in Wasserstein Space [20.33446919989862]
We assume the manifold hypothesis to investigate graph-based semi-supervised learning methods. In particular, we examine Laplace learning in the Wasserstein space. We prove variational convergence of a discrete graph p-Dirichlet energy to its continuum counterpart.
arXiv Detail & Related papers (2025-11-17T10:49:36Z)
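Up to the scaling constants needed for the continuum limit, the discrete graph p-Dirichlet energy whose variational convergence this paper studies has the generic form sketched below; in the Wasserstein setting the pairwise weights would be built from Wasserstein rather than Euclidean distances between data points.

```python
import numpy as np

def p_dirichlet_energy(W, u, p=2):
    """Discrete graph p-Dirichlet energy E_p(u) = sum_ij w_ij |u_i - u_j|^p.
    Scaling constants (in the sample size and the graph bandwidth) that are
    required for the continuum limit are omitted here for clarity."""
    diff = np.abs(u[:, None] - u[None, :]) ** p
    return (W * diff).sum()
```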
- Robust Tangent Space Estimation via Laplacian Eigenvector Gradient Orthogonalization [48.25304391127552]
Estimating the tangent spaces of a data manifold is a fundamental problem in data analysis. We propose a method, Laplacian Eigenvector Gradient Orthogonalization (LEGO), that utilizes the global structure of the data to guide local tangent space estimation.
arXiv Detail & Related papers (2025-10-02T17:59:45Z)
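LEGO itself is not reproduced here; for context, this is the standard purely local estimator, local PCA on the k nearest neighbours, whose noise sensitivity motivates bringing in global structure as the abstract describes.

```python
import numpy as np

def local_pca_tangent(X, i, k=10, d=2):
    """Standard local-PCA tangent space estimate that methods like LEGO
    aim to improve: the d leading principal directions of the k nearest
    neighbours of point i. Purely local, hence noise-sensitive; LEGO's
    global spectral regularization is not reproduced here."""
    dists = np.linalg.norm(X - X[i], axis=1)
    nbrs = np.argsort(dists)[1:k + 1]   # skip the point itself
    Y = X[nbrs] - X[nbrs].mean(0)
    # Right singular vectors of the centred neighbourhood give the
    # principal directions.
    _, _, Vt = np.linalg.svd(Y, full_matrices=False)
    return Vt[:d]                       # rows span the estimated tangent space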
- Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models [63.331590876872944]
We propose a method for deriving Riemannian metrics directly from pretrained Energy-Based Models. These metrics define spatially varying distances, enabling the computation of geodesics. We show that EBM-derived metrics consistently outperform established baselines.
arXiv Detail & Related papers (2025-05-23T12:18:08Z)
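One plausible reading of such a construction, offered only as a sketch: turn the scalar energy into a conformal metric and approximate its geodesics by shortest paths on an energy-weighted kNN graph. The exp(E/T) weighting and the graph approximation are assumptions made here, not the paper's exact method.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def energy_weighted_geodesics(X, energy, k=10, T=1.0):
    """Illustrative construction (not necessarily the paper's): weight each
    kNN edge by its Euclidean length times exp(E(midpoint) / T), so paths
    through low-energy (high-density) regions are cheaper, then approximate
    geodesics of the resulting conformal metric by graph shortest paths."""
    n = len(X)
    D = cdist(X, X)
    G = np.zeros((n, n))               # zero entries = no edge (scipy dense convention)
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:
            mid = 0.5 * (X[i] + X[j])
            G[i, j] = D[i, j] * np.exp(energy(mid) / T)
    return shortest_path(G, method='D', directed=False)
```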
- Stabilizing and Solving Unique Continuation Problems by Parameterizing Data and Learning Finite Element Solution Operators [0.0]
We consider an inverse problem involving the reconstruction of the solution to a nonlinear partial differential equation (PDE) with unknown boundary conditions. To leverage this collective data, we first compress the boundary data using proper orthogonal decomposition (POD) in a linear expansion. We then identify a possible nonlinear low-dimensional structure in the expansion coefficients using an autoencoder, which provides a parametrization of the dataset in a lower-dimensional latent space.
arXiv Detail & Related papers (2024-12-05T18:31:14Z)
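A minimal sketch of the POD compression step described above, assuming the boundary-data snapshots are stacked as matrix columns; the subsequent autoencoder on the expansion coefficients is noted in the comment but omitted.

```python
import numpy as np

def pod_compress(S, r):
    """POD (proper orthogonal decomposition) sketch: S is an (m, N) matrix
    whose columns are N boundary-data snapshots. Returns the r leading POD
    modes and the expansion coefficients of each snapshot. The paper then
    feeds these coefficients to an autoencoder to expose a nonlinear
    low-dimensional structure; that step is omitted here."""
    mean = S.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(S - mean, full_matrices=False)
    modes = U[:, :r]                   # POD basis vectors
    coeffs = modes.T @ (S - mean)      # (r, N) expansion coefficients
    return mean, modes, coeffs
```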
- Noncommutative Model Selection for Data Clustering and Dimension Reduction Using Relative von Neumann Entropy [0.0]
We propose a pair of data-driven algorithms for unsupervised classification and dimension reduction. In our experiments, our clustering algorithm outperforms $k$-means clustering on data sets with non-trivial geometry and topology.
arXiv Detail & Related papers (2024-11-29T18:04:11Z)
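A hedged sketch of the central quantity: the relative von Neumann entropy between two graph density matrices. The Laplacian-based normalization rho = L / tr(L) is a common construction in this literature and is assumed here, not confirmed from the paper.

```python
import numpy as np
from scipy.linalg import logm

def graph_density_matrix(A):
    """Density matrix rho = L / tr(L) from the graph Laplacian of adjacency
    matrix A; a common choice in the graph-entropy literature (assumption)."""
    L = np.diag(A.sum(1)) - A
    return L / np.trace(L)

def relative_vn_entropy(rho, sigma, eps=1e-12):
    """Relative von Neumann entropy S(rho||sigma) = tr(rho (log rho - log sigma)).
    A small ridge keeps the matrix logarithms finite on rank-deficient inputs."""
    n = rho.shape[0]
    rho_ = rho + eps * np.eye(n)
    sigma_ = sigma + eps * np.eye(n)
    return np.trace(rho_ @ (logm(rho_) - logm(sigma_))).real
```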
- Learning Distances from Data with Normalizing Flows and Score Matching [9.605001452209867]
Density-based distances (DBDs) provide a principled approach to metric learning by defining distances in terms of the underlying data distribution. We introduce a dimension-adapted Fermat distance that scales intuitively to high dimensions and improves numerical stability.
arXiv Detail & Related papers (2024-07-12T14:30:41Z)
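The plain (non-dimension-adapted) sample Fermat distance underlying this line of work can be sketched as a shortest-path computation with edge costs |x_i - x_j|^p; the paper's flow- and score-based estimators and its dimension adaptation are not included here.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def fermat_distance(X, p=2.0, k=10):
    """Sample Fermat distance: shortest path through the data where an edge
    (i, j) costs |x_i - x_j|^p. For p > 1, many short hops are cheaper than
    one long one, so paths prefer dense regions and the distance adapts to
    the underlying density."""
    D = cdist(X, X)
    n = len(X)
    G = np.zeros((n, n))               # zero entries = no edge
    for i in range(n):
        for j in np.argsort(D[i])[1:k + 1]:
            G[i, j] = D[i, j] ** p
    return shortest_path(G, method='D', directed=False)
```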
- Improving embedding of graphs with missing data by soft manifolds [51.425411400683565]
The reliability of graph embeddings depends on how much the geometry of the continuous space matches the graph structure.
We introduce a new class of manifolds, called soft manifolds, that can handle this situation.
Using soft manifolds for graph embedding, we can provide continuous spaces that support downstream data-analysis tasks on complex datasets.
arXiv Detail & Related papers (2023-11-29T12:48:33Z)
- Tight and fast generalization error bound of graph embedding in metric space [54.279425319381374]
We show that graph embedding in a non-Euclidean metric space can outperform embedding in Euclidean space with much less training data than the existing bound suggests.
Our new upper bound is significantly tighter and faster than the existing one, which can be exponential in $R$ and is $O(\frac{1}{S})$ at the fastest.
arXiv Detail & Related papers (2023-05-13T17:29:18Z)
- Intrinsic dimension estimation for discrete metrics [65.5438227932088]
In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces.
We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting.
This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of the sequence space.
arXiv Detail & Related papers (2022-07-20T06:38:36Z)
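For context, a sketch of the standard TWO-NN intrinsic-dimension estimator for continuous metrics. Treating TWO-NN as the relevant baseline is an assumption; the paper's point is precisely that ratio-based estimators like this break on discrete metrics (ties and degenerate nearest-neighbour distances), which its algorithm addresses.

```python
import numpy as np
from scipy.spatial.distance import cdist

def two_nn_id(X):
    """TWO-NN intrinsic-dimension estimator: with mu_i the ratio of the
    second- to first-nearest-neighbour distance of point i, the maximum
    likelihood estimate is d = N / sum_i log(mu_i). Assumes no duplicate
    points, so all first-neighbour distances are positive."""
    D = cdist(X, X)
    D_sorted = np.sort(D, axis=1)
    r1, r2 = D_sorted[:, 1], D_sorted[:, 2]   # column 0 is the point itself
    mu = r2 / r1
    return len(X) / np.log(mu).sum()
```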
- Spectral clustering under degree heterogeneity: a case for the random walk Laplacian [83.79286663107845]
This paper shows that graph spectral embedding using the random walk Laplacian produces vector representations which are completely corrected for node degree.
In the special case of a degree-corrected block model, the embedding concentrates about K distinct points, representing communities.
arXiv Detail & Related papers (2021-05-03T16:36:27Z)
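A minimal sketch of spectral embedding with the random walk Laplacian, computed via the similar symmetric matrix for numerical stability; the choice of K leading eigenvectors follows the block-model reading of the abstract.

```python
import numpy as np

def rw_spectral_embedding(A, K):
    """Spectral embedding with the random-walk Laplacian L_rw = I - D^{-1} A.
    D^{-1/2} A D^{-1/2} is symmetric and similar to D^{-1} A, so its
    eigenpairs give those of the random-walk matrix after rescaling;
    per the abstract, the resulting node representations are corrected
    for degree and, under a degree-corrected block model, concentrate
    around K distinct points."""
    d = A.sum(1)
    Ds = np.diag(d ** -0.5)
    M = Ds @ A @ Ds
    vals, vecs = np.linalg.eigh(M)
    idx = np.argsort(-vals)[:K]        # K leading eigenvectors
    return Ds @ vecs[:, idx]           # eigenvectors of D^{-1} A
```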
- Manifold learning with arbitrary norms [8.433233101044197]
We show in a numerical simulation that manifold learning based on Earthmover's distances outperforms the standard Euclidean variant for learning molecular shape spaces.
arXiv Detail & Related papers (2020-12-28T10:24:30Z)
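A hedged sketch of the Earthmover's-distance variant: replace Euclidean pairwise distances by Wasserstein-1 distances when forming the affinity matrix that a spectral method (e.g. diffusion maps) then embeds. Treating each sample as a 1-D distribution is a simplification for this sketch; the molecular shapes in the paper are higher-dimensional objects.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def emd_affinity(shapes, eps=0.1):
    """Gaussian affinity matrix built from 1-D Earthmover's (Wasserstein-1)
    distances between samples instead of Euclidean distances; any downstream
    spectral embedding can then run on W unchanged. 'shapes' is a list of
    1-D sample arrays, one per data point (a simplification for the sketch)."""
    n = len(shapes)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = wasserstein_distance(shapes[i], shapes[j])
            W[i, j] = W[j, i] = np.exp(-d**2 / eps)
    return W
```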
- Extending the average spectrum method: Grid points sampling and density averaging [0.0]
We show that sampling the grid points, instead of keeping them fixed, also changes the functional integral limit.
The remaining bias depends mainly on the width of the grid density, so we go one step further and average also over densities of different widths.
arXiv Detail & Related papers (2020-04-02T17:25:51Z)