Reproducing Kernels and New Approaches in Compositional Data Analysis
- URL: http://arxiv.org/abs/2205.01158v1
- Date: Mon, 2 May 2022 18:46:23 GMT
- Title: Reproducing Kernels and New Approaches in Compositional Data Analysis
- Authors: Binglin Li and Jeongyoun Ahn
- Abstract summary: Analyzing compositional data such as human gut microbiomes needs a careful treatment of the geometry of the data.
In this work, based on the key observation that compositional data are projective in nature, we re-interpret the compositional domain as the quotient topology of a sphere modded out by a group action.
This construction of RKHS for compositional data will widely open research avenues for future methodology developments.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Compositional data, such as human gut microbiomes, consist of non-negative
variables for which only the values relative to other variables are available.
Analyzing compositional data such as human gut microbiomes needs a careful
treatment of the geometry of the data. A common geometrical understanding of
compositional data is via a regular simplex. The majority of existing approaches
rely on log-ratio or power transformations to overcome the innate simplicial
geometry. In this work, based on the key observation that compositional data
are projective in nature, and on the intrinsic connection between projective
and spherical geometry, we re-interpret the compositional domain as the
quotient topology of a sphere modded out by a group action. This
re-interpretation allows us to understand the function space on compositional
domains in terms of that on spheres and to use spherical harmonics theory along
with reflection group actions for constructing a compositional Reproducing
Kernel Hilbert Space (RKHS). This construction of RKHS for compositional data
will widely open research avenues for future methodology developments. In
particular, well-developed kernel embedding methods can now be introduced to
compositional data analysis. The polynomial nature of compositional RKHS has
both theoretical and computational benefits. The wide applicability of the
proposed theoretical framework is exemplified with nonparametric density
estimation and kernel exponential family for compositional data.
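
The sphere-quotient idea in the abstract can be illustrated with a small sketch. It is not the authors' construction: it assumes the common square-root map from the simplex to the unit sphere, takes the sign-flip group as the reflection group, and uses an ordinary polynomial kernel `(1 + <u, v>)^degree` as a stand-in base kernel. The function names `to_sphere`, `sign_flips`, and `invariant_poly_kernel` are hypothetical.

```python
import itertools

import numpy as np


def to_sphere(x):
    """Normalize a non-negative vector to a composition and take square roots.

    Assumption: the square-root map sends the simplex to the positive
    orthant of the unit sphere, since sum(sqrt(p)**2) == sum(p) == 1.
    """
    x = np.asarray(x, dtype=float)
    if np.any(x < 0) or x.sum() <= 0:
        raise ValueError("expected non-negative entries with a positive sum")
    return np.sqrt(x / x.sum())


def sign_flips(d):
    """All 2**d coordinate sign patterns, one per row (the assumed reflection group)."""
    return np.array(list(itertools.product([1.0, -1.0], repeat=d)))


def invariant_poly_kernel(x, y, degree=2):
    """Average a polynomial kernel on the sphere over all sign flips.

    The average is unchanged when either argument is reflected, so it is
    well defined on the quotient of the sphere by the sign-flip group.
    Cost grows as 2**d, so this is only practical for small d.
    """
    u, v = to_sphere(x), to_sphere(y)
    inner = (sign_flips(u.size) * u) @ v  # one inner product per sign pattern
    return float(np.mean((1.0 + inner) ** degree))


if __name__ == "__main__":
    # Rescaling an input does not change the value: only relative sizes matter.
    print(invariant_poly_kernel([1, 1, 2], [2, 1, 1]))
    print(invariant_poly_kernel([10, 10, 20], [2, 1, 1]))
```

Under these assumptions the averaged kernel is a polynomial in the composition entries. For `degree=2` it reduces to `1 + <p, q>`, where `p` and `q` are the normalized compositions, because the odd-order terms cancel in the average over sign flips.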
Related papers
- (Deep) Generative Geodesics [57.635187092922976]
We introduce a new Riemannian metric to assess the similarity between any two data points.
Our metric leads to the conceptual definition of generative distances and generative geodesics.
Their approximations are proven to converge to their true values under mild conditions.
arXiv Detail & Related papers (2024-07-15T21:14:02Z)
- Improving embedding of graphs with missing data by soft manifolds [51.425411400683565]
The reliability of graph embeddings depends on how much the geometry of the continuous space matches the graph structure.
We introduce a new class of manifold, named soft manifold, that can solve this situation.
Using soft manifold for graph embedding, we can provide continuous spaces to pursue any task in data analysis over complex datasets.
arXiv Detail & Related papers (2023-11-29T12:48:33Z)
- CARE: Large Precision Matrix Estimation for Compositional Data [9.440956168571617]
We introduce a precise specification of the compositional precision matrix and relate it to its basis counterpart.
By exploiting this connection, we propose a composition regularized estimation (CARE) method for estimating the sparse basis precision matrix.
Our theory reveals an intriguing trade-off between identification and estimation, thereby highlighting the blessing of dimensionality in compositional data analysis.
arXiv Detail & Related papers (2023-09-13T14:20:22Z)
- Topological Parallax: A Geometric Specification for Deep Perception Models [0.778001492222129]
We introduce topological parallax as a theoretical and computational tool that compares a trained model to a reference dataset.
Our examples show that this geometric similarity between dataset and model is essential to trustworthy interpolation and perturbation.
This new concept will add value to the current debate regarding the unclear relationship between overfitting and generalization in applications of deep-learning.
arXiv Detail & Related papers (2023-06-20T18:45:24Z)
- A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction [66.21060114843202]
We propose a more general heat kernel based manifold embedding method that we call heat geodesic embeddings.
Results show that our method outperforms existing state of the art in preserving ground truth manifold distances.
We also showcase our method on single cell RNA-sequencing datasets with both continuum and cluster structure.
arXiv Detail & Related papers (2023-05-30T13:58:50Z)
- Parametrizing Product Shape Manifolds by Composite Networks [5.772786223242281]
We show that it is possible to learn an efficient neural network approximation for shape spaces with a special product structure.
Our proposed architecture leverages this structure by separately learning approximations for the low-dimensional factors and a subsequent combination.
arXiv Detail & Related papers (2023-02-28T15:31:23Z)
- Study of Manifold Geometry using Multiscale Non-Negative Kernel Graphs [32.40622753355266]
We propose a framework to study the geometric structure of the data.
We make use of our recently introduced non-negative kernel (NNK) regression graphs to estimate the point density, intrinsic dimension, and linearity (curvature) of the data manifold.
arXiv Detail & Related papers (2022-10-31T17:01:17Z)
- Time-inhomogeneous diffusion geometry and topology [69.55228523791897]
Diffusion condensation is a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data.
We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives.
Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.
arXiv Detail & Related papers (2022-03-28T16:06:17Z)
- Clustered Federated Learning via Generalized Total Variation Minimization [83.26141667853057]
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization.
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
arXiv Detail & Related papers (2021-05-26T18:07:19Z)
- A Unifying and Canonical Description of Measure-Preserving Diffusions [60.59592461429012]
A complete recipe of measure-preserving diffusions in Euclidean space was recently derived unifying several MCMC algorithms into a single framework.
We develop a geometric theory that improves and generalises this construction to any manifold.
arXiv Detail & Related papers (2021-05-06T17:36:55Z)
- A Sheaf and Topology Approach to Generating Local Branch Numbers in Digital Images [9.645196221785694]
This paper concerns a theoretical approach that combines topological data analysis (TDA) and sheaf theory.
Sheaf theory provides a framework for describing the local consistency in geometric objects.
We show that the proposed theory can be applied to identify the branch numbers of local objects in digital images.
arXiv Detail & Related papers (2020-11-27T06:50:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.