Reproducing Kernels and New Approaches in Compositional Data Analysis
- URL: http://arxiv.org/abs/2205.01158v1
- Date: Mon, 2 May 2022 18:46:23 GMT
- Title: Reproducing Kernels and New Approaches in Compositional Data Analysis
- Authors: Binglin Li and Jeongyoun Ahn
- Abstract summary: Analyzing compositional data such as human gut microbiomes requires careful treatment of the geometry of the data.
In this work, based on the key observation that compositional data are projective in nature, we re-interpret the compositional domain as the quotient topology of a sphere modded out by a group action.
This construction of an RKHS for compositional data opens wide research avenues for future methodology development.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Compositional data, such as human gut microbiomes, consist of non-negative
variables for which only values relative to other variables are available.
Analyzing compositional data such as human gut microbiomes needs a careful
treatment of the geometry of the data. A common geometrical understanding of
compositional data is via a regular simplex. The majority of existing approaches
rely on log-ratio or power transformations to overcome the innate simplicial
geometry. In this work, based on the key observation that compositional data
are projective in nature, and on the intrinsic connection between projective
and spherical geometry, we re-interpret the compositional domain as the
quotient topology of a sphere modded out by a group action. This
re-interpretation allows us to understand the function space on compositional
domains in terms of that on spheres and to use spherical harmonics theory along
with reflection group actions for constructing a compositional Reproducing
Kernel Hilbert Space (RKHS). This construction of an RKHS for compositional data
opens wide research avenues for future methodology development. In
particular, well-developed kernel embedding methods can now be introduced to
compositional data analysis. The polynomial nature of the compositional RKHS has
both theoretical and computational benefits. The wide applicability of the
proposed theoretical framework is exemplified with nonparametric density
estimation and kernel exponential family for compositional data.
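The idea of building a kernel on a sphere and then quotienting by a reflection group can be illustrated with a minimal sketch. This is not the authors' construction: it assumes a square-root map from the simplex to the unit sphere, takes the reflection group to be coordinate sign flips, and averages a plain polynomial kernel over that group instead of using spherical harmonics. The names to_sphere, sign_flip_kernel and gram_matrix are hypothetical.

```python
import itertools

import numpy as np


def to_sphere(comp):
    """Map non-negative compositions to the unit sphere (assumed square-root map)."""
    comp = np.asarray(comp, dtype=float)
    # Only relative values matter, so rescale each composition to sum to one first.
    comp = comp / comp.sum(axis=-1, keepdims=True)
    return np.sqrt(comp)


def sign_flip_kernel(u, v, degree=2):
    """Average a polynomial kernel over all coordinate sign flips of v.

    The sign flips stand in for the reflection group (an assumption of this
    sketch); averaging makes the kernel invariant to the sign of each coordinate.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    d = u.shape[0]
    total = 0.0
    for signs in itertools.product((1.0, -1.0), repeat=d):
        total += (1.0 + np.dot(u, np.asarray(signs) * v)) ** degree
    return total / 2 ** d


def gram_matrix(comps, degree=2):
    """Gram matrix of the group-averaged kernel for a batch of compositions."""
    pts = to_sphere(comps)
    n = len(pts)
    gram = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            gram[i, j] = sign_flip_kernel(pts[i], pts[j], degree)
    return gram


if __name__ == "__main__":
    # Toy compositions (illustrative values only); rows need not sum to one.
    comps = np.array([[0.2, 0.3, 0.5], [0.1, 0.1, 0.8], [1.0, 1.0, 2.0]])
    print(gram_matrix(comps))
```

The loop over sign flips costs 2^d kernel evaluations, so this sketch is only practical for a handful of components; it is meant to show the group-averaging step, not an efficient implementation.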
Related papers
- CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization [51.716834831684004]
We study the problem of conformal prediction (CP) under geometric data shifts. We propose integrating geometric information--such as geometric pose--into the conformal procedure to reinstate its guarantees.
arXiv Detail & Related papers (2025-06-19T10:12:02Z)
- Riemannian Principal Component Analysis [0.0]
This paper proposes an innovative extension of Principal Component Analysis (PCA) that transcends the traditional assumption of data lying in Euclidean space. We adapt PCA to include local metrics, enabling the incorporation of manifold geometry.
arXiv Detail & Related papers (2025-05-30T21:04:01Z)
- Symplectic Generative Networks (SGNs): A Hamiltonian Framework for Invertible Deep Generative Modeling [0.0]
We introduce the Symplectic Generative Network (SGN), a deep generative model that leverages Hamiltonian mechanics to construct an invertible, volume-preserving mapping between a latent space and the data space. By endowing the latent space with a symplectic structure and modeling data generation as the time evolution of a Hamiltonian system, SGN achieves exact likelihood evaluation without incurring the computational overhead of Jacobian calculations.
arXiv Detail & Related papers (2025-05-28T16:13:36Z)
- Cryo-em images are intrinsically low dimensional [3.216132991084434]
We study the underlying geometric structure of Cryo SBI representations of hemagglutinin (simulated and experimental).
We establish a direct link between the latent structure and key physical parameters.
arXiv Detail & Related papers (2025-04-15T14:46:25Z)
- Causal Discovery from Data Assisted by Large Language Models [50.193740129296245]
It is essential to integrate experimental data with prior domain knowledge for knowledge driven discovery.
Here we demonstrate this approach by combining high-resolution scanning transmission electron microscopy (STEM) data with insights derived from large language models (LLMs).
By fine-tuning ChatGPT on domain-specific literature, we construct adjacency matrices for Directed Acyclic Graphs (DAGs) that map the causal relationships between structural, chemical, and polarization degrees of freedom in Sm-doped BiFeO3 (SmBFO).
arXiv Detail & Related papers (2025-03-18T02:14:49Z)
- Wrapped Gaussian on the manifold of Symmetric Positive Definite Matrices [6.7523635840772505]
Circular and non-flat data distributions are prevalent across diverse domains of data science.
A principled approach to accounting for the underlying geometry of such data is pivotal.
This work lays the groundwork for extending classical machine learning and statistical methods to more complex and structured data.
arXiv Detail & Related papers (2025-02-03T16:46:46Z)
- Score-based pullback Riemannian geometry [10.649159213723106]
We propose a framework for data-driven Riemannian geometry that is scalable in both geometry and learning.
We produce high-quality geodesics through the data support and reliably estimate the intrinsic dimension of the data manifold.
Our framework can naturally be used with anisotropic normalizing flows by adopting isometry regularization during training.
arXiv Detail & Related papers (2024-10-02T18:52:12Z)
- (Deep) Generative Geodesics [57.635187092922976]
We introduce a new Riemannian metric to assess the similarity between any two data points.
Our metric leads to the conceptual definition of generative distances and generative geodesics.
Their approximations are proven to converge to their true values under mild conditions.
arXiv Detail & Related papers (2024-07-15T21:14:02Z)
- Improving embedding of graphs with missing data by soft manifolds [51.425411400683565]
The reliability of graph embeddings depends on how much the geometry of the continuous space matches the graph structure.
We introduce a new class of manifold, named soft manifold, that can solve this situation.
Using soft manifold for graph embedding, we can provide continuous spaces to pursue any task in data analysis over complex datasets.
arXiv Detail & Related papers (2023-11-29T12:48:33Z)
- CARE: Large Precision Matrix Estimation for Compositional Data [9.440956168571617]
We introduce a precise specification of the compositional precision matrix and relate it to its basis counterpart.
By exploiting this connection, we propose a composition regularized estimation (CARE) method for estimating the sparse basis precision matrix.
Our theory reveals an intriguing trade-off between identification and estimation, thereby highlighting the blessing of dimensionality in compositional data analysis.
arXiv Detail & Related papers (2023-09-13T14:20:22Z)
- A Heat Diffusion Perspective on Geodesic Preserving Dimensionality Reduction [66.21060114843202]
We propose a more general heat kernel based manifold embedding method that we call heat geodesic embeddings.
Results show that our method outperforms existing state of the art in preserving ground truth manifold distances.
We also showcase our method on single cell RNA-sequencing datasets with both continuum and cluster structure.
arXiv Detail & Related papers (2023-05-30T13:58:50Z)
- Parametrizing Product Shape Manifolds by Composite Networks [5.772786223242281]
We show that it is possible to learn an efficient neural network approximation for shape spaces with a special product structure.
Our proposed architecture leverages this structure by separately learning approximations for the low-dimensional factors and a subsequent combination.
arXiv Detail & Related papers (2023-02-28T15:31:23Z)
- Study of Manifold Geometry using Multiscale Non-Negative Kernel Graphs [32.40622753355266]
We propose a framework to study the geometric structure of the data.
We make use of our recently introduced non-negative kernel (NNK) regression graphs to estimate the point density, intrinsic dimension, and the linearity of the data manifold (curvature).
arXiv Detail & Related papers (2022-10-31T17:01:17Z)
- Time-inhomogeneous diffusion geometry and topology [69.55228523791897]
Diffusion condensation is a time-inhomogeneous process where each step first computes and then applies a diffusion operator to the data.
We theoretically analyze the convergence and evolution of this process from geometric, spectral, and topological perspectives.
Our work gives theoretical insights into the convergence of diffusion condensation, and shows that it provides a link between topological and geometric data analysis.
arXiv Detail & Related papers (2022-03-28T16:06:17Z)
- Clustered Federated Learning via Generalized Total Variation Minimization [83.26141667853057]
We study optimization methods to train local (or personalized) models for local datasets with a decentralized network structure.
Our main conceptual contribution is to formulate federated learning as generalized total variation (GTV) minimization.
Our main algorithmic contribution is a fully decentralized federated learning algorithm.
arXiv Detail & Related papers (2021-05-26T18:07:19Z)
- A Unifying and Canonical Description of Measure-Preserving Diffusions [60.59592461429012]
A complete recipe of measure-preserving diffusions in Euclidean space was recently derived unifying several MCMC algorithms into a single framework.
We develop a geometric theory that improves and generalises this construction to any manifold.
arXiv Detail & Related papers (2021-05-06T17:36:55Z)
- A Sheaf and Topology Approach to Generating Local Branch Numbers in Digital Images [9.645196221785694]
This paper concerns a theoretical approach that combines topological data analysis (TDA) and sheaf theory.
Sheaf theory provides a framework for describing the local consistency in geometric objects.
We show that the proposed theory can be applied to identify the branch numbers of local objects in digital images.
arXiv Detail & Related papers (2020-11-27T06:50:59Z)