Interpretable dimension reduction for compositional data
- URL: http://arxiv.org/abs/2509.05563v1
- Date: Sat, 06 Sep 2025 02:16:21 GMT
- Title: Interpretable dimension reduction for compositional data
- Authors: Junyoung Park, Cheolwoo Park, Jeongyoun Ahn
- Abstract summary: High-dimensional compositional data pose unique statistical challenges due to the simplex constraint and excess zeros. We introduce a novel framework for interpretable dimension reduction of compositional data that avoids extra transformations and zero imputations. Our approach generalizes the concept of amalgamation by softening its operation, mapping high-dimensional compositions directly to a lower-dimensional simplex, which can be visualized in ternary plots.
- Score: 14.60927508571862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-dimensional compositional data, such as those from human microbiome studies, pose unique statistical challenges due to the simplex constraint and excess zeros. While dimension reduction is indispensable for analyzing such data, conventional approaches often rely on log-ratio transformations that compromise interpretability and distort the data through ad hoc zero replacements. We introduce a novel framework for interpretable dimension reduction of compositional data that avoids extra transformations and zero imputations. Our approach generalizes the concept of amalgamation by softening its operation, mapping high-dimensional compositions directly to a lower-dimensional simplex, which can be visualized in ternary plots. The framework further provides joint visualization of the reduction matrix, enabling intuitive, at-a-glance interpretation. To achieve optimal reduction within our framework, we incorporate sufficient dimension reduction, which defines a new identifiable objective: the central compositional subspace. For estimation, we propose a compositional kernel dimension reduction (CKDR) method. The estimator is provably consistent, exhibits sparsity that reveals underlying amalgamation structures, and comes with an intrinsic predictive model for downstream analyses. Applications to real microbiome datasets demonstrate that our approach provides a powerful graphical exploration tool for uncovering meaningful biological patterns, opening a new pathway for analyzing high-dimensional compositional data.
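As a rough picture of the reduction described above (not the paper's CKDR estimator), a soft amalgamation can be written as a stochastic matrix that maps a D-part composition onto a d-part one; the random matrix `A` below is a hypothetical stand-in for a learned reduction matrix:

```python
# Minimal sketch of a "soft amalgamation": each row of A softly assigns one
# of the D original parts across d new parts (rows sum to 1), so z = x @ A
# is again a composition -- no log-ratio transform or zero imputation needed.
# A is random here; in the paper it would be estimated (e.g., by CKDR).
import numpy as np

rng = np.random.default_rng(0)
D, d = 10, 3                            # e.g., d = 3 allows ternary plots

x = rng.dirichlet(np.ones(D))           # a composition: nonnegative, sums to 1
A = rng.dirichlet(np.ones(d), size=D)   # soft-membership matrix, shape (D, d)

z = x @ A                               # reduced composition on the d-simplex
assert np.isclose(z.sum(), 1.0) and (z >= 0).all()
print(z)
```

A hard amalgamation corresponds to 0/1 rows of A; softening the rows is what makes the reduction matrix itself plottable and interpretable at a glance.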
Related papers
- GeodesicNVS: Probability Density Geodesic Flow Matching for Novel View Synthesis [54.39598154430305]
We propose a Data-to-Data Flow Matching framework that learns deterministic transformations directly between paired views. PDG-FM constrains flow trajectories using geodesic interpolants derived from probability density metrics of pretrained diffusion models. These results highlight the advantages of incorporating data-dependent geometric regularization into deterministic flow matching for consistent novel view generation.
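For context, here is a minimal sketch of plain data-to-data flow matching with linear interpolants, the baseline that PDG-FM's geodesic interpolants replace; the toy paired data and small network are assumptions, not the paper's setup:

```python
# Vanilla flow matching between paired samples: regress a velocity field on
# the constant velocity of the straight-line interpolant x_t = (1-t)x0 + t*x1.
# PDG-FM would instead constrain trajectories with geodesic interpolants.
import torch

net = torch.nn.Sequential(torch.nn.Linear(3, 64), torch.nn.ReLU(),
                          torch.nn.Linear(64, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x0, x1 = torch.randn(256, 2), torch.randn(256, 2)   # toy "paired views"
for _ in range(200):
    t = torch.rand(256, 1)
    xt = (1 - t) * x0 + t * x1                      # linear interpolant
    v_pred = net(torch.cat([xt, t], dim=1))         # model sees (x_t, t)
    loss = ((v_pred - (x1 - x0)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```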
arXiv Detail & Related papers (2026-03-01T09:30:11Z)
- SIGMA: Scalable Spectral Insights for LLM Collapse [51.863164847253366]
We introduce SIGMA (Spectral Inequalities for Gram Matrix Analysis), a unified framework for model collapse. By deriving deterministic bounds on the Gram matrix's spectrum, SIGMA provides a mathematically grounded metric to track the contraction of the representation space. We demonstrate that SIGMA effectively captures the transition towards collapsed states, offering theoretical insights into the mechanics of collapse.
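A hedged illustration of the kind of spectral diagnostic described here: summarize the Gram matrix's spectrum with an entropy-based effective rank and watch it contract under collapse. The function is a generic construction, not SIGMA's actual metric:

```python
import numpy as np

def effective_rank(H):
    """Entropy-based effective rank of the Gram matrix of representations H."""
    eig = np.clip(np.linalg.eigvalsh(H @ H.T), 0.0, None)
    p = eig / eig.sum()                       # normalized spectrum
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(1)
healthy = rng.normal(size=(200, 64))                               # diverse features
collapsed = rng.normal(size=(200, 1)) @ rng.normal(size=(1, 64))   # rank-1 collapse
print(effective_rank(healthy), effective_rank(collapsed))          # large vs. ~1
```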
arXiv Detail & Related papers (2026-01-06T19:47:11Z)
- Dimension reduction via score ratio matching [0.9012198585960441]
We propose a framework, derived from score-matching, to extend gradient-based dimension reduction to problems where gradients are unavailable. We show that our approach outperforms standard score-matching for problems with low-dimensional structure.
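The gradient-based dimension reduction being extended can be sketched as follows: average outer products of (log-density-ratio) gradients and take the top eigenvectors. The toy gradients below stand in for the learned score-ratio estimates:

```python
import numpy as np

rng = np.random.default_rng(2)
n, D, d = 500, 8, 2
W = np.linalg.qr(rng.normal(size=(D, d)))[0]   # true low-dimensional directions

X = rng.normal(size=(n, D))
grads = (X @ W) @ W.T                          # toy gradients lying in span(W)

M = grads.T @ grads / n                        # diagnostic matrix E[g g^T]
U = np.linalg.eigh(M)[1][:, -d:]               # top-d eigenvectors
print(np.linalg.norm(U @ U.T - W @ W.T))       # ~0: subspace recovered
```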
arXiv Detail & Related papers (2024-10-25T22:21:03Z)
- Enabling Tensor Decomposition for Time-Series Classification via A Simple Pseudo-Laplacian Contrast [26.28414569796961]
We propose a novel Pseudo Laplacian Contrast (PLC) tensor decomposition framework.
It integrates data augmentation and a cross-view Laplacian to enable the extraction of class-aware representations.
Experiments on various datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-23T16:48:13Z)
- Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein [56.62376364594194]
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. In this work, we revisit these approaches under the lens of optimal transport and exhibit relationships with the Gromov-Wasserstein problem. This unveils a new general framework, called distributional reduction, that recovers DR and clustering as special cases and allows addressing them jointly within a single optimization problem.
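A small sketch of the Gromov-Wasserstein coupling at the heart of this framework, using the POT library's standard GW solver rather than the paper's distributional-reduction algorithm; data, prototypes, and weights are toy assumptions:

```python
import numpy as np
import ot  # POT: pip install pot

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 20))        # high-dimensional points
Z = rng.normal(size=(4, 2))          # k = 4 low-dimensional prototypes

C1, C2 = ot.dist(X, X), ot.dist(Z, Z)          # intra-space distance matrices
p, q = np.full(60, 1 / 60), np.full(4, 1 / 4)  # uniform weights

# The coupling T plays both roles at once: Z acts as the embedding (DR) and
# T.argmax(axis=1) reads off a cluster assignment per point (clustering).
T = ot.gromov.gromov_wasserstein(C1, C2, p, q, loss_fun='square_loss')
print(T.argmax(axis=1))
```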
arXiv Detail & Related papers (2024-02-03T19:00:19Z)
- CARE: Large Precision Matrix Estimation for Compositional Data [9.440956168571617]
We introduce a precise specification of the compositional precision matrix and relate it to its basis counterpart.
By exploiting this connection, we propose a composition regularized estimation (CARE) method for estimating the sparse basis precision matrix.
Our theory reveals an intriguing trade-off between identification and estimation, thereby highlighting the blessing of dimensionality in compositional data analysis.
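Not CARE itself, but a common log-ratio baseline for the same task, shown to make the setting concrete: transform compositions with an additive log-ratio and fit a sparse precision matrix with the graphical lasso (scikit-learn):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(4)
X = rng.dirichlet(np.ones(10), size=300)     # 300 compositions with 10 parts

alr = np.log(X[:, :-1]) - np.log(X[:, -1:])  # additive log-ratio transform
model = GraphicalLasso(alpha=0.05).fit(alr)
print((np.abs(model.precision_) > 1e-8).sum())   # sparsity of the estimate
```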
arXiv Detail & Related papers (2023-09-13T14:20:22Z)
- Laplacian-based Cluster-Contractive t-SNE for High Dimensional Data Visualization [20.43471678277403]
We propose LaptSNE, a new graph-based dimensionality reduction method based on t-SNE.
Specifically, LaptSNE leverages the eigenvalue information of the graph Laplacian to shrink the potential clusters in the low-dimensional embedding.
We show how to calculate the gradient analytically, which may be of broad interest when considering optimization with a Laplacian-composited objective.
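A rough sketch of why Laplacian eigenvalues can act as a cluster-contractive penalty: the sum of the k smallest eigenvalues of the embedding's graph Laplacian is near zero when the embedding splits into k tight clusters. This toy penalty omits the paper's full objective and analytic gradient:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import laplacian

def cluster_penalty(Y, k, n_neighbors=10):
    W = kneighbors_graph(Y, n_neighbors, mode='connectivity')
    L = laplacian(0.5 * (W + W.T), normed=True).toarray()  # symmetrized kNN graph
    return float(np.linalg.eigvalsh(L)[:k].sum())  # ~0 for k separated clusters

rng = np.random.default_rng(5)
blobs = np.concatenate([rng.normal(c, 0.1, size=(50, 2)) for c in (0, 5, 10)])
spread = 5 * rng.normal(size=(150, 2))
print(cluster_penalty(blobs, k=3), cluster_penalty(spread, k=3))  # small vs. larger
```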
arXiv Detail & Related papers (2022-07-25T14:10:24Z)
- A Local Similarity-Preserving Framework for Nonlinear Dimensionality Reduction with Neural Networks [56.068488417457935]
We propose a novel local nonlinear approach named Vec2vec for general purpose dimensionality reduction.
To train the neural network, we build a neighborhood similarity graph of the data matrix and define the context of data points.
Experiments of data classification and clustering on eight real datasets show that Vec2vec is better than several classical dimensionality reduction methods in the statistical hypothesis test.
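The graph-building step can be pictured as below: nearest neighbors of each row of the data matrix serve as its "context", analogously to word contexts in word2vec; the subsequent network training is omitted and the setup is illustrative only:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 50))        # rows of the data matrix

nn = NearestNeighbors(n_neighbors=6).fit(X)
_, idx = nn.kneighbors(X)             # idx[i, 0] is point i itself

contexts = {i: list(idx[i, 1:]) for i in range(len(X))}   # neighbors as context
print(contexts[0])
```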
arXiv Detail & Related papers (2021-03-10T23:10:47Z)
- Manifold Learning via Manifold Deflation [105.7418091051558]
Dimensionality reduction methods provide a valuable means to visualize and interpret high-dimensional data.
However, many popular methods can fail dramatically, even on simple two-dimensional manifolds.
This paper presents an embedding method built on a novel, incremental tangent space estimator that incorporates global structure into the coordinates.
Empirically, we show our algorithm recovers novel and interesting embeddings on real-world and synthetic datasets.
arXiv Detail & Related papers (2020-07-07T10:04:28Z)
- Deep Dimension Reduction for Supervised Representation Learning [51.10448064423656]
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
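For orientation, the classical linear estimator that sufficient dimension reduction is usually introduced with is sliced inverse regression (SIR); the deep approach above is a nonparametric generalization of this idea. A toy SIR run (the data and link function are assumptions of the sketch):

```python
import numpy as np

rng = np.random.default_rng(7)
n, D = 2000, 6
X = rng.normal(size=(n, D))           # identity covariance, so centering suffices
beta = np.zeros(D); beta[0] = 1.0
y = (X @ beta) ** 3 + 0.1 * rng.normal(size=n)   # y depends on one direction only

Z = X - X.mean(axis=0)
slices = np.array_split(np.argsort(y), 10)       # slice observations by response
M = sum(len(s) / n * np.outer(Z[s].mean(axis=0), Z[s].mean(axis=0)) for s in slices)
b_hat = np.linalg.eigh(M)[1][:, -1]              # top eigenvector of slice-mean cov
print(abs(b_hat @ beta))                          # close to 1
```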
arXiv Detail & Related papers (2020-06-10T14:47:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.