Probabilistic Foundations of Fuzzy Simplicial Sets for Nonlinear Dimensionality Reduction
- URL: http://arxiv.org/abs/2512.03899v1
- Date: Wed, 03 Dec 2025 15:49:38 GMT
- Title: Probabilistic Foundations of Fuzzy Simplicial Sets for Nonlinear Dimensionality Reduction
- Authors: Janis Keck, Lukas Silvester Barth, Fatemeh Fahimi, Parvaneh Joharinad, Jürgen Jost
- Abstract summary: Fuzzy simplicial sets have become an object of interest in dimensionality reduction and manifold learning. We introduce a framework that explains fuzzy simplicial sets as marginals of probability measures on simplicial sets.
- Score: 2.2536021123168055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fuzzy simplicial sets have become an object of interest in dimensionality reduction and manifold learning, most prominently through their role in UMAP. However, their definition through tools from algebraic topology without a clear probabilistic interpretation detaches them from commonly used theoretical frameworks in those areas. In this work we introduce a framework that explains fuzzy simplicial sets as marginals of probability measures on simplicial sets. In particular, this perspective shows that the fuzzy weights of UMAP arise from a generative model that samples Vietoris-Rips filtrations at random scales, yielding cumulative distribution functions of pairwise distances. More generally, the framework connects fuzzy simplicial sets to probabilistic models on the face poset, clarifies the relation between Kullback-Leibler divergence and fuzzy cross-entropy in this setting, and recovers standard t-norms and t-conorms via Boolean operations on the underlying simplicial sets. We then show how new embedding methods may be derived from this framework and illustrate this on an example where we generalize UMAP using Čech filtrations with triplet sampling. In summary, this probabilistic viewpoint provides a unified probabilistic theoretical foundation for fuzzy simplicial sets, clarifies the role of UMAP within this framework, and enables the systematic derivation of new dimensionality reduction methods.
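The random-scale reading in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's construction: if the Vietoris-Rips scale r is drawn at random, an edge of length d is present exactly when d <= r, so its membership weight is P(r >= d). Assuming an exponential scale distribution (an assumption chosen here for illustration), that weight is exp(-d), the same functional form as UMAP's edge weights up to the local shift and rescaling. Taking the union of two independently sampled edge sets gives the probabilistic sum a + b - ab, a standard t-conorm. The function names `edge_weight_mc`, `union_weight_mc`, and `prob_sum` are hypothetical.

```python
import math
import random


def edge_weight_mc(dist, n_samples=100_000, rate=1.0, seed=0):
    """Monte Carlo estimate of P(edge present) for a random scale r ~ Exp(rate).

    Assumption for illustration: the edge of length `dist` is in the
    Vietoris-Rips complex at scale r exactly when dist <= r.
    """
    rng = random.Random(seed)
    hits = sum(rng.expovariate(rate) >= dist for _ in range(n_samples))
    return hits / n_samples


def union_weight_mc(d_a, d_b, n_samples=100_000, rate=1.0, seed=0):
    """Estimate P(edge present in the union of two independently sampled complexes)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        in_a = rng.expovariate(rate) >= d_a  # edge present in first sample
        in_b = rng.expovariate(rate) >= d_b  # edge present in second sample
        hits += in_a or in_b
    return hits / n_samples


def prob_sum(a, b):
    """Probabilistic-sum t-conorm: a + b - a*b."""
    return a + b - a * b


d_ij, d_ji = 0.7, 1.3  # hypothetical directed distances between two points
w_ij = edge_weight_mc(d_ij, seed=1)
w_union = union_weight_mc(d_ij, d_ji, seed=2)

print("simulated weight:", w_ij, "closed form exp(-d):", math.exp(-d_ij))
print("simulated union :", w_union,
      "probabilistic sum:", prob_sum(math.exp(-d_ij), math.exp(-d_ji)))
```

The simulated values should agree with the closed forms up to Monte Carlo error; other scale distributions would give other weight functions.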
Related papers
- Support Tokens, Stability Margins, and a New Foundation for Robust LLMs [1.429795922604976]
We re-interpret causal self-attention transformers, the backbone of modern foundation models.
A barrier constraint emerges on the self-attention parameters.
This reveals a boundary where attention becomes ill-conditioned.
arXiv Detail & Related papers (2026-02-25T08:44:44Z)
- SIGMA: Scalable Spectral Insights for LLM Collapse [51.863164847253366]
We introduce SIGMA (Spectral Inequalities for Gram Matrix Analysis), a unified framework for model collapse.
By deriving deterministic bounds on the matrix's spectrum, SIGMA provides a mathematically grounded metric to track the contraction of the representation space.
We demonstrate that SIGMA effectively captures the transition towards states, offering theoretical insights into the mechanics of collapse.
arXiv Detail & Related papers (2026-01-06T19:47:11Z)
- Adaptive Symmetrization of the KL Divergence [10.632997610787207]
Many tasks in machine learning can be described as or reduced to learning a probability distribution given a finite set of samples.
A common approach is to minimize a statistical divergence between the (empirical) data distribution and a parameterized distribution, e.g., a normalizing flow (NF) or an energy-based model (EBM).
arXiv Detail & Related papers (2025-11-14T10:41:59Z)
- Generative Flexible Latent Structure Regression (GFLSR) model [0.5586073503694489]
This paper proposes a Generative Flexible Latent Structure Regression (GFLSR) model structure to address this problem.
We show that most linear continuous latent variable methods can be represented under the proposed framework.
With a model structure, we analyse the convergence of the parameters and the latent variables.
arXiv Detail & Related papers (2025-08-06T12:37:45Z)
- Overcoming Dimensional Factorization Limits in Discrete Diffusion Models through Quantum Joint Distribution Learning [79.65014491424151]
We propose a quantum Discrete Denoising Diffusion Probabilistic Model (QD3PM).
It enables joint probability learning through diffusion and denoising in exponentially large Hilbert spaces.
This paper establishes a new theoretical paradigm in generative models by leveraging the quantum advantage in joint distribution learning.
arXiv Detail & Related papers (2025-05-08T11:48:21Z)
- Towards One Model for Classical Dimensionality Reduction: A Probabilistic Perspective on UMAP and t-SNE [8.121681696358717]
We recast dimensionality reduction methods as MAP inference methods corresponding to a model introduced in Ravuri et al.
We show that well-known kernels can be used to describe covariances implied by graph Laplacians.
We introduce tools with which similar dimensionality reduction methods can be studied.
arXiv Detail & Related papers (2024-05-27T17:57:12Z)
- Constrained Synthesis with Projected Diffusion Models [47.56192362295252]
This paper introduces an approach to endow generative diffusion processes with the ability to satisfy and certify compliance with constraints and physical principles.
The proposed method recasts the traditional process of generative diffusion as a constrained distribution problem to ensure adherence to constraints.
arXiv Detail & Related papers (2024-02-05T22:18:16Z)
- On the Granular Representation of Fuzzy Quantifier-Based Fuzzy Rough Sets [0.7614628596146602]
This paper focuses on fuzzy quantifier-based fuzzy rough sets (FQFRS).
It shows that Choquet-based fuzzy rough sets can be represented granularly under the same conditions as OWA-based fuzzy rough sets.
This observation highlights the potential of these models for resolving data inconsistencies and managing noise.
arXiv Detail & Related papers (2023-12-27T20:02:40Z)
- Joint Bayesian Inference of Graphical Structure and Parameters with a Single Generative Flow Network [59.79008107609297]
We propose in this paper to approximate the joint posterior over the structure of a Bayesian Network.
We use a single GFlowNet whose sampling policy follows a two-phase process.
Since the parameters are included in the posterior distribution, this leaves more flexibility for the local probability models.
arXiv Detail & Related papers (2023-05-30T19:16:44Z)
- A Robustness Analysis of Blind Source Separation [91.3755431537592]
Blind source separation (BSS) aims to recover an unobserved signal from its mixture $X=f(S)$ under the condition that the transformation $f$ is invertible but unknown.
We present a general framework for analysing such violations and quantifying their impact on the blind recovery of $S$ from $X$.
We show that a generic BSS-solution in response to general deviations from its defining structural assumptions can be profitably analysed in the form of explicit continuity guarantees.
arXiv Detail & Related papers (2023-03-17T16:30:51Z)
- Wrapped Distributions on homogeneous Riemannian manifolds [58.720142291102135]
Control over distributions' properties, such as parameters, symmetry and modality, yields a family of flexible distributions.
We empirically validate our approach by utilizing our proposed distributions within a variational autoencoder and a latent space network model.
arXiv Detail & Related papers (2022-04-20T21:25:21Z)
- Categorical Distributions of Maximum Entropy under Marginal Constraints [0.0]
Estimation of categorical distributions under marginal constraints is key for many machine-learning and data-driven approaches.
We provide a parameter-agnostic theoretical framework that ensures that a categorical distribution of Maximum Entropy under marginal constraints always exists.
arXiv Detail & Related papers (2022-04-07T12:42:58Z)
- Random Forest Weighted Local Fréchet Regression with Random Objects [18.128663071848923]
We propose a novel random forest weighted local Fréchet regression paradigm.
Our first method uses these weights as the local average to solve the conditional Fréchet mean.
Our second method performs local linear Fréchet regression, both significantly improving existing Fréchet regression methods.
arXiv Detail & Related papers (2022-02-10T09:10:59Z)
- Closed-Form Factorization of Latent Semantics in GANs [65.42778970898534]
A rich set of interpretable dimensions has been shown to emerge in the latent space of the Generative Adversarial Networks (GANs) trained for synthesizing images.
In this work, we examine the internal representation learned by GANs to reveal the underlying variation factors in an unsupervised manner.
We propose a closed-form factorization algorithm for latent semantic discovery by directly decomposing the pre-trained weights.
arXiv Detail & Related papers (2020-07-13T18:05:36Z)