Related papers: Partition of Unity Neural Networks for Interpretable Classification with Explicit Class Regions

URL: http://arxiv.org/abs/2602.00511v1
Date: Sat, 31 Jan 2026 04:40:11 GMT
Title: Partition of Unity Neural Networks for Interpretable Classification with Explicit Class Regions
Authors: Akram Aldroubi
Abstract summary: We introduce Partition of Unity Neural Networks (PUNN), an architecture in which class probabilities arise directly from a learned partition of unity. Experiments show that PUNN with MLP-based gates achieves accuracy within 0.3-0.6% of standard multilayer perceptrons.
Score: 0.7252027234425333
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Despite their empirical success, neural network classifiers remain difficult to interpret. In softmax-based models, class regions are defined implicitly as solutions to systems of inequalities among logits, making them difficult to extract and visualize. We introduce Partition of Unity Neural Networks (PUNN), an architecture in which class probabilities arise directly from a learned partition of unity, without requiring a softmax layer. PUNN constructs $k$ nonnegative functions $h_1, \ldots, h_k$ satisfying $\sum_i h_i(x) = 1$, where each $h_i(x)$ directly represents $P(\text{class } i \mid x)$. Unlike softmax, where class regions are defined implicitly through coupled inequalities among logits, each PUNN partition function $h_i$ directly defines the probability of class $i$ as a standalone function of $x$. We prove that PUNN is dense in the space of continuous probability maps on compact domains. The gate functions $g_i$ that define the partition can use various activation functions (sigmoid, Gaussian, bump) and parameterizations ranging from flexible MLPs to parameter-efficient shape-informed designs (spherical shells, ellipsoids, spherical harmonics). Experiments on synthetic data, UCI benchmarks, and MNIST show that PUNN with MLP-based gates achieves accuracy within 0.3--0.6\% of standard multilayer perceptrons. When geometric priors match the data structure, shape-informed gates achieve comparable accuracy with up to 300$\times$ fewer parameters. These results demonstrate that interpretable-by-design architectures can be competitive with black-box models while providing transparent class probability assignments.
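The abstract does not say how the gates $g_i$ are combined into the partition functions $h_i$. As a minimal sketch only, the following assumes a stick-breaking product ($h_1 = g_1$, $h_i = g_i \prod_{j<i} (1 - g_j)$, with the last class taking the remainder) and hypothetical Gaussian gates with centers and widths. The function names and parameters are illustrative and not taken from the paper.

```python
import numpy as np


def gaussian_gates(x, centers, widths):
    """Hypothetical Gaussian gates with values in (0, 1].

    x: (n, d) inputs; centers: (k-1, d); widths: (k-1,).
    Returns an (n, k-1) array of gate values.
    """
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * widths ** 2))


def stick_breaking_partition(gates):
    """Turn k-1 gates in [0, 1] into k nonnegative functions summing to 1.

    Assumed construction (not confirmed by the abstract):
    h_i = g_i * prod_{j<i} (1 - g_j), and h_k = prod_{j<k} (1 - g_j).
    """
    n = gates.shape[0]
    # remaining[:, i] is the mass left over after the first i gates.
    remaining = np.cumprod(1.0 - gates, axis=1)
    remaining = np.concatenate([np.ones((n, 1)), remaining[:, :-1],
                                remaining[:, -1:]], axis=1)
    # Append a constant gate of 1 so the last class takes the remainder.
    gates_ext = np.concatenate([gates, np.ones((n, 1))], axis=1)
    return remaining * gates_ext


def punn_probabilities(x, centers, widths):
    """Class probabilities h_1(x), ..., h_k(x) under the assumptions above."""
    return stick_breaking_partition(gaussian_gates(x, centers, widths))
```

Under this construction each row of the output is nonnegative and sums to 1 by a telescoping argument, so no softmax is needed. The region where class $i$ dominates can be read off from $h_i$ alone.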

Related papers

Marginal Flow: a flexible and efficient framework for density estimation [6.94175385834858]
Current density modeling approaches suffer from at least one of the following shortcomings: expensive training, slow inference, approximate likelihood, mode collapse, or architectural constraints. We propose a simple yet powerful framework that overcomes these limitations altogether. We define our model $q_\theta(x)$ through a parametric distribution $q(x|w)$ with latent parameters $w$. Instead of directly optimizing the latent variables $w$, our idea is to marginalize them out by sampling $w$ from a learnable distribution $q_\theta(w)$, hence the name Marginal Flow.
arXiv Detail & Related papers (2025-09-30T13:21:13Z)
Beyond Softmax: A Natural Parameterization for Categorical Random Variables [61.709831225296305]
We introduce the $\textit{catnat}$ function, a function composed of a sequence of hierarchical binary splits. A rich set of experiments shows that the proposed function improves learning efficiency and yields models characterized by consistently higher test performance.
arXiv Detail & Related papers (2025-09-29T12:55:50Z)
v-PuNNs: van der Put Neural Networks for Transparent Ultrametric Representation Learning [0.0]
We introduce van der Put Neural Networks (v-PuNNs), the first architecture whose neurons are characteristic functions of p-adic balls in $\mathbb{Z}_p$. Under our Transparent Ultrametric Representation Learning (TURL) principle, every weight is itself a p-adic number, giving exact subtree semantics. v-PuNNs therefore bridge number theory and deep learning, offering exact, interpretable, and efficient models for hierarchical data.
arXiv Detail & Related papers (2025-08-01T18:23:38Z)
Adaptive $k$-nearest neighbor classifier based on the local estimation of the shape operator [49.87315310656657]
We introduce a new adaptive $k$-nearest neighbours ($kK$-NN) algorithm that explores the local curvature at a sample to adaptively define the neighborhood size. Results on many real-world datasets indicate that the new $kK$-NN algorithm yields superior balanced accuracy compared to the established $k$-NN method.
arXiv Detail & Related papers (2024-09-08T13:08:45Z)
Smoothed Analysis for Learning Concepts with Low Intrinsic Dimension [17.485243410774814]
In traditional models of supervised learning, the goal of a learner is to output a hypothesis that is competitive (to within $\epsilon$) with the best fitting concept from some class. We introduce a smoothed-analysis framework that requires a learner to compete only with the best agnostic. We obtain the first algorithm for agnostically learning intersections of $k$-halfspaces in time.
arXiv Detail & Related papers (2024-07-01T04:58:36Z)
Biology-inspired joint distribution neurons based on Hierarchical Correlation Reconstruction allowing for multidirectional neural networks [0.49728186750345144]
Novel artificial neurons based on HCR (Hierarchical Correlation Reconstruction) are proposed, which allow low-level differences to be removed. Such an HCR network can also propagate probability distributions (also joint) like $\rho(y,z|x)$. It also allows for additional training approaches, like direct $(a_\mathbf{j})$ estimation, through tensor decomposition.
arXiv Detail & Related papers (2024-05-08T14:49:27Z)
Approximation Rates and VC-Dimension Bounds for (P)ReLU MLP Mixture of Experts [17.022107735675046]
Mixture-of-Experts (MoEs) can scale up beyond traditional deep learning models. We show that MoMLPs can generalize since the entire MoMLP model has a (finite) VC dimension of $\tilde{O}(L \max\{nL, JW\})$.
arXiv Detail & Related papers (2024-02-05T19:11:57Z)
Agnostically Learning Multi-index Models with Queries [54.290489524576756]
We study the power of query access for the task of agnostic learning under the Gaussian distribution. We show that query access gives significant runtime improvements over random examples for agnostically learning MIMs.
arXiv Detail & Related papers (2023-12-27T15:50:47Z)
Distribution learning via neural differential equations: a nonparametric statistical perspective [1.4436965372953483]
This work establishes the first general statistical convergence analysis for distribution learning via ODE models trained through likelihood transformations. We show that the latter can be quantified via the $C^1$-metric entropy of the class $\mathcal{F}$. We then apply this general framework to the setting of $C^k$-smooth target densities, and establish nearly minimax-optimal convergence rates for two relevant velocity field classes $\mathcal{F}$: $C^k$ functions and neural networks.
arXiv Detail & Related papers (2023-09-03T00:21:37Z)
On the Identifiability and Estimation of Causal Location-Scale Noise Models [122.65417012597754]
We study the class of location-scale or heteroscedastic noise models (LSNMs). We show the causal direction is identifiable up to some pathological cases. We propose two estimators for LSNMs: an estimator based on (non-linear) feature maps, and one based on neural networks.
arXiv Detail & Related papers (2022-10-13T17:18:59Z)
Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$. Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z)
Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning [175.34232468746245]
We introduce a parameterization method called Neural Bayes. It allows computing statistical quantities that are in general difficult to compute. We show two independent use cases for this parameterization.
arXiv Detail & Related papers (2020-02-20T22:28:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.