Nonparametric Estimation of Joint Entropy through Partitioned Sample-Spacing Method
- URL: http://arxiv.org/abs/2511.13602v1
- Date: Mon, 17 Nov 2025 17:05:34 GMT
- Title: Nonparametric Estimation of Joint Entropy through Partitioned Sample-Spacing Method
- Authors: Jungwoo Ho, Sangun Park, Soyeong Oh,
- Abstract summary: We propose a nonparametric estimator of multivariate joint entropy based on partitioned sample spacings (PSS)<n>PSS consistently outperforms k-nearest neighbor estimators and achieves accuracy competitive with recent normalizing flow-based methods.<n>These properties position PSS as a practical alternative to normalizing flow-based approaches, with broad potential in information-theoretic machine learning applications.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a nonparametric estimator of multivariate joint entropy based on partitioned sample spacings (PSS). The method extends univariate spacing ideas to multivariate settings by partitioning the sample space into localized cells and aggregating within-cell statistics, with strong consistency guarantees under mild conditions. In benchmarks across diverse distributions, PSS consistently outperforms k-nearest neighbor estimators and achieves accuracy competitive with recent normalizing flow-based methods, while requiring no training or auxiliary density modeling. The estimator scales favorably in moderately high dimensions (d = 10 to 40) and shows particular robustness to correlated or skewed distributions. These properties position PSS as a practical alternative to normalizing flow-based approaches, with broad potential in information-theoretic machine learning applications.
Related papers
- Flow-Based Density Ratio Estimation for Intractable Distributions with Applications in Genomics [80.05951561886123]
We leverage condition-aware flow matching to derive a single dynamical formulation for tracking density ratios along generative trajectories.<n>We demonstrate competitive performance on simulated benchmarks for closed-form ratio estimation, and show that our method supports versatile tasks in single-cell genomics data analysis.
arXiv Detail & Related papers (2026-02-27T17:27:55Z) - Efficient Covariance Estimation for Sparsified Functional Data [51.69796254617083]
proposed Random-knots (Random-knots-Spatial) and B-spline (Bspline-Spatial) estimators of the covariance function are computationally efficient.<n>Asymptotic pointwise of the covariance are obtained for sparsified individual trajectories under some regularity conditions.
arXiv Detail & Related papers (2025-11-23T00:50:33Z) - A Unified MDL-based Binning and Tensor Factorization Framework for PDF Estimation [19.747102062281545]
We present a novel non-parametric approach for multivariate probability density function estimation (PDF)<n>Our approach builds upon tensor factorization techniques, leveraging the canonical polyadic decomposition (CPD) of a joint probability tensor.<n>We demonstrate the effectiveness of our method on synthetic data and a challenging real dry bean classification dataset.
arXiv Detail & Related papers (2025-04-25T20:27:04Z) - Stochastic Optimization with Optimal Importance Sampling [49.484190237840714]
We propose an iterative-based algorithm that jointly updates the decision and the IS distribution without requiring time-scale separation between the two.<n>Our method achieves the lowest possible variable variance and guarantees global convergence under convexity of the objective and mild assumptions on the IS distribution family.
arXiv Detail & Related papers (2025-04-04T16:10:18Z) - Bounds in Wasserstein Distance for Locally Stationary Processes [0.29771206318712146]
We introduce a novel conditional probability distribution estimator specifically tailored for locally stationary (LSP) data.<n>We rigorously establish convergence rates for the NW-based conditional probability estimator under the Wasserstein metric.<n>We conduct extensive numerical simulations on synthetic datasets and provide empirical validations using real-world data.
arXiv Detail & Related papers (2024-12-04T15:51:22Z) - Bayesian Frequency Estimation Under Local Differential Privacy With an Adaptive Randomized Response Mechanism [0.4604003661048266]
We propose AdOBEst-LDP, a new algorithm for adaptive, online Bayesian estimation of categorical distributions under local differential privacy.<n>By adapting the subset selection process to the past privatized data via Bayesian estimation, the algorithm improves the utility of future privatized data.
arXiv Detail & Related papers (2024-05-11T13:59:52Z) - Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating
Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Ensemble Kalman Filtering Meets Gaussian Process SSM for Non-Mean-Field and Online Inference [47.460898983429374]
We introduce an ensemble Kalman filter (EnKF) into the non-mean-field (NMF) variational inference framework to approximate the posterior distribution of the latent states.
This novel marriage between EnKF and GPSSM not only eliminates the need for extensive parameterization in learning variational distributions, but also enables an interpretable, closed-form approximation of the evidence lower bound (ELBO)
We demonstrate that the resulting EnKF-aided online algorithm embodies a principled objective function by ensuring data-fitting accuracy while incorporating model regularizations to mitigate overfitting.
arXiv Detail & Related papers (2023-12-10T15:22:30Z) - Multi-Fidelity Covariance Estimation in the Log-Euclidean Geometry [0.0]
We introduce a multi-fidelity estimator of covariance matrices that employs the log-Euclidean geometry of the symmetric positive-definite manifold.
We develop an optimal sample allocation scheme that minimizes the mean-squared error of the estimator given a fixed budget.
Evaluations of our approach using data from physical applications demonstrate more accurate metric learning and speedups of more than one order of magnitude compared to benchmarks.
arXiv Detail & Related papers (2023-01-31T16:33:46Z) - Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z) - A likelihood approach to nonparametric estimation of a singular
distribution using deep generative models [4.329951775163721]
We investigate a likelihood approach to nonparametric estimation of a singular distribution using deep generative models.
We prove that a novel and effective solution exists by perturbing the data with an instance noise.
We also characterize the class of distributions that can be efficiently estimated via deep generative models.
arXiv Detail & Related papers (2021-05-09T23:13:58Z) - A similarity-based Bayesian mixture-of-experts model [0.5156484100374058]
We present a new non-parametric mixture-of-experts model for multivariate regression problems.
Using a conditionally specified model, predictions for out-of-sample inputs are based on similarities to each observed data point.
Posterior inference is performed on the parameters of the mixture as well as the distance metric.
arXiv Detail & Related papers (2020-12-03T18:08:30Z) - Batch Stationary Distribution Estimation [98.18201132095066]
We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions.
We propose a consistent estimator that is based on recovering a correction ratio function over the given data.
arXiv Detail & Related papers (2020-03-02T09:10:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.