Density estimation with atoms, and functional estimation for mixed discrete-continuous data
- URL: http://arxiv.org/abs/2508.01706v1
- Date: Sun, 03 Aug 2025 10:22:35 GMT
- Title: Density estimation with atoms, and functional estimation for mixed discrete-continuous data
- Authors: Aytijhya Saha, Aaditya Ramdas
- Abstract summary: In density-functional estimation, it is standard to assume that the underlying distribution has a density with respect to the Lebesgue measure. When the data distribution is a mixture of continuous and discrete components, the resulting methods are inconsistent in theory and perform poorly in practice. We modify existing estimators for a broad class of functionals of the continuous component of the mixture.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In classical density (or density-functional) estimation, it is standard to assume that the underlying distribution has a density with respect to the Lebesgue measure. However, when the data distribution is a mixture of continuous and discrete components, the resulting methods are inconsistent in theory and perform poorly in practice. In this paper, we point out that a minor modification of existing methods for nonparametric density (functional) estimation can allow us to fully remove this assumption while retaining nearly identical theoretical guarantees and improved empirical performance. Our approach is very simple: data points that appear exactly once are likely to originate from the continuous component, whereas repeated observations are indicative of the discrete part. Leveraging this observation, we modify existing estimators for a broad class of functionals of the continuous component of the mixture; this modification is a "wrapper" in the sense that the user can use any underlying method of their choice for continuous density functional estimation. Our modifications deliver consistency without requiring knowledge of the discrete support, the mixing proportion, and without imposing additional assumptions beyond those needed in the absence of the discrete part. Thus, various theorems and existing software packages can be made automatically more robust, with absolutely no additional price when the data is not truly mixed.
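The wrapper idea from the abstract is simple enough to illustrate directly: values observed exactly once almost surely come from the continuous component, while repeated values reveal the atoms. Below is a minimal sketch under illustrative assumptions (a one-dimensional sample, a normal continuous component, atoms at 0 and 1, and a plain Gaussian KDE standing in for the user's favorite continuous estimator; all variable names and parameters are hypothetical, not from the paper).

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

# Illustrative mixed sample: ~70% continuous (standard normal),
# ~30% discrete (atoms at 0.0 and 1.0).
n = 2000
is_discrete = rng.random(n) < 0.3
x = np.where(is_discrete, rng.choice([0.0, 1.0], size=n), rng.normal(size=n))

# Wrapper step: points seen exactly once are (almost surely) from the
# continuous part; repeated values expose the discrete support.
counts = Counter(x.tolist())
singletons = np.array([v for v in x if counts[v] == 1])
atoms = sorted(v for v, c in counts.items() if c > 1)

# The repeats also give the mixing proportion, with no prior knowledge
# of the discrete support required.
p_discrete_hat = 1 - len(singletons) / n

# Any off-the-shelf continuous estimator can now run on the singletons,
# e.g. a Gaussian KDE with Silverman's rule-of-thumb bandwidth.
h = 1.06 * singletons.std() * len(singletons) ** (-1 / 5)

def kde(t):
    z = (t - singletons) / h
    return np.mean(np.exp(-0.5 * z**2)) / (h * np.sqrt(2 * np.pi))

print("estimated atoms:", atoms)
print("estimated discrete mass:", round(p_discrete_hat, 3))
print("density estimate at 0:", round(kde(0.0), 3))
```

Because distinct draws from an absolutely continuous distribution collide with probability zero, the singleton split recovers the continuous subsample without knowing the atoms or the mixing proportion in advance, which is precisely why the modification works as a wrapper around any existing method.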
Related papers
- Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density [93.32594873253534]
Trustworthy machine learning requires meticulous regulation of model reliance on non-robust features.
We propose a framework to delineate and regulate such features by attributing model predictions to the input.
arXiv Detail & Related papers (2024-07-05T09:16:56Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
- Mixed Variational Flows for Discrete Variables [14.00384446902181]
We develop a variational flow family for discrete distributions without any continuous embedding.
First, we develop a measure-preserving and discrete (MAD) invertible map that leaves the discrete target invariant.
We also develop an extension to MAD Mix that handles joint discrete and continuous models.
arXiv Detail & Related papers (2023-08-29T20:13:37Z)
- Sobolev Space Regularised Pre Density Models [51.558848491038916]
We propose a new approach to non-parametric density estimation that is based on regularizing a Sobolev norm of the density.
This method is statistically consistent and makes the inductive validation model clear.
arXiv Detail & Related papers (2023-07-25T18:47:53Z)
- Anomaly Detection with Variance Stabilized Density Estimation [49.46356430493534]
We present a variance-stabilized density estimation problem for maximizing the likelihood of the observed samples.
To obtain a reliable anomaly detector, we introduce a spectral ensemble of autoregressive models for learning the variance-stabilized distribution.
We have conducted an extensive benchmark with 52 datasets, demonstrating that our method leads to state-of-the-art results.
arXiv Detail & Related papers (2023-06-01T11:52:58Z)
- Copula-Based Density Estimation Models for Multivariate Zero-Inflated Continuous Data [0.0]
We propose two copula-based density estimation models that can cope with multivariate correlation among zero-inflated continuous variables.
In order to overcome the difficulty in the use of copulas due to the tied-data problem in zero-inflated data, we propose a new type of copula, rectified Gaussian copula.
arXiv Detail & Related papers (2023-04-02T13:43:37Z)
- Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained by simply matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z)
- Nonparametric Probabilistic Regression with Coarse Learners [1.8275108630751844]
We show that we can compute precise conditional densities with minimal assumptions on the shape or form of the density.
We demonstrate this approach on a variety of datasets and show competitive performance, particularly on larger datasets.
arXiv Detail & Related papers (2022-10-28T16:25:26Z)
- Interpretable Mixture Density Estimation by use of Differentiable Tree-module [0.0]
We propose a method for mixture density estimation that utilizes an interpretable tree structure.
A fast inference procedure based on a time-invariant information cache achieves both high speed and interpretability.
arXiv Detail & Related papers (2021-05-08T07:29:58Z)
- Contrastive learning of strong-mixing continuous-time stochastic processes [53.82893653745542]
Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data.
We show that a properly constructed contrastive learning task can be used to estimate the transition kernel for small-to-mid-range intervals in the diffusion case.
arXiv Detail & Related papers (2021-03-03T23:06:47Z)
- Improving Nonparametric Density Estimation with Tensor Decompositions [14.917420021212912]
Nonparametric density estimators often perform well on low dimensional data, but suffer when applied to higher dimensional data.
This paper investigates whether these improvements can be extended to other simplified dependence assumptions.
We prove that restricting estimation to low-rank nonnegative PARAFAC or Tucker decompositions removes the dimensionality exponent on bin width rates for multidimensional histograms.
arXiv Detail & Related papers (2020-10-06T01:39:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.