Empirical Density Estimation based on Spline Quasi-Interpolation with applications to Copulas clustering modeling
- URL: http://arxiv.org/abs/2402.11552v1
- Date: Sun, 18 Feb 2024 11:49:38 GMT
- Title: Empirical Density Estimation based on Spline Quasi-Interpolation with applications to Copulas clustering modeling
- Authors: Cristiano Tamborrino, Antonella Falini, Francesca Mazzia
- Abstract summary: Density estimation is a fundamental technique employed in various fields to model and understand the underlying distribution of data. In this paper we propose a univariate approximation of the density using spline quasi-interpolation. The presented algorithm is validated on artificial and real datasets.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Density estimation is a fundamental technique employed in various fields to model and understand the underlying distribution of data. The primary objective of density estimation is to estimate the probability density function of a random variable. This process is particularly valuable when dealing with univariate or multivariate data and is essential for tasks such as clustering, anomaly detection, and generative modeling. In this paper we propose a univariate approximation of the density based on spline quasi-interpolation and apply it to clustering modeling. The clustering technique relies on the construction of suitable multivariate distributions, which in turn depend on estimates of the univariate empirical densities (the marginals). These marginals are approximated with the proposed spline quasi-interpolation, while the joint distributions that model the sought clustering partition are constructed using copula functions. In particular, since copulas capture the dependence between the features of the data independently of the marginal distributions, a finite mixture copula model is proposed. The presented algorithm is validated on artificial and real datasets.
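The abstract describes a two-stage pipeline: estimate each marginal density with a spline quasi-interpolant, then couple the marginals through a copula via Sklar's formula f(x) = c(F_1(x_1), ..., F_d(x_d)) · ∏ f_i(x_i). Below is a minimal Python/NumPy sketch of that pipeline. It rests on explicit assumptions and is not the authors' code. First, the quasi-interpolant is assumed to be the variation-diminishing (Schoenberg) cubic B-spline operator applied to the empirical CDF on uniform knots, with the density taken as its exact derivative. Second, a single Gaussian copula stands in for the paper's finite mixture copula model, so the mixture weights and the fitting procedure needed for clustering are not shown. The names `SplineQIMarginal`, `fit_gaussian_copula`, and `joint_pdf` are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def cubic_bspline(t):
    """Centered cardinal cubic B-spline B(t), supported on [-2, 2]."""
    a = np.abs(t)
    return np.where(a < 1, 2 / 3 - a**2 + a**3 / 2,
                    np.where(a < 2, (2 - a) ** 3 / 6, 0.0))


def cubic_bspline_deriv(t):
    """Derivative B'(t) of the centered cubic B-spline."""
    a = np.abs(t)
    return np.where(a < 1, -2 * t + 1.5 * t * a,
                    np.where(a < 2, -0.5 * np.sign(t) * (2 - a) ** 2, 0.0))


class SplineQIMarginal:
    """Hypothetical helper: smooth marginal CDF/pdf from non-constant 1-D samples.

    Assumption: Schoenberg (variation-diminishing) cubic spline quasi-interpolant
    of the empirical CDF on uniform knots; the pdf is the spline's derivative.
    """

    def __init__(self, samples, n_knots=40):
        x = np.sort(np.asarray(samples, dtype=float))
        lo, hi = x[0], x[-1]
        self.h = (hi - lo) / (n_knots - 1)
        # Pad 4 spacings per side so every B-spline touching [lo-2h, hi+2h] is kept.
        self.knots = lo + self.h * np.arange(-4, n_knots + 4)
        # Coefficients = empirical CDF at the knots: nondecreasing from 0 to 1,
        # so the spline CDF is monotone and its derivative is nonnegative.
        self.coef = np.searchsorted(x, self.knots, side="right") / x.size
        self.right = hi + 2 * self.h  # at and beyond this point the spline equals 1

    def _eval(self, xq, basis):
        t = (xq[..., None] - self.knots) / self.h
        return basis(t) @ self.coef

    def cdf(self, xq):
        xq = np.asarray(xq, dtype=float)
        val = np.where(xq >= self.right, 1.0, self._eval(xq, cubic_bspline))
        return np.clip(val, 0.0, 1.0)

    def pdf(self, xq):
        xq = np.asarray(xq, dtype=float)
        val = self._eval(xq, cubic_bspline_deriv) / self.h
        return np.where(xq >= self.right, 0.0, val)


_std_normal = NormalDist()


def normal_scores(U, eps=1e-6):
    """z = Phi^{-1}(u), with u clipped away from 0 and 1."""
    return np.vectorize(_std_normal.inv_cdf)(np.clip(U, eps, 1 - eps))


def fit_gaussian_copula(X, marginals):
    """Copula correlation R estimated from the normal scores of the data."""
    U = np.column_stack([m.cdf(X[:, i]) for i, m in enumerate(marginals)])
    return np.corrcoef(normal_scores(U), rowvar=False)


def joint_pdf(X, marginals, R):
    """Sklar: f(x) = c_R(F_1(x_1), ..., F_d(x_d)) * prod_i f_i(x_i)."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    U = np.column_stack([m.cdf(X[:, i]) for i, m in enumerate(marginals)])
    Z = normal_scores(U)
    _, logdet = np.linalg.slogdet(R)
    A = np.linalg.inv(R) - np.eye(R.shape[0])
    log_c = -0.5 * np.einsum("ni,ij,nj->n", Z, A, Z) - 0.5 * logdet
    f = np.column_stack([m.pdf(X[:, i]) for i, m in enumerate(marginals)])
    return np.exp(log_c) * np.prod(f, axis=1)


# Usage on synthetic data: Gaussian dependence, one skewed (log-normal) marginal.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.7], [0.7, 1.0]], size=2000)
X[:, 1] = np.exp(X[:, 1])  # monotone transform: the copula is unchanged
margs = [SplineQIMarginal(X[:, i]) for i in range(2)]
R = fit_gaussian_copula(X, margs)
print(R[0, 1])  # should land in the vicinity of the 0.7 used to simulate
print(joint_pdf([[0.0, 1.0]], margs, R))
```

Taking the spline coefficients equal to the empirical CDF keeps them nondecreasing. As a result, the estimated CDF is monotone and the estimated density integrates to one by construction. The trade-off is that this operator only reproduces linear polynomials, and uniform knots over-smooth strongly skewed marginals such as the log-normal column above. A clustering model in the spirit of the abstract would replace the single copula with a weighted sum of component densities of this form.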
Related papers
- Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]^d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
(2024-10-03T09:07:13Z)
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
(2024-04-24T09:04:36Z)
- PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation
We propose a comprehensive sample-based method for assessing the quality of generative models.
The proposed approach enables the estimation of the probability that two sets of samples are drawn from the same distribution.
(2024-02-06T19:39:26Z)
- Finite Mixtures of Multivariate Poisson-Log Normal Factor Analyzers for Clustering Count Data
A class of eight parsimonious mixture models based on the mixtures of factor analyzers model is introduced.
The proposed models are explored in the context of clustering discrete data arising from RNA sequencing studies.
(2023-11-13T21:23:15Z)
- Probabilistic Classification by Density Estimation Using Gaussian Mixture Model and Masked Autoregressive Flow
Density estimation, which estimates the distribution of data, is an important category of probabilistic machine learning.
In this paper, we use the density estimators for classification, although they are often used for estimating the distribution of data.
We model the likelihood of classes of data by density estimation, specifically using GMM and MAF.
(2023-10-16T21:37:22Z)
- Copula-Based Density Estimation Models for Multivariate Zero-Inflated Continuous Data
We propose two copula-based density estimation models that can cope with multivariate correlation among zero-inflated continuous variables.
To overcome the difficulty of using copulas caused by the tied-data problem in zero-inflated data, we propose a new type of copula, the rectified Gaussian copula.
(2023-04-02T13:43:37Z)
- Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data
This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace.
We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated.
The generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
(2023-02-14T17:02:35Z)
- Efficient CDF Approximations for Normalizing Flows
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
(2022-02-23T06:11:49Z)
- A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data.
A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data.
Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
(2022-01-28T10:01:37Z)
- Density-Based Clustering with Kernel Diffusion
A naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball is commonly used in density-based clustering algorithms.
We propose a new kernel diffusion density function, which is adaptive to data of varying local distributional characteristics and smoothness.
(2021-10-11T09:00:33Z)
- A likelihood approach to nonparametric estimation of a singular distribution using deep generative models
We investigate a likelihood approach to nonparametric estimation of a singular distribution using deep generative models.
We prove that a novel and effective solution exists by perturbing the data with an instance noise.
We also characterize the class of distributions that can be efficiently estimated via deep generative models.
(2021-05-09T23:13:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.