Clustering consistency with Dirichlet process mixtures
- URL: http://arxiv.org/abs/2205.12924v1
- Date: Wed, 25 May 2022 17:21:42 GMT
- Title: Clustering consistency with Dirichlet process mixtures
- Authors: Filippo Ascolani, Antonio Lijoi, Giovanni Rebaudo, Giacomo Zanella
- Abstract summary: We study the posterior distribution induced by Dirichlet process mixtures as the sample size increases.
We show that consistency for the number of clusters can be achieved if the concentration parameter is adapted in a fully Bayesian way.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dirichlet process mixtures are flexible non-parametric models, particularly
suited to density estimation and probabilistic clustering. In this work we
study the posterior distribution induced by Dirichlet process mixtures as the
sample size increases, and more specifically focus on consistency for the
unknown number of clusters when the observed data are generated from a finite
mixture. Crucially, we consider the situation where a prior is placed on the
concentration parameter of the underlying Dirichlet process. Previous findings
in the literature suggest that Dirichlet process mixtures are typically not
consistent for the number of clusters if the concentration parameter is held
fixed and data come from a finite mixture. Here we show that consistency for
the number of clusters can be achieved if the concentration parameter is
adapted in a fully Bayesian way, as commonly done in practice. Our results are
derived for data coming from a class of finite mixtures, with mild assumptions
on the prior for the concentration parameter and for a variety of choices of
likelihood kernels for the mixture.
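As background for the contrast between a fixed and a random concentration parameter, the following is a standard property of the Dirichlet process prior, not a result of this paper. With concentration alpha held fixed, the prior number of clusters K_n among n observations has mean E[K_n] = sum_{i=1}^{n} alpha / (alpha + i - 1), which grows roughly like alpha log n. The sketch below computes this mean and checks it by simulating the Chinese restaurant process. The function names are hypothetical, and the code illustrates only prior behaviour, not the posterior consistency result studied here.

```python
import math
import random


def expected_clusters(n: int, alpha: float) -> float:
    """Prior mean of K_n, the number of distinct clusters among n draws from a
    Dirichlet process with concentration alpha (Chinese restaurant process):
    E[K_n] = sum_{i=1}^{n} alpha / (alpha + i - 1)."""
    return sum(alpha / (alpha + i - 1) for i in range(1, n + 1))


def sample_crp_num_clusters(n: int, alpha: float, rng: random.Random) -> int:
    """Simulate the number of clusters in one CRP partition of size n.

    Observation i (1-based) opens a new cluster with probability
    alpha / (alpha + i - 1), independently of how earlier observations were
    assigned, so K_n is a sum of independent Bernoulli variables."""
    # With 0-based i, the first observation (i = 0) always opens a cluster.
    return sum(rng.random() < alpha / (alpha + i) for i in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 1_000, 10_000):
        exact = expected_clusters(n, alpha=1.0)
        sim = sum(sample_crp_num_clusters(n, 1.0, rng) for _ in range(100)) / 100
        print(f"n={n:>6}  E[K_n]={exact:.2f}  simulated={sim:.2f}  log n={math.log(n):.2f}")
```

For alpha = 1 the mean is the harmonic number H_n, so the expected number of clusters keeps increasing with n. A fixed alpha therefore builds in a growing number of clusters a priori.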
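The abstract notes that placing a prior on the concentration parameter is common in practice. One widely used way to do this inside a Gibbs sampler is the auxiliary-variable update of Escobar and West (1995) for a Gamma prior on alpha. The sketch below implements that standard update and is not code from this paper. The Gamma(shape a, rate b) prior is chosen only for illustration, and the function names are hypothetical. Given the partition, the conditional of alpha depends only on the number of occupied clusters k and the sample size n, because p(alpha | k, n) is proportional to p(alpha) alpha^k Gamma(alpha) / Gamma(alpha + n).

```python
import math
import random


def update_concentration(alpha: float, k: int, n: int, a: float, b: float,
                         rng: random.Random) -> float:
    """One auxiliary-variable Gibbs step (Escobar and West, 1995) for the DP
    concentration alpha under a Gamma(shape=a, rate=b) prior, given k occupied
    clusters among n observations. Requires a > 0, b > 0, k >= 1."""
    # Auxiliary variable: eta | alpha, k ~ Beta(alpha + 1, n)
    eta = rng.betavariate(alpha + 1.0, n)
    rate = b - math.log(eta)  # rate >= b, since 0 < eta <= 1
    # alpha | eta, k is a two-component Gamma mixture with weight pi,
    # where pi / (1 - pi) = (a + k - 1) / (n * rate)
    odds = (a + k - 1.0) / (n * rate)
    shape = a + k if rng.random() < odds / (1.0 + odds) else a + k - 1.0
    # random.gammavariate is parameterized by (shape, scale), so scale = 1 / rate
    return rng.gammavariate(shape, 1.0 / rate)


def posterior_mean_alpha(k: int, n: int, a: float, b: float,
                         iters: int = 20_000, burn_in: int = 1_000,
                         seed: int = 0) -> float:
    """Monte Carlo estimate of E[alpha | k, n] from iterating the update with
    k held fixed (in a full mixture sampler k would change every sweep)."""
    rng = random.Random(seed)
    alpha, total = 1.0, 0.0
    for t in range(burn_in + iters):
        alpha = update_concentration(alpha, k, n, a, b, rng)
        if t >= burn_in:
            total += alpha
    return total / iters


if __name__ == "__main__":
    # More occupied clusters pull the conditional posterior of alpha upwards.
    for k in (3, 10):
        print(k, round(posterior_mean_alpha(k, n=1_000, a=2.0, b=1.0), 3))
```

In a complete sampler for a Dirichlet process mixture, this step would alternate with the update of cluster allocations. Here k is held fixed only to isolate the alpha step.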
Related papers
- Summarizing Bayesian Nonparametric Mixture Posterior -- Sliced Optimal Transport Metrics for Gaussian Mixtures
Existing methods to summarize posterior inference for mixture models focus on identifying a point estimate of the implied random partition for clustering.
We propose a novel approach for summarizing posterior inference in nonparametric Bayesian mixture models, prioritizing density estimation of the mixing measure (or mixture) as an inference target.
(2024-11-22T02:15:38Z)
- Clustering Based on Density Propagation and Subcluster Merging
We propose a density-based node clustering approach that automatically determines the number of clusters and can be applied in both data space and graph space.
Unlike traditional density-based clustering methods, which require computing the distance between every pair of nodes, the proposed technique estimates density through a propagation process.
(2024-11-04T04:09:36Z)
- Empirical Density Estimation based on Spline Quasi-Interpolation with applications to Copulas clustering modeling
Density estimation is a fundamental technique used in many fields to model and understand the distribution underlying the data.
In this paper we propose a univariate approximation of the density based on quasi-interpolation.
The presented algorithm is validated on artificial and real datasets.
(2024-02-18T11:49:38Z)
- Semantic Equivariant Mixup
Mixup is a well-established data augmentation technique that can extend the training distribution and regularize neural networks.
Previous mixup variants tend to focus excessively on label-related information.
We propose semantic equivariant mixup (SEM) to preserve richer semantic information in the input.
(2023-08-12T03:05:53Z)
- Joint Probability Estimation Using Tensor Decomposition and Dictionaries
We study non-parametric estimation of joint probabilities of a given set of discrete and continuous random variables from their (empirically estimated) 2D marginals.
We create a dictionary of various families of distributions by inspecting the data, and use it to approximate each decomposed factor of the product in the mixture.
(2022-03-03T11:55:51Z)
- A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data.
A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data.
Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
(2022-01-28T10:01:37Z)
- Density Ratio Estimation via Infinitesimal Classification
We propose DRE-infty, a divide-and-conquer approach that reduces density ratio estimation (DRE) to a series of easier subproblems.
Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions.
We show that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets.
(2021-11-22T06:26:29Z)
- Consistent Estimation of Identifiable Nonparametric Mixture Models from Grouped Observations
This work proposes an algorithm that consistently estimates any identifiable mixture model from grouped observations.
A practical implementation is provided for paired observations, and the approach is shown to outperform existing methods.
(2020-06-12T20:44:22Z)
- Uniform Convergence Rates for Maximum Likelihood Estimation under Two-Component Gaussian Mixture Models
We derive uniform convergence rates for the maximum likelihood estimator and minimax lower bounds for parameter estimation.
We assume the mixing proportions of the mixture are known and fixed, but make no separation assumption on the underlying mixture components.
(2020-06-01T04:13:48Z)
- Algebraic and Analytic Approaches for Parameter Learning in Mixture Models
We present two different approaches for parameter learning in several mixture models in one dimension.
For some of these distributions, our results represent the first guarantees for parameter estimation.
(2020-01-19T05:10:56Z)