Adaptative clustering by minimization of the mixing entropy criterion
- URL: http://arxiv.org/abs/2203.11517v1
- Date: Tue, 22 Mar 2022 07:47:02 GMT
- Title: Adaptative clustering by minimization of the mixing entropy criterion
- Authors: Thierry Dumont (UPN, FP2M, MODAL'X)
- Abstract summary: We present a clustering method and provide an explanation of a phenomenon encountered in the applied statistical literature since the 1990s.
This phenomenon is the natural adaptability of the order when using a clustering method derived from the famous EM algorithm.
We define a new statistic, the relative entropic order, that represents the number of clumps in the target distribution.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a clustering method and provide a theoretical analysis and an
explanation of a phenomenon encountered in the applied statistical literature
since the 1990s. This phenomenon is the natural adaptability of the order when
using a clustering method derived from the famous EM algorithm. We define a new
statistic, the relative entropic order, that represents the number of clumps in
the target distribution. We prove in particular that the empirical version of
this relative entropic order is consistent. Our approach is easy to implement
and has high potential for applications. Perspectives of this work are
algorithmic and theoretical, with possible natural extensions to various cases
such as dependent or multidimensional data.
Related papers
- Empirical Density Estimation based on Spline Quasi-Interpolation with applications to Copulas clustering modeling [0.0]
Density estimation is a fundamental technique employed in various fields to model and to understand the underlying distribution of data.
In this paper we propose the mono-variate approximation of the density using quasi-interpolation.
The presented algorithm is validated on artificial and real datasets.
arXiv Detail & Related papers (2024-02-18T11:49:38Z)
- Kernel Biclustering algorithm in Hilbert Spaces [8.303238963864885]
We develop a new model-free biclustering algorithm in abstract spaces using the notions of energy distance and the maximum mean discrepancy.
The proposed method can learn more general and complex cluster shapes than most existing literature approaches.
Our results are similar to state-of-the-art methods in their optimal scenarios, assuming a proper kernel choice.
arXiv Detail & Related papers (2022-08-07T08:41:46Z)
- Optimal regularizations for data generation with probabilistic graphical models [0.0]
Empirically, well-chosen regularization schemes dramatically improve the quality of the inferred models.
We consider the particular case of L2 and L1 regularizations in the Maximum A Posteriori (MAP) inference of generative pairwise graphical models.
arXiv Detail & Related papers (2021-12-02T14:45:16Z)
- Partial Counterfactual Identification from Observational and Experimental Data [83.798237968683]
We develop effective Monte Carlo algorithms to approximate the optimal bounds from an arbitrary combination of observational and experimental data.
Our algorithms are validated extensively on synthetic and real-world datasets.
arXiv Detail & Related papers (2021-10-12T02:21:30Z)
- Discovering Latent Causal Variables via Mechanism Sparsity: A New Principle for Nonlinear ICA [81.4991350761909]
Independent component analysis (ICA) refers to an ensemble of methods which formalize this goal and provide estimation procedures for practical application.
We show that the latent variables can be recovered up to a permutation if one regularizes the latent mechanisms to be sparse.
arXiv Detail & Related papers (2021-07-21T14:22:14Z)
- Leveraging Global Parameters for Flow-based Neural Posterior Estimation [90.21090932619695]
Inferring the parameters of a model based on experimental observations is central to the scientific method.
A particularly challenging setting is when the model is strongly indeterminate, i.e., when distinct sets of parameters yield identical observations.
We present a method for cracking such indeterminacy by exploiting additional information conveyed by an auxiliary set of observations sharing global parameters.
arXiv Detail & Related papers (2021-02-12T12:23:13Z)
- Statistical optimality and stability of tangent transform algorithms in logit models [6.9827388859232045]
We provide conditions on the data generating process to derive non-asymptotic upper bounds to the risk incurred by the logistical optima.
In particular, we establish local variation of the algorithm without any assumptions on the data-generating process.
We explore a special case involving a semi-orthogonal design under which a global convergence is obtained.
arXiv Detail & Related papers (2020-10-25T05:15:13Z)
- Understanding Implicit Regularization in Over-Parameterized Single Index Model [55.41685740015095]
We design regularization-free algorithms for the high-dimensional single index model.
We provide theoretical guarantees for the induced implicit regularization phenomenon.
arXiv Detail & Related papers (2020-07-16T13:27:47Z)
- Asymptotic Analysis of an Ensemble of Randomly Projected Linear Discriminants [94.46276668068327]
In [1], an ensemble of randomly projected linear discriminants is used to classify datasets.
We develop a consistent estimator of the misclassification probability as an alternative to the computationally-costly cross-validation estimator.
We also demonstrate the use of our estimator for tuning the projection dimension on both real and synthetic data.
arXiv Detail & Related papers (2020-04-17T12:47:04Z)
- Generalized Sliced Distances for Probability Distributions [47.543990188697734]
We introduce a broad family of probability metrics, termed Generalized Sliced Probability Metrics (GSPMs).
GSPMs are rooted in the generalized Radon transform and come with a unique geometric interpretation.
We consider GSPM-based gradient flows for generative modeling applications and show that under mild assumptions, the gradient flow converges to the global optimum.
arXiv Detail & Related papers (2020-02-28T04:18:00Z)
- Fast approximations in the homogeneous Ising model for use in scene analysis [61.0951285821105]
We provide accurate approximations that make it possible to numerically calculate quantities needed in inference.
We show that our approximation formulae are scalable and unfazed by the size of the Markov Random Field.
The practical import of our approximation formulae is illustrated in performing Bayesian inference in a functional Magnetic Resonance Imaging activation detection experiment, and also in likelihood ratio testing for anisotropy in the spatial patterns of yearly increases in pistachio tree yields.
arXiv Detail & Related papers (2017-12-06T14:24:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.