Statistical Modeling of Univariate Multimodal Data
- URL: http://arxiv.org/abs/2412.15894v1
- Date: Fri, 20 Dec 2024 13:49:15 GMT
- Title: Statistical Modeling of Univariate Multimodal Data
- Authors: Paraskevi Chasani, Aristidis Likas
- Abstract summary: We propose a method that partitions univariate data into unimodal subsets.
For valley point detection, we introduce properties of critical points on the convex hull of the empirical cumulative distribution function (ecdf) plot.
Next, we apply a unimodal data modeling approach that provides a statistical model for each obtained unimodal subset.
- Abstract: Unimodality constitutes a key property indicating grouping behavior of the data around a single mode of its density. We propose a method that partitions univariate data into unimodal subsets through recursive splitting around valley points of the data density. For valley point detection, we introduce properties of critical points on the convex hull of the empirical cumulative distribution function (ecdf) plot that provide indications of the existence of density valleys. Next, we apply a unimodal data modeling approach that provides a statistical model for each obtained unimodal subset in the form of a Uniform Mixture Model (UMM). Consequently, a hierarchical statistical model of the initial dataset is obtained in the form of a mixture of UMMs, named the Unimodal Mixture Model (UDMM). The proposed method is non-parametric, hyperparameter-free, automatically estimates the number of unimodal subsets, and provides accurate statistical models, as indicated by experimental results on clustering and density estimation tasks.
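To make the recursive splitting step concrete, below is a minimal sketch (assuming NumPy and SciPy are available) of partitioning a univariate sample around detected density valleys. The valley detector here is a deliberately simple placeholder (interior local minima of a Gaussian kernel density estimate, tried deepest first), not the paper's criterion based on critical points of the ecdf convex hull; the function names and the `min_size` guard are hypothetical.

```python
# Minimal sketch: recursive splitting of univariate data around density valleys.
# The valley detector below is a simple KDE-based placeholder, NOT the paper's
# ecdf-convex-hull criterion; names and the min_size guard are hypothetical.
import numpy as np
from scipy.stats import gaussian_kde

def candidate_valleys(x, grid_size=512):
    """Interior local minima of a Gaussian KDE, ordered from deepest to shallowest."""
    grid = np.linspace(x.min(), x.max(), grid_size)
    density = gaussian_kde(x)(grid)
    idx = np.where((density[1:-1] < density[:-2]) &
                   (density[1:-1] < density[2:]))[0] + 1
    return grid[idx[np.argsort(density[idx])]]

def split_into_unimodal_subsets(x, min_size=20):
    """Recursively split around the deepest usable valley; stop when none remains."""
    x = np.sort(np.asarray(x, dtype=float))
    if x.size >= 2 * min_size:
        for valley in candidate_valleys(x):
            left, right = x[x <= valley], x[x > valley]
            if left.size >= min_size and right.size >= min_size:
                return (split_into_unimodal_subsets(left, min_size) +
                        split_into_unimodal_subsets(right, min_size))
    return [x]  # treated as (approximately) unimodal

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(-4, 1, 400),
                           rng.normal(0, 1, 400),
                           rng.normal(5, 1, 400)])
    subsets = split_into_unimodal_subsets(data)
    # Typically recovers three subsets, one per mixture component.
    print([(len(s), round(float(s.mean()), 2)) for s in subsets])
```

Each subset returned by such a procedure would then be modeled separately (in the paper, via the UU-test's Uniform Mixture Model fit), and the per-subset models combined into the overall UDMM.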
Related papers
- Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers [49.1574468325115]
We introduce a unified convergence analysis framework for deterministic samplers.
Our framework achieves iteration complexity of $\tilde{O}(d^2/\epsilon)$.
We also provide a detailed analysis of Denoising Diffusion Implicit Models (DDIM)-type samplers.
arXiv Detail & Related papers (2024-10-18T07:37:36Z) - Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]^d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z) - Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) can approximate the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z) - Empirical Density Estimation based on Spline Quasi-Interpolation with
applications to Copulas clustering modeling [0.0]
Density estimation is a fundamental technique employed in various fields to model and to understand the underlying distribution of data.
In this paper we propose approximating the univariate density using spline quasi-interpolation.
The presented algorithm is validated on artificial and real datasets.
arXiv Detail & Related papers (2024-02-18T11:49:38Z) - PQMass: Probabilistic Assessment of the Quality of Generative Models
using Probability Mass Estimation [8.527898482146103]
We propose a comprehensive sample-based method for assessing the quality of generative models.
The proposed approach enables the estimation of the probability that two sets of samples are drawn from the same distribution.
arXiv Detail & Related papers (2024-02-06T19:39:26Z) - A Multivariate Unimodality Test Harnessing the Dip Statistic of Mahalanobis Distances Over Random Projections [0.18416014644193066]
We extend one-dimensional unimodality principles to multi-dimensional spaces through linear random projections and point-to-point distancing.
Our method, rooted in $\alpha$-unimodality assumptions, presents a novel unimodality test named mud-pod.
Both theoretical and empirical studies confirm the efficacy of our method in unimodality assessment of multidimensional datasets.
arXiv Detail & Related papers (2023-11-28T09:11:02Z) - Anomaly Detection with Variance Stabilized Density Estimation [49.46356430493534]
We formulate a variance-stabilized density estimation problem that maximizes the likelihood of the observed samples.
To obtain a reliable anomaly detector, we introduce a spectral ensemble of autoregressive models for learning the variance-stabilized distribution.
We have conducted an extensive benchmark with 52 datasets, demonstrating that our method leads to state-of-the-art results.
arXiv Detail & Related papers (2023-06-01T11:52:58Z) - LEAN-DMKDE: Quantum Latent Density Estimation for Anomaly Detection [0.0]
The method combines an autoencoder, for learning a low-dimensional representation of the data, with a density-estimation model.
The method predicts a degree of normality for new samples based on the estimated density.
The experimental results show that the method performs on par with or outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2022-11-15T21:51:42Z) - Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to logistic regression, can be learned from aggregated data alone by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z) - Entropy Minimizing Matrix Factorization [102.26446204624885]
Nonnegative Matrix Factorization (NMF) is a widely-used data analysis technique, and has yielded impressive results in many real-world tasks.
In this study, an Entropy Minimizing Matrix Factorization (EMMF) framework is developed to reduce the influence of outliers on the factorization.
Considering that outliers are usually much fewer than normal samples, a new entropy loss function is established for the matrix factorization.
arXiv Detail & Related papers (2021-03-24T21:08:43Z) - The UU-test for Statistical Modeling of Unimodal Data [0.20305676256390928]
We propose a technique called the UU-test (Unimodal Uniform test) to decide whether a one-dimensional dataset is unimodal.
A unique feature of this approach is that, in the case of unimodality, it also provides a statistical model of the data in the form of a Uniform Mixture Model (a generic sketch of such a model is given after this list).
arXiv Detail & Related papers (2020-08-28T08:34:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.