The UU-test for Statistical Modeling of Unimodal Data
- URL: http://arxiv.org/abs/2008.12537v2
- Date: Thu, 9 Sep 2021 15:28:00 GMT
- Title: The UU-test for Statistical Modeling of Unimodal Data
- Authors: Paraskevi Chasani and Aristidis Likas
- Abstract summary: We propose a technique called UU-test (Unimodal Uniform test) to decide on the unimodality of a one-dimensional dataset.
A unique feature of this approach is that in the case of unimodality, it also provides a statistical model of the data in the form of a Uniform Mixture Model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deciding on the unimodality of a dataset is an important problem in data
analysis and statistical modeling. It allows one to obtain knowledge about the
structure of the dataset, i.e., whether the data points have been generated by a
probability distribution with a single peak or with more than one peak. Such knowledge
is very useful for several data analysis problems, such as deciding on the
number of clusters and determining unimodal projections. We propose a technique
called UU-test (Unimodal Uniform test) to decide on the unimodality of a
one-dimensional dataset. The method operates on the empirical cumulative
distribution function (ecdf) of the dataset. It attempts to build a piecewise linear
approximation of the ecdf that is unimodal and models the data sufficiently in
the sense that the data corresponding to each linear segment follows the
uniform distribution. A unique feature of this approach is that in the case of
unimodality, it also provides a statistical model of the data in the form of a
Uniform Mixture Model. We present experimental results in order to assess the
ability of the method to decide on unimodality and perform comparisons with the
well-known dip-test approach. In addition, in the case of unimodal datasets we
evaluate the Uniform Mixture Models provided by the proposed method using the
test set log-likelihood and the two-sample Kolmogorov-Smirnov (KS) test.
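The core per-segment check described above, that the data falling on one linear piece of the ecdf should be uniformly distributed, can be sketched with a one-sample Kolmogorov-Smirnov test. This is a minimal illustration, not the paper's actual algorithm; the function name and significance level are assumptions:

```python
import numpy as np
from scipy.stats import kstest

def segment_is_uniform(x, alpha=0.05):
    """One-sample KS test of uniformity on the segment's own range.

    Illustrative sketch of the per-segment check: rescale the segment's
    data to [0, 1] and compare its ecdf against the U(0, 1) cdf.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        return True  # a degenerate one-point segment is trivially uniform
    z = (x - lo) / (hi - lo)          # rescale the segment to [0, 1]
    _, pvalue = kstest(z, "uniform")  # compare against the U(0,1) cdf
    return pvalue > alpha

# An evenly spread grid is maximally uniform; two tight clusters are not.
grid = np.linspace(0.0, 1.0, 500)
rng = np.random.default_rng(0)
clusters = np.concatenate([rng.normal(-3, 0.1, 250), rng.normal(3, 0.1, 250)])
print(segment_is_uniform(grid), segment_is_uniform(clusters))  # → True False
```

A unimodal piecewise approximation built from such segments directly yields the Uniform Mixture Model mentioned above: each segment contributes one uniform component, weighted by its fraction of the data.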
Related papers
- Unified Convergence Analysis for Score-Based Diffusion Models with Deterministic Samplers [49.1574468325115]
We introduce a unified convergence analysis framework for deterministic samplers.
Our framework achieves iteration complexity of $\tilde{O}(d^2/\epsilon)$.
We also provide a detailed analysis of Denoising Diffusion Implicit Models (DDIM)-type samplers.
arXiv Detail & Related papers (2024-10-18T07:37:36Z)
- Diffusion posterior sampling for simulation-based inference in tall data settings [53.17563688225137]
Simulation-based inference (SBI) is capable of approximating the posterior distribution that relates input parameters to a given observation.
In this work, we consider a tall data extension in which multiple observations are available to better infer the parameters of the model.
We compare our method to recently proposed competing approaches on various numerical experiments and demonstrate its superiority in terms of numerical stability and computational cost.
arXiv Detail & Related papers (2024-04-11T09:23:36Z)
- Empirical Density Estimation based on Spline Quasi-Interpolation with applications to Copulas clustering modeling [0.0]
Density estimation is a fundamental technique employed in various fields to model and to understand the underlying distribution of data.
In this paper we propose the mono-variate approximation of the density using quasi-interpolation.
The presented algorithm is validated on artificial and real datasets.
arXiv Detail & Related papers (2024-02-18T11:49:38Z)
- PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation [8.527898482146103]
We propose a comprehensive sample-based method for assessing the quality of generative models.
The proposed approach enables the estimation of the probability that two sets of samples are drawn from the same distribution.
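The abstract gives no algorithmic details, but a generic sample-based check in the same spirit — comparing how two sample sets distribute their probability mass over a shared partition of the space — might look like the following sketch. The region construction and all names here are assumptions for illustration, not the actual PQMass estimator:

```python
import numpy as np
from scipy.stats import chi2_contingency

def mass_partition_pvalue(A, B, n_regions=10, seed=0):
    """Two-sample check via probability mass over a shared partition.

    Illustrative sketch only: regions are Voronoi cells around reference
    points drawn from the pooled sample, and the per-region counts of the
    two sample sets are compared with a chi-squared test.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([A, B])
    refs = pooled[rng.choice(len(pooled), size=n_regions, replace=False)]

    def region_counts(S):
        # Assign each point to its nearest reference point (Voronoi cell).
        cells = np.argmin(((S[:, None, :] - refs[None]) ** 2).sum(-1), axis=1)
        return np.bincount(cells, minlength=n_regions)

    table = np.vstack([region_counts(A), region_counts(B)])
    table = table[:, table.sum(axis=0) > 0]  # drop empty regions
    _, pvalue, _, _ = chi2_contingency(table)
    return pvalue

rng = np.random.default_rng(2)
p_diff = mass_partition_pvalue(rng.normal(size=(400, 2)),
                               rng.normal(3.0, 1.0, size=(400, 2)))
print(p_diff < 0.05)  # clearly shifted samples are detected as different
```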
arXiv Detail & Related papers (2024-02-06T19:39:26Z)
- A Multivariate Unimodality Test Harnessing the Dip Statistic of Mahalanobis Distances Over Random Projections [0.18416014644193066]
We extend one-dimensional unimodality principles to multi-dimensional spaces through linear random projections and point-to-point distancing.
Our method, rooted in $\alpha$-unimodality assumptions, presents a novel unimodality test named mud-pod.
Both theoretical and empirical studies confirm the efficacy of our method in unimodality assessment of multidimensional datasets.
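The reduction to one dimension described above can be sketched as follows. Hartigan's dip statistic is not available in SciPy, so this sketch only produces the 1-D views to which a univariate unimodality test (the dip test, or the UU-test above) would then be applied; all names are illustrative, not from the paper:

```python
import numpy as np

def to_one_dim_views(X, n_proj=5, seed=0):
    """Reduce a d-dimensional sample to 1-D views: random linear
    projections plus Mahalanobis distances to the sample mean.

    Each returned array is a candidate input for a univariate
    unimodality test. Illustrative sketch only.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    rng = np.random.default_rng(seed)
    views = []
    # Project the data onto random unit directions.
    for _ in range(n_proj):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)
        views.append(X @ v)
    # Mahalanobis distance of every point to the sample mean.
    mu = X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - mu
    views.append(np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff)))
    return views

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
views = to_one_dim_views(X)
print(len(views), views[0].shape)  # → 6 (200,)
```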
arXiv Detail & Related papers (2023-11-28T09:11:02Z)
- Learning to Bound Counterfactual Inference in Structural Causal Models from Observational and Randomised Data [64.96984404868411]
We derive a likelihood characterisation for the overall data that leads us to extend a previous EM-based algorithm.
The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources.
It delivers interval approximations to counterfactual results, which collapse to points in the identifiable case.
arXiv Detail & Related papers (2022-12-06T12:42:11Z)
- Data-IQ: Characterizing subgroups with heterogeneous outcomes in tabular data [81.43750358586072]
We propose Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes.
We experimentally demonstrate the benefits of Data-IQ on four real-world medical datasets.
arXiv Detail & Related papers (2022-10-24T08:57:55Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Sampling from Arbitrary Functions via PSD Models [55.41644538483948]
We take a two-step approach by first modeling the probability distribution and then sampling from that model.
We show that these models can approximate a large class of densities concisely using few evaluations, and present a simple algorithm to effectively sample from these models.
arXiv Detail & Related papers (2021-10-20T12:25:22Z)
- A Subsampling-Based Method for Causal Discovery on Discrete Data [18.35147325731821]
In this work, we propose a subsampling-based method to test the independence between the generating schemes of the cause and that of the mechanism.
Our methodology works for both discrete and categorical data and does not assume any functional model on the data, making it a more flexible approach.
arXiv Detail & Related papers (2021-08-31T17:11:58Z)
- Multimodal Data Fusion in High-Dimensional Heterogeneous Datasets via Generative Models [16.436293069942312]
We are interested in learning probabilistic generative models from high-dimensional heterogeneous data in an unsupervised fashion.
We propose a general framework that combines disparate data types through the exponential family of distributions.
The proposed algorithm is presented in detail for the commonly encountered heterogeneous datasets with real-valued (Gaussian) and categorical (multinomial) features.
arXiv Detail & Related papers (2021-08-27T18:10:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.