Consistency of archetypal analysis
- URL: http://arxiv.org/abs/2010.08148v2
- Date: Mon, 19 Oct 2020 14:11:38 GMT
- Title: Consistency of archetypal analysis
- Authors: Braxton Osting, Dong Wang, Yiming Xu and Dominique Zosso
- Abstract summary: Archetypal analysis is an unsupervised learning method that uses a convex polytope to summarize multivariate data.
In this paper, we prove a consistency result showing that if the data is independently sampled from a probability measure with bounded support, the archetype points converge to a solution of the continuum version of the problem.
We also obtain the convergence rate of the optimal objective values under appropriate assumptions on the distribution.
- Score: 10.424626933990272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Archetypal analysis is an unsupervised learning method that uses a convex
polytope to summarize multivariate data. For fixed $k$, the method finds a
convex polytope with $k$ vertices, called archetype points, such that the
polytope is contained in the convex hull of the data and the mean squared
distance between the data and the polytope is minimal. In this paper, we prove
a consistency result that shows if the data is independently sampled from a
probability measure with bounded support, then the archetype points converge to
a solution of the continuum version of the problem, of which we identify and
establish several properties. We also obtain the convergence rate of the
optimal objective values under appropriate assumptions on the distribution. If
the data is independently sampled from a distribution with unbounded support,
we also prove a consistency result for a modified method that penalizes the
dispersion of the archetype points. Our analysis is supported by detailed
computational experiments of the archetype points for data sampled from the
uniform distribution in a disk, the normal distribution, an annular
distribution, and a Gaussian mixture model.
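The optimization problem described in the abstract can be sketched numerically. In the classic Cutler–Breiman formulation, each archetype is a convex combination of data points (rows of $B$ on the simplex), and each data point is approximated by a convex combination of archetypes (rows of $A$ on the simplex), minimizing $\|X - ABX\|_F^2$. The sketch below uses alternating projected gradient steps; it is an illustrative implementation, not the authors' code, and the step size and iteration count are arbitrary assumptions.

```python
import numpy as np

def project_simplex(V):
    """Euclidean projection of each row of V onto the probability simplex."""
    U = np.sort(V, axis=1)[:, ::-1]              # rows sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    ind = np.arange(1, V.shape[1] + 1)
    rho = (U - css / ind > 0).sum(axis=1)        # size of the support
    theta = css[np.arange(V.shape[0]), rho - 1] / rho
    return np.maximum(V - theta[:, None], 0.0)

def archetypal_analysis(X, k, n_iter=500, lr=1e-3, seed=0):
    """Minimize ||X - A B X||_F^2 with rows of A (n x k) and B (k x n)
    constrained to the simplex, by alternating projected gradient steps.
    lr and n_iter are illustrative choices, not tuned values."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    A = project_simplex(rng.random((n, k)))
    B = project_simplex(rng.random((k, n)))
    for _ in range(n_iter):
        Z = B @ X                                # current archetypes (k x d)
        R = A @ Z - X                            # reconstruction residual
        A = project_simplex(A - lr * R @ Z.T)    # gradient step in A, then project
        R = A @ (B @ X) - X
        B = project_simplex(B - lr * A.T @ R @ X.T)
    return B @ X, A                              # archetype points, mixture weights
```

Because the rows of $B$ lie on the simplex, the returned archetypes are convex combinations of the data and therefore lie inside its convex hull, consistent with the constraint in the abstract.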
Related papers
- Theory on Score-Mismatched Diffusion Models and Zero-Shot Conditional Samplers [49.97755400231656]
We present the first performance guarantees with explicit dimensional dependence for general score-mismatched diffusion samplers.
We show that score mismatches result in a distributional bias between the target and sampling distributions, proportional to the accumulated mismatch between the target and training distributions.
This result can be directly applied to zero-shot conditional samplers for any conditional model, irrespective of measurement noise.
arXiv Detail & Related papers (2024-10-17T16:42:12Z) - Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis [56.442307356162864]
We study the theoretical aspects of score-based discrete diffusion models under the Continuous Time Markov Chain (CTMC) framework.
We introduce a discrete-time sampling algorithm in the general state space $[S]^d$ that utilizes score estimators at predefined time points.
Our convergence analysis employs a Girsanov-based method and establishes key properties of the discrete score function.
arXiv Detail & Related papers (2024-10-03T09:07:13Z) - Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z) - Empirical Density Estimation based on Spline Quasi-Interpolation with applications to Copulas clustering modeling [0.0]
Density estimation is a fundamental technique employed in various fields to model and to understand the underlying distribution of data.
In this paper we propose a mono-variate approximation of the density using quasi-interpolation.
The presented algorithm is validated on artificial and real datasets.
arXiv Detail & Related papers (2024-02-18T11:49:38Z) - Classification of Heavy-tailed Features in High Dimensions: a Superstatistical Approach [1.4469725791865984]
We characterise the learning of a mixture of two clouds of data points with generic centroids.
We study the generalisation performance of the obtained estimator, we analyse the role of regularisation, and we analytically characterise the separability transition.
arXiv Detail & Related papers (2023-04-06T07:53:05Z) - Mean-Square Analysis of Discretized Itô Diffusions for Heavy-tailed Sampling [17.415391025051434]
We analyze the complexity of sampling from a class of heavy-tailed distributions by discretizing a natural class of Itô diffusions associated with weighted Poincaré inequalities.
Based on a mean-square analysis, we establish the iteration complexity for obtaining a sample whose distribution is $\epsilon$-close to the target distribution in the Wasserstein-2 metric.
arXiv Detail & Related papers (2023-03-01T15:16:03Z) - Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data [68.62134204367668]
This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace.
We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated.
The generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution.
arXiv Detail & Related papers (2023-02-14T17:02:35Z) - Data thinning for convolution-closed distributions [2.299914829977005]
We propose data thinning, an approach for splitting an observation into two or more independent parts that sum to the original observation.
We show that data thinning can be used to validate the results of unsupervised learning approaches.
arXiv Detail & Related papers (2023-01-18T02:47:41Z) - Wasserstein Archetypal Analysis [9.54262011088777]
Archetypal analysis is an unsupervised machine learning method that summarizes data using a convex polytope.
We consider an alternative formulation of archetypal analysis based on the Wasserstein metric.
arXiv Detail & Related papers (2022-10-25T19:50:09Z) - A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data [71.9573352891936]
This paper tackles the problem of missing data imputation for noisy and non-Gaussian data.
A new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data.
Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data.
arXiv Detail & Related papers (2022-01-28T10:01:37Z) - Local versions of sum-of-norms clustering [77.34726150561087]
We show that our method can separate arbitrarily close balls in the ball model.
We prove a quantitative bound on the error incurred in the clustering of disjoint connected sets.
arXiv Detail & Related papers (2021-09-20T14:45:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.