Two derivations of Principal Component Analysis on datasets of
distributions
- URL: http://arxiv.org/abs/2306.13503v1
- Date: Fri, 23 Jun 2023 14:00:14 GMT
- Title: Two derivations of Principal Component Analysis on datasets of
distributions
- Authors: Vlad Niculae
- Abstract summary: We formulate Principal Component Analysis (PCA) over datasets consisting not of points but of distributions.
Just like the usual PCA on points can be equivalently derived via a variance-maximization principle and via a minimization of reconstruction error, we derive a closed-form solution for distributional PCA from both of these perspectives.
- Score: 15.635370717421017
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this brief note, we formulate Principal Component Analysis (PCA) over
datasets consisting not of points but of distributions, characterized by their
location and covariance. Just like the usual PCA on points can be equivalently
derived via a variance-maximization principle and via a minimization of
reconstruction error, we derive a closed-form solution for distributional PCA
from both of these perspectives.
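To make the two perspectives in the abstract concrete, here is a minimal NumPy sketch. It rests on an assumption that is not taken from the paper: the dataset of distributions is treated as a uniform mixture, so its total covariance is the covariance of the means plus the average of the per-distribution covariances, and the principal directions are the leading eigenvectors of that matrix. The function name `mixture_pca` and the synthetic data are hypothetical, and the paper's own formulation may differ in weighting or centering.

```python
import numpy as np

def mixture_pca(means, covs, k):
    """Top-k principal directions of a uniform mixture of distributions.

    Assumption (not from the paper): each distribution is summarized by its
    mean and covariance, and all distributions are weighted equally.
    """
    means = np.asarray(means, dtype=float)      # shape (n, d)
    covs = np.asarray(covs, dtype=float)        # shape (n, d, d)
    centered = means - means.mean(axis=0)
    between = centered.T @ centered / len(means)  # spread of the means
    within = covs.mean(axis=0)                    # average internal spread
    total = between + within                      # law of total covariance
    eigvals, eigvecs = np.linalg.eigh(total)      # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:k]         # indices of the k largest
    return eigvals[order], eigvecs[:, order], total

# Synthetic example: 5 distributions in 3 dimensions, keep 2 components.
rng = np.random.default_rng(0)
n, d, k = 5, 3, 2
means = rng.normal(size=(n, d))
A = rng.normal(size=(n, d, d))
covs = A @ A.transpose(0, 2, 1)  # random positive semidefinite covariances

eigvals, U, total = mixture_pca(means, covs, k)

# Variance-maximization view: variance captured by the subspace spanned by U.
captured = np.trace(U.T @ total @ U)

# Reconstruction-error view: expected squared error after projecting onto U.
residual = np.eye(d) - U @ U.T
recon_error = np.trace(residual @ total @ residual)
```

Because `captured + recon_error` equals `np.trace(total)` for any orthonormal `U`, maximizing the captured variance and minimizing the reconstruction error select the same subspace, which is the equivalence the abstract refers to.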
Related papers
- From explained variance of correlated components to PCA without
orthogonality constraints [0.0]
Block Principal Component Analysis (Block PCA) of a data matrix A is difficult to use for the design of sparse PCA by ℓ1 regularization.
We introduce new objective matrix functions expvar(Y) which measure the part of the variance of the data matrix A explained by correlated components Y = AZ.
arXiv Detail & Related papers (2024-02-07T09:32:32Z)
- Empirical Bayes Covariance Decomposition, and a solution to the Multiple
Tuning Problem in Sparse PCA [2.5382095320488673]
Sparse Principal Components Analysis (PCA) has been proposed as a way to improve both interpretability and reliability of PCA.
We present a solution to the "multiple tuning problem" using Empirical Bayes methods.
arXiv Detail & Related papers (2023-12-06T04:00:42Z)
- Classification of Heavy-tailed Features in High Dimensions: a
Superstatistical Approach [1.4469725791865984]
We characterise the learning of a mixture of two clouds of data points with generic centroids.
We study the generalisation performance of the obtained estimator, analyse the role of regularisation, and analytically characterise the separability transition.
arXiv Detail & Related papers (2023-04-06T07:53:05Z)
- Eigen Analysis of Self-Attention and its Reconstruction from Partial
Computation [58.80806716024701]
We study the global structure of attention scores computed using dot-product based self-attention.
We find that most of the variation among attention scores lies in a low-dimensional eigenspace.
We propose to compute scores only for a partial subset of token pairs, and use them to estimate scores for the remaining pairs.
arXiv Detail & Related papers (2021-06-16T14:38:42Z)
- A Linearly Convergent Algorithm for Distributed Principal Component
Analysis [12.91948651812873]
This paper introduces a feedforward neural network-based, one time-scale distributed PCA algorithm termed Distributed Sanger's Algorithm (DSA).
The proposed algorithm is shown to converge linearly to a neighborhood of the true solution.
arXiv Detail & Related papers (2021-01-05T00:51:14Z)
- Supervised PCA: A Multiobjective Approach [70.99924195791532]
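Methods for supervised principal component analysis (SPCA) aim to incorporate label information into PCA, which involves two objectives: low prediction error and high variance explained by the extracted features.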
We propose a new method for SPCA that addresses both of these objectives jointly.
Our approach accommodates arbitrary supervised learning losses and, through a statistical reformulation, provides a novel low-rank extension of generalized linear models.
arXiv Detail & Related papers (2020-11-10T18:46:58Z)
- Consistency of archetypal analysis [10.424626933990272]
Archetypal analysis is an unsupervised learning method that uses a convex polytope to summarize multivariate data.
In this paper, we prove a consistency result for data independently sampled from a probability measure with bounded support.
We also obtain the convergence rate of the optimal objective values under appropriate assumptions on the distribution.
arXiv Detail & Related papers (2020-10-16T04:07:26Z)
- Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)
- Repulsive Mixture Models of Exponential Family PCA for Clustering [127.90219303669006]
The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about data distribution than the traditional EPCA.
The traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering.
In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
arXiv Detail & Related papers (2020-04-07T04:07:29Z)
- Batch Stationary Distribution Estimation [98.18201132095066]
We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions.
We propose a consistent estimator that is based on recovering a correction ratio function over the given data.
arXiv Detail & Related papers (2020-03-02T09:10:01Z)
- Few-shot Domain Adaptation by Causal Mechanism Transfer [107.08605582020866]
We study few-shot supervised domain adaptation (DA) for regression problems, where only a few labeled target domain data and many labeled source domain data are available.
Many of the current DA methods base their transfer assumptions on either parametrized distribution shift or apparent distribution similarities.
We propose mechanism transfer, a meta-distributional scenario in which a data generating mechanism is invariant among domains.
arXiv Detail & Related papers (2020-02-10T02:16:53Z)