Covariance matrix preparation for quantum principal component analysis
- URL: http://arxiv.org/abs/2204.03495v1
- Date: Thu, 7 Apr 2022 15:11:42 GMT
- Title: Covariance matrix preparation for quantum principal component analysis
- Authors: Max Hunter Gordon, M. Cerezo, Lukasz Cincio, Patrick J. Coles
- Abstract summary: Principal component analysis (PCA) is a dimensionality reduction method in data analysis.
Quantum algorithms have been formulated for PCA based on diagonalizing a density matrix, but a concrete protocol for encoding the covariance matrix in that density matrix has been lacking; this work supplies one.
The method is numerically implemented for molecular ground-state datasets.
- Score: 0.8258451067861933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Principal component analysis (PCA) is a dimensionality reduction method in
data analysis that involves diagonalizing the covariance matrix of the dataset.
Recently, quantum algorithms have been formulated for PCA based on
diagonalizing a density matrix. These algorithms assume that the covariance
matrix can be encoded in a density matrix, but a concrete protocol for this
encoding has been lacking. Our work aims to address this gap. Assuming
amplitude encoding of the data, with the data given by the ensemble
$\{p_i, |\psi_i\rangle\}$, one can easily prepare the ensemble-average density
matrix $\overline{\rho} = \sum_i p_i |\psi_i\rangle \langle \psi_i |$. We first
show that $\overline{\rho}$ is precisely the covariance matrix whenever the
dataset is centered. For quantum datasets, we exploit global phase symmetry to
argue that there always exists a centered dataset consistent with
$\overline{\rho}$, and hence $\overline{\rho}$ can always be interpreted as a
covariance matrix. This provides a simple means for preparing the covariance
matrix for arbitrary quantum datasets or centered classical datasets. For
uncentered classical datasets, our method amounts to so-called "PCA without centering",
which we interpret as PCA on a symmetrized dataset. We argue that this closely
corresponds to standard PCA, and we derive equations and inequalities that
bound the deviation of the spectrum obtained with our method from that of
standard PCA. We numerically illustrate our method for the MNIST handwritten
digit dataset. We also argue that PCA on quantum datasets is natural and
meaningful, and we numerically implement our method for molecular ground-state
datasets.
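As a quick illustration of the encoding described in the abstract, the NumPy sketch below (an illustrative reconstruction, not code from the paper) builds the ensemble-average density matrix $\overline{\rho}$ for a toy set of amplitude-encoded states and checks that it coincides with the covariance matrix of the sign-symmetrized dataset, which is centered by construction; real amplitudes are used for simplicity.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy "quantum dataset": m unit vectors (amplitude-encoded states) with
# probabilities p_i.
m, d = 5, 8
states = rng.normal(size=(m, d))
states /= np.linalg.norm(states, axis=1, keepdims=True)
p = rng.random(m)
p /= p.sum()

# Ensemble-average density matrix: rho_bar = sum_i p_i |psi_i><psi_i|.
rho_bar = sum(pi * np.outer(psi, psi) for pi, psi in zip(p, states))

# Sign-symmetrized dataset {+/-psi_i, p_i/2}: its mean is zero (centered),
# exploiting the global phase (here: sign) freedom of quantum states.
sym_states = np.vstack([states, -states])
sym_p = np.concatenate([p, p]) / 2.0
mean = sym_p @ sym_states
cov = sum(qi * np.outer(x - mean, x - mean) for qi, x in zip(sym_p, sym_states))

assert np.allclose(mean, 0.0)
assert np.allclose(rho_bar, cov)   # rho_bar is exactly a covariance matrix

# PCA: diagonalize rho_bar; eigenvectors are the principal components.
evals, evecs = np.linalg.eigh(rho_bar)
print("principal values:", evals[::-1])
```

For an uncentered classical dataset, this symmetrization is exactly the "PCA without centering" interpretation the abstract describes.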
Related papers
- A Generalized Mean Approach for Distributed-PCA [0.0]
We propose a novel DPCA method that incorporates eigenvalue information to aggregate local results via the matrix $\beta$-mean.
The $\beta$-DPCA offers a flexible and robust aggregation through the adjustable choice of $\beta$ values.
arXiv Detail & Related papers (2024-10-01T04:39:40Z)
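The exact form of the matrix $\beta$-mean is not given in this summary; the sketch below assumes the matrix power mean $M_\beta = (\frac{1}{m}\sum_i A_i^\beta)^{1/\beta}$ (arithmetic mean at $\beta = 1$, harmonic at $\beta = -1$) as one plausible instantiation for aggregating local covariance estimates.

```python
import numpy as np

def mat_pow(A, t):
    """Power of a symmetric positive (semi)definite matrix via eigh."""
    w, V = np.linalg.eigh(A)
    w = np.clip(w, 1e-12, None)          # guard against tiny negative eigenvalues
    return (V * w**t) @ V.T

def beta_matrix_mean(mats, beta):
    """Assumed matrix beta-mean: the matrix power mean of SPD matrices."""
    M = sum(mat_pow(A, beta) for A in mats) / len(mats)
    return mat_pow(M, 1.0 / beta)

# Distributed-PCA sketch: aggregate per-node covariances, then diagonalize.
rng = np.random.default_rng(0)
shards = [rng.normal(size=(200, 6)) for _ in range(4)]
local_covs = [np.cov(S, rowvar=False) for S in shards]
C = beta_matrix_mean(local_covs, beta=0.5)
evals, evecs = np.linalg.eigh(C)         # global principal components
```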
- Sparse PCA with Oracle Property [115.72363972222622]
We propose a family of estimators based on the semidefinite relaxation of sparse PCA with novel regularizations.
We prove that another estimator within the family achieves a sharper statistical rate of convergence than the standard semidefinite relaxation of sparse PCA.
arXiv Detail & Related papers (2023-12-28T02:52:54Z)
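The summary does not spell out the proposed estimators, but the baseline they sharpen, the standard semidefinite relaxation of sparse PCA, is well known; here is a minimal CVXPY sketch of that baseline, with an assumed $\ell_1$ penalty weight `lam`.

```python
import cvxpy as cp
import numpy as np

def sparse_pca_sdp(Sigma, lam=0.1):
    """Standard SDP relaxation of sparse PCA:
    maximize <Sigma, X> - lam * ||X||_1  s.t.  trace(X) = 1, X PSD."""
    d = Sigma.shape[0]
    X = cp.Variable((d, d), PSD=True)
    objective = cp.Maximize(cp.trace(Sigma @ X) - lam * cp.sum(cp.abs(X)))
    cp.Problem(objective, [cp.trace(X) == 1]).solve()
    # Round: take the leading eigenvector of the relaxed solution.
    w, V = np.linalg.eigh(X.value)
    return V[:, -1]

rng = np.random.default_rng(1)
A = rng.normal(size=(100, 10))
v = sparse_pca_sdp(np.cov(A, rowvar=False))
```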
- Hybrid Model-based / Data-driven Graph Transform for Image Coding [54.31406300524195]
We present a hybrid model-based / data-driven approach to encode an intra-prediction residual block.
The first $K$ eigenvectors of a transform matrix are derived from a statistical model, e.g., the asymmetric discrete sine transform (ADST), for stability.
Using WebP as a baseline image codec, experimental results show that our hybrid graph transform achieves better energy compaction than the default discrete cosine transform (DCT) and better stability than the KLT.
arXiv Detail & Related papers (2022-03-02T15:36:44Z)
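A hedged sketch of the hybrid construction: the model-based part below uses eigenvectors of a path-graph Laplacian with a self-loop at the first node (a common GMRF model for intra-prediction residuals whose eigenbasis is ADST-like; the paper's exact model may differ), and the data-driven part completes the basis with eigenvectors of the sample covariance restricted to the orthogonal complement.

```python
import numpy as np

def path_laplacian(n, self_loop=1.0):
    """Laplacian of a path graph; the self-loop at node 0 models the
    prediction boundary, yielding an ADST-like eigenbasis."""
    L = 2.0 * np.eye(n)
    L[0, 0] = 1.0 + self_loop
    L[-1, -1] = 1.0
    idx = np.arange(n - 1)
    L[idx, idx + 1] = L[idx + 1, idx] = -1.0
    return L

def hybrid_transform(C_sample, K):
    n = C_sample.shape[0]
    _, U = np.linalg.eigh(path_laplacian(n))
    F = U[:, :K]                          # first K model-based vectors
    # Orthonormal basis of the complement of span(F).
    Us, s, _ = np.linalg.svd(np.eye(n) - F @ F.T)
    B = Us[:, :n - K]
    # Data-driven vectors: covariance eigenvectors restricted to span(B).
    w, V = np.linalg.eigh(B.T @ C_sample @ B)
    D = B @ V[:, ::-1]                    # sort by decreasing variance
    return np.hstack([F, D])              # orthonormal hybrid transform
```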
- Leverage Score Sampling for Tensor Product Matrices in Input Sparsity Time [54.65688986250061]
We give an input sparsity time sampling algorithm for approximating the Gram matrix corresponding to the $q$-fold column-wise tensor product of $q$ matrices.
Our sampling technique relies on a collection of $q$ partially correlated random projections which can be simultaneously applied to a dataset $X$ in input-sparsity total time.
arXiv Detail & Related papers (2022-02-09T15:26:03Z)
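The paper's input-sparsity-time projections are beyond a short sketch, but the underlying primitive, leverage score row sampling to approximate a Gram matrix, is standard. Here is a small NumPy version using exact scores via a thin QR; the paper's contribution is computing such scores implicitly for the $q$-fold tensor product without materializing it, which this demo does explicitly at toy scale.

```python
import numpy as np

def leverage_scores(A):
    Q, _ = np.linalg.qr(A)                 # thin QR; fine at demo scale
    return np.sum(Q**2, axis=1)

def sampled_gram(A, m, rng):
    """Approximate A.T @ A from m rows sampled by leverage score."""
    tau = leverage_scores(A)
    p = tau / tau.sum()
    idx = rng.choice(len(p), size=m, p=p)
    S = A[idx] / np.sqrt(m * p[idx])[:, None]   # importance reweighting
    return S.T @ S

# Demo on a 2-fold column-wise tensor (Khatri-Rao) product.
rng = np.random.default_rng(2)
n, d = 60, 5
X1, X2 = rng.normal(size=(n, d)), rng.normal(size=(n, d))
A = np.einsum('ir,jr->ijr', X1, X2).reshape(n * n, d)
G_hat = sampled_gram(A, m=400, rng=rng)
print(np.linalg.norm(G_hat - A.T @ A) / np.linalg.norm(A.T @ A))
```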
- Sparse PCA on fixed-rank matrices [0.05076419064097732]
We show that, if the rank of the covariance matrix is a fixed value, then there is an algorithm that solves sparse PCA to global optimality.
We also prove a similar result for the version of sparse PCA which requires the principal components to have disjoint supports.
arXiv Detail & Related papers (2022-01-07T15:05:32Z)
- Test Set Sizing Via Random Matrix Theory [91.3755431537592]
This paper uses techniques from Random Matrix Theory to find the ideal training-testing data split for a simple linear regression.
It defines "ideal" as satisfying an integrity metric: the empirical model error equals the actual measurement noise.
This paper is the first to solve for the training and test size for any model in a way that is truly optimal.
arXiv Detail & Related papers (2021-12-11T13:18:33Z)
- Unlabeled Principal Component Analysis and Matrix Completion [25.663593359761336]
We introduce robust principal component analysis from a data matrix in which the entries of its columns have been corrupted by permutations, termed Unlabeled Principal Component Analysis (UPCA).
We derive theory and algorithms of similar flavor for UPCA.
Experiments on synthetic data, face images, educational and medical records reveal the potential of our algorithms for applications such as data privatization and record linkage.
arXiv Detail & Related papers (2021-01-23T07:34:48Z)
- A Linearly Convergent Algorithm for Distributed Principal Component Analysis [12.91948651812873]
This paper introduces a feedforward neural-network-based, one-time-scale distributed PCA algorithm termed Distributed Sanger's Algorithm (DSA).
The proposed algorithm is shown to converge linearly to a neighborhood of the true solution.
arXiv Detail & Related papers (2021-01-05T00:51:14Z)
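Setting DSA's distributed machinery aside, the local update it builds on is Sanger's rule (the generalized Hebbian algorithm); here is a minimal single-node sketch, assuming centered data and a hand-picked step size.

```python
import numpy as np

def sanger_pca(X, k, lr=0.01, epochs=50, seed=0):
    """Estimate the top-k principal components of centered data X with
    Sanger's rule: dW = lr * (y x^T - LT[y y^T] W), LT = lower triangle."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = 0.1 * rng.normal(size=(k, d))      # rows ~ component estimates
    for _ in range(epochs):
        for x in X[rng.permutation(n)]:
            y = W @ x
            W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 8)) @ np.diag([3, 2, 1, 1, 1, 1, 1, 1])
X -= X.mean(axis=0)
W = sanger_pca(X, k=2)   # rows converge to the leading eigenvectors
```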
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
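The precise $l_{2,p}$ model is not reproduced in this summary; the sketch below assumes the common $p = 1$ special case of a self-representation objective, $\min_W \|X - XW\|_F^2 + \lambda \|W\|_{2,1}$, solved by iteratively reweighted least squares, with features then ranked by the row norms of $W$.

```python
import numpy as np

def l21_feature_scores(X, lam=1.0, iters=30, eps=1e-8):
    """IRLS for min ||X - XW||_F^2 + lam * ||W||_{2,1}.
    Returns per-feature scores (row norms of W); larger = more important."""
    d = X.shape[1]
    G = X.T @ X
    W = np.linalg.solve(G + lam * np.eye(d), G)      # ridge warm start
    for _ in range(iters):
        # Reweighting: D_ii = 1 / (2 ||w_i||_2), smoothed by eps.
        inv = 1.0 / (2.0 * np.linalg.norm(W, axis=1) + eps)
        W = np.linalg.solve(G + lam * np.diag(inv), G)
    return np.linalg.norm(W, axis=1)

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 12))
X[:, :3] *= 5.0                       # make the first 3 features dominant
print(np.argsort(l21_feature_scores(X))[::-1][:3])
```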
- List-Decodable Mean Estimation in Nearly-PCA Time [50.79691056481693]
We study the fundamental task of list-decodable mean estimation in high dimensions.
Our algorithm runs in time $\widetilde{O}(ndk)$ for all $k = O(\sqrt{d}) \cup \Omega(d)$, where $n$ is the size of the dataset.
A variant of our algorithm has runtime $\widetilde{O}(ndk)$ for all $k$, at the expense of an $O(\sqrt{\log k})$ factor in the recovery guarantee.
arXiv Detail & Related papers (2020-11-19T17:21:37Z)
- A New Basis for Sparse Principal Component Analysis [5.258928416164104]
Previous versions of sparse principal component analysis presumed that the eigen-basis (a $p \times k$ matrix) is approximately sparse.
We propose a method that presumes the $p \times k$ matrix becomes approximately sparse after a $k \times k$ rotation.
We show that for the same level of sparsity, the proposed sparse PCA method is more stable and can explain more variance compared to alternative methods.
arXiv Detail & Related papers (2020-07-01T16:32:22Z)
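The paper's rotation criterion isn't specified in this summary; a classic way to realize "sparse after a $k \times k$ rotation" is a varimax rotation of the top-$k$ eigenvectors, sketched below.

```python
import numpy as np

def varimax(Phi, gamma=1.0, iters=100, tol=1e-6):
    """Rotate a p x k loading matrix toward columnwise sparsity (varimax)."""
    p, k = Phi.shape
    R, d = np.eye(k), 0.0
    for _ in range(iters):
        L = Phi @ R
        # SVD step of the classic varimax iteration.
        u, s, vt = np.linalg.svd(
            Phi.T @ (L**3 - (gamma / p) * L @ np.diag(np.sum(L**2, axis=0))))
        R = u @ vt
        if s.sum() < d * (1 + tol):
            break
        d = s.sum()
    return Phi @ R

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 20))
X -= X.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V_sparse = varimax(Vt[:4].T)   # rotated eigen-basis, approximately sparse
```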