A Linearly Convergent Algorithm for Distributed Principal Component
Analysis
- URL: http://arxiv.org/abs/2101.01300v1
- Date: Tue, 5 Jan 2021 00:51:14 GMT
- Title: A Linearly Convergent Algorithm for Distributed Principal Component
Analysis
- Authors: Arpita Gang and Waheed U. Bajwa
- Abstract summary: This paper introduces a feedforward neural network-based one time-scale distributed PCA algorithm termed Distributed Sanger's Algorithm (DSA).
The proposed algorithm is shown to converge linearly to a neighborhood of the true solution.
- Score: 12.91948651812873
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Principal Component Analysis (PCA) is the workhorse tool for dimensionality
reduction in this era of big data. While often overlooked, the purpose of PCA
is not only to reduce data dimensionality, but also to yield features that are
uncorrelated. This paper focuses on this dual objective of PCA, namely,
dimensionality reduction and decorrelation of features, which requires
estimating the eigenvectors of a data covariance matrix, as opposed to only
estimating the subspace spanned by the eigenvectors. The ever-increasing volume
of data in the modern world often requires storage of data samples across
multiple machines, which precludes the use of centralized PCA algorithms.
Although a few distributed solutions to the PCA problem have been proposed
recently, convergence guarantees and/or communications overhead of these
solutions remain a concern. With an eye towards communications efficiency, this
paper introduces a feedforward neural network-based one time-scale distributed
PCA algorithm termed Distributed Sanger's Algorithm (DSA) that estimates the
eigenvectors of a data covariance matrix when data are distributed across an
undirected and arbitrarily connected network of machines. Furthermore, the
proposed algorithm is shown to converge linearly to a neighborhood of the true
solution. Numerical results are also shown to demonstrate the efficacy of the
proposed solution.
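To make the combine-and-adapt structure of such a scheme concrete, below is a minimal sketch, assuming each machine holds a local sample covariance and mixes its iterate with neighbors through a doubly stochastic weight matrix before taking a local Sanger (generalized Hebbian) step. The step size, network, and data are illustrative placeholders, not the paper's exact construction or guarantees.

```python
# Sketch of a one time-scale "consensus + local Sanger step" iteration.
import numpy as np

def sanger_direction(C_local, W):
    """Generalized Hebbian (Sanger) update direction for one node."""
    CW = C_local @ W
    # The upper-triangular term enforces sequential deflation, so columns of W
    # converge toward individual eigenvectors rather than an arbitrary
    # orthonormal basis of the dominant subspace.
    return CW - W @ np.triu(W.T @ CW)

def dsa_sketch(C_locals, mixing, K, step=0.05, iters=500, seed=0):
    """One time-scale distributed iteration: consensus, then local adaptation."""
    rng = np.random.default_rng(seed)
    d, n = C_locals[0].shape[0], len(C_locals)
    W0, _ = np.linalg.qr(rng.standard_normal((d, K)))
    W = [W0.copy() for _ in range(n)]          # every node starts identically
    for _ in range(iters):
        # Consensus: each node averages neighbors' iterates (mixing is
        # doubly stochastic and respects the network topology).
        mixed = [sum(mixing[i, j] * W[j] for j in range(n)) for i in range(n)]
        # Adapt: Sanger step on the local covariance only.
        W = [mixed[i] + step * sanger_direction(C_locals[i], mixed[i])
             for i in range(n)]
    return W

# Toy example: 4 fully connected nodes, each with 100 samples in R^10.
rng = np.random.default_rng(1)
data = [rng.standard_normal((100, 10)) @ np.diag(np.linspace(3.0, 0.5, 10))
        for _ in range(4)]
C_locals = [X.T @ X / X.shape[0] for X in data]
mixing = np.full((4, 4), 0.25)                 # uniform doubly stochastic
W = dsa_sketch(C_locals, mixing, K=3)
```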
Related papers
- Learning-Augmented K-Means Clustering Using Dimensional Reduction [1.7243216387069678]
We propose a solution to reduce the dimensionality of the dataset using Principal Component Analysis (PCA).
PCA is well-established in the literature and has become one of the most useful tools for data modeling, compression, and visualization.
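As a minimal sketch of the pipeline this summary describes, assuming scikit-learn with placeholder data, component count, and cluster count:

```python
# Reduce dimensionality with PCA, then cluster in the reduced space.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 50))                 # 500 samples, 50 features

X_reduced = PCA(n_components=5).fit_transform(X)   # project onto top-5 PCs
labels = KMeans(n_clusters=3, n_init=10).fit_predict(X_reduced)
```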
arXiv Detail & Related papers (2024-01-06T12:02:33Z)
- Improved Privacy-Preserving PCA Using Optimized Homomorphic Matrix Multiplication [0.0]
Principal Component Analysis (PCA) is a pivotal technique widely utilized in the realms of machine learning and data analysis.
In recent years, there have been endeavors to utilize homomorphic encryption in privacy-preserving PCA algorithms for the secure cloud computing scenario.
We propose a novel approach to privacy-preserving PCA that addresses these limitations, resulting in superior efficiency, accuracy, and scalability compared to previous approaches.
arXiv Detail & Related papers (2023-05-27T02:51:20Z)
- Domain Adaptation Principal Component Analysis: base linear method for learning with out-of-distribution data [55.41644538483948]
Domain adaptation is a popular paradigm in modern machine learning.
We present a method called Domain Adaptation Principal Component Analysis (DAPCA).
DAPCA finds a linear reduced data representation useful for solving the domain adaptation task.
arXiv Detail & Related papers (2022-08-28T21:10:56Z)
- coVariance Neural Networks [119.45320143101381]
Graph neural networks (GNNs) are an effective framework that exploits inter-relationships within graph-structured data for learning.
We propose a GNN architecture, called coVariance neural network (VNN), that operates on sample covariance matrices as graphs.
We show that VNN performance is indeed more stable than PCA-based statistical approaches.
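A rough sketch of the central idea as the summary presents it: treat the sample covariance matrix as a graph shift operator and filter data with a polynomial in it. The filter coefficients and sizes below are assumptions, not the paper's.

```python
# Covariance-based polynomial graph filter, the building block of a VNN-style layer.
import numpy as np

def covariance_filter(C, x, h):
    """Apply the polynomial graph filter H(C) x = sum_k h[k] * C^k x."""
    out = np.zeros_like(x)
    Ckx = x.copy()                 # C^0 x
    for hk in h:
        out += hk * Ckx
        Ckx = C @ Ckx              # advance to the next power of C
    return out

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))        # 200 samples of an 8-dim signal
C = np.cov(X, rowvar=False)              # sample covariance as the "graph"
y = covariance_filter(C, X[0], h=[0.5, 0.3, 0.2])   # one filtered sample
```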
arXiv Detail & Related papers (2022-05-31T15:04:43Z)
- Unfolding Projection-free SDP Relaxation of Binary Graph Classifier via GDPA Linearization [59.87663954467815]
Algorithm unfolding creates an interpretable and parsimonious neural network architecture by implementing each iteration of a model-based algorithm as a neural layer.
In this paper, leveraging a recent linear algebraic theorem called Gershgorin disc perfect alignment (GDPA), we unroll a projection-free algorithm for the semi-definite programming relaxation (SDR) of a binary graph classifier.
Experimental results show that our unrolled network outperformed pure model-based graph classifiers, and achieved comparable performance to pure data-driven networks but using far fewer parameters.
arXiv Detail & Related papers (2021-09-10T07:01:15Z)
- FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis [12.91948651812873]
Principal Component Analysis (PCA) is a fundamental data preprocessing tool in the world of machine learning.
This paper proposes a distributed PCA algorithm called FAST-PCA (Fast and exAct diSTributed PCA).
arXiv Detail & Related papers (2021-08-27T16:10:59Z)
- Analysis of Truncated Orthogonal Iteration for Sparse Eigenvector Problems [78.95866278697777]
We propose two variants of the Truncated Orthogonal Iteration to compute multiple leading eigenvectors with sparsity constraints simultaneously.
We then apply our algorithms to solve the sparse principal component analysis problem for a wide range of test datasets.
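A hedged sketch of what a truncated orthogonal iteration can look like: block power iteration with a hard-thresholding step that keeps the s largest-magnitude entries per column. The paper's variants may differ in where and how truncation is applied.

```python
# Orthogonal iteration with per-column hard thresholding for sparse eigenvectors.
import numpy as np

def truncated_orthogonal_iteration(A, k, s, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
    for _ in range(iters):
        Y = A @ Q                          # power step toward the top eigenspace
        for j in range(k):                 # keep s largest-magnitude entries/col
            col = Y[:, j]
            small = np.argpartition(np.abs(col), n - s)[: n - s]
            col[small] = 0.0
        Q, _ = np.linalg.qr(Y)             # re-orthonormalize for the next step
    return Y / np.linalg.norm(Y, axis=0)   # sparse, column-normalized estimate

# Toy symmetric matrix; recover 3 sparse leading eigenvector estimates.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 50))
A = (A + A.T) / 2
V = truncated_orthogonal_iteration(A, k=3, s=10)
```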
arXiv Detail & Related papers (2021-03-24T23:11:32Z)
- Distributed Principal Subspace Analysis for Partitioned Big Data: Algorithms, Analysis, and Implementation [9.730443503568804]
Principal Subspace Analysis (PSA) is one of the most popular approaches for dimensionality reduction in signal processing and machine learning.
Centralized PSA solutions are fast becoming irrelevant in the modern era of big data.
This paper revisits the problem of distributed PSA under the general framework of an arbitrarily connected network of machines.
arXiv Detail & Related papers (2021-03-11T01:33:38Z)
- Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
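A hedged sketch of the kind of objective this summary describes: reconstruction error plus a row-wise $l_{2,p}$ penalty that drives entire feature rows of the projection matrix to zero, which is what yields feature selection. The symbols lambda_ and p, and the exact formulation, are assumptions.

```python
# Evaluate a sparse-PCA-style objective: ||X - X W W^T||_F^2 + lambda * sum_i ||W[i,:]||_2^p
import numpy as np

def sparse_pca_objective(X, W, lambda_=1.0, p=0.5):
    recon = np.linalg.norm(X - X @ W @ W.T, "fro") ** 2
    row_norms = np.linalg.norm(W, axis=1)     # one l2 norm per feature row
    return recon + lambda_ * np.sum(row_norms ** p)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
W, _ = np.linalg.qr(rng.standard_normal((20, 5)))   # 20 features -> 5 comps
print(sparse_pca_objective(X, W))
```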
arXiv Detail & Related papers (2020-12-29T04:08:38Z)
- General stochastic separation theorems with optimal bounds [68.8204255655161]
The phenomenon of separability was revealed and used in machine learning to correct errors of Artificial Intelligence (AI) systems and analyze AI instabilities.
Errors or clusters of errors can be separated from the rest of the data.
The ability to correct an AI system also opens up the possibility of an attack on it, and the high dimensionality induces vulnerabilities caused by the same separability.
arXiv Detail & Related papers (2020-10-11T13:12:41Z)
- Communication-efficient distributed eigenspace estimation [31.69089186688224]
We develop a communication-efficient distributed algorithm for computing the leading invariant subspace of a data matrix.
Our algorithm uses a novel alignment scheme that minimizes the Procrustean distance between local solutions and a reference solution.
We show that our algorithm achieves a similar error rate to that of a centralized estimator.
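A minimal sketch of the alignment step this summary mentions, using the classical orthogonal Procrustes solution to rotate each local subspace estimate toward a reference before averaging; the paper's scheme may differ in its details.

```python
# Align local subspace estimates to a reference via orthogonal Procrustes.
import numpy as np

def procrustes_align(W_local, W_ref):
    """Find orthogonal Q minimizing ||W_local @ Q - W_ref||_F (via SVD)."""
    U, _, Vt = np.linalg.svd(W_local.T @ W_ref)
    return W_local @ (U @ Vt)

rng = np.random.default_rng(0)
W_ref, _ = np.linalg.qr(rng.standard_normal((30, 4)))     # reference basis
# Local estimates: the same subspace, each reported in a rotated basis.
locals_ = [W_ref @ np.linalg.qr(rng.standard_normal((4, 4)))[0]
           for _ in range(5)]
aligned = [procrustes_align(W, W_ref) for W in locals_]
W_avg = sum(aligned) / len(aligned)          # average only after alignment
print(np.linalg.norm(W_avg - W_ref))         # near zero once aligned
```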
arXiv Detail & Related papers (2020-09-05T02:11:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.