Streaming Kernel PCA Algorithm With Small Space
- URL: http://arxiv.org/abs/2303.04555v1
- Date: Wed, 8 Mar 2023 13:13:33 GMT
- Title: Streaming Kernel PCA Algorithm With Small Space
- Authors: Yichuan Deng, Zhao Song, Zifan Wang, Han Zhang
- Abstract summary: Streaming PCA has gained significant attention in recent years, as it can handle large datasets efficiently.
We propose a streaming algorithm for Kernel PCA problems based on the traditional scheme by Oja.
Our algorithm addresses the challenge of reducing the memory usage of PCA while maintaining its accuracy.
- Score: 24.003544967343615
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Principal Component Analysis (PCA) is a widely used technique in machine
learning, data analysis and signal processing. With the increase in the size
and complexity of datasets, it has become important to develop low-space usage
algorithms for PCA. Streaming PCA has gained significant attention in recent
years, as it can handle large datasets efficiently. The kernel method, which is
commonly used in learning algorithms such as Support Vector Machines (SVMs),
has also been applied in PCA algorithms.
We propose a streaming algorithm for Kernel PCA problems based on the
traditional scheme by Oja. Our algorithm addresses the challenge of reducing
the memory usage of PCA while maintaining its accuracy. We analyze the
performance of our algorithm by studying the conditions under which it
succeeds. Specifically, we show that, when the spectral ratio $R :=
\lambda_1/\lambda_2$ of the target covariance matrix is lower bounded by $C
\cdot \log n\cdot \log d$, the streaming PCA can be solved with $O(d)$ space
cost.
Our proposed algorithm has several advantages over existing methods. First,
it is a streaming algorithm that can handle large datasets efficiently. Second,
it employs the kernel method, which allows it to capture complex nonlinear
relationships among data points. Third, it has a low-space usage, making it
suitable for applications where memory is limited.
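To make the Oja-based scheme concrete, below is a minimal sketch of a streaming top-1 PCA update performed on kernel features. The random Fourier feature map standing in for the kernel, the learning rate, and all dimensions are illustrative assumptions rather than the paper's exact construction; the point is that memory stays fixed no matter how many samples are streamed.

```python
import numpy as np

def rff_map(x, W, b):
    # Random Fourier features approximating an RBF kernel; an illustrative
    # stand-in for the kernel treatment, not necessarily the paper's construction.
    return np.sqrt(2.0 / W.shape[0]) * np.cos(W @ x + b)

def streaming_kernel_pca_top1(stream, input_dim, feat_dim=256, eta=0.01, seed=0):
    # Oja-style streaming estimate of the top principal direction in feature space.
    # Only the feature-map parameters and one direction vector are stored, so
    # memory does not grow with the number of streamed samples.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(feat_dim, input_dim))        # RFF projection (unit bandwidth assumed)
    b = rng.uniform(0.0, 2.0 * np.pi, size=feat_dim)  # RFF phases
    u = rng.normal(size=feat_dim)
    u /= np.linalg.norm(u)
    for x in stream:
        phi = rff_map(x, W, b)        # lift the incoming sample into feature space
        u += eta * phi * (phi @ u)    # Oja's rule: u <- u + eta * phi phi^T u
        u /= np.linalg.norm(u)        # renormalize to stay on the unit sphere
    return u

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stream with one dominant direction (first coordinate has larger variance).
    data = rng.normal(size=(5000, 10)) * np.array([3.0] + [1.0] * 9)
    u_hat = streaming_kernel_pca_top1(iter(data), input_dim=10)
    print(u_hat[:5])
```

Informally, the spectral-ratio condition $R \geq C \cdot \log n \cdot \log d$ quoted in the abstract plays the role of a separation requirement between $\lambda_1$ and $\lambda_2$ under which an update of this kind converges; the paper's analysis covers the kernel setting and the $O(d)$ space bound.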
Related papers
- Efficient Sparse PCA via Block-Diagonalization [13.38174941551702]
We propose a novel framework to efficiently approximate Sparse PCA.
It can leverage any off-the-shelf Sparse PCA algorithm and achieve significant computational speedups.
Our framework, when integrated with this algorithm, reduces the runtime to $\mathcal{O}\left(\frac{d}{d^\star} \cdot g(k, d^\star) + d^2\right)$.
arXiv Detail & Related papers (2024-10-18T00:16:10Z)
- Learning-Augmented K-Means Clustering Using Dimensional Reduction [1.7243216387069678]
We propose a solution to reduce the dimensionality of the dataset using Principal Component Analysis (PCA).
PCA is well-established in the literature and has become one of the most useful tools for data modeling, compression, and visualization.
arXiv Detail & Related papers (2024-01-06T12:02:33Z)
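The entry above pairs PCA-based dimensionality reduction with K-Means. A minimal sketch of that pipeline using scikit-learn follows; the number of components and clusters are arbitrary illustrative choices, not values from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))  # synthetic high-dimensional data

# Project onto the leading principal components before clustering.
X_low = PCA(n_components=5).fit_transform(X)

# Cluster in the reduced space; k=3 is an illustrative choice.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_low)
print(np.bincount(labels))
```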
- Improved Privacy-Preserving PCA Using Optimized Homomorphic Matrix Multiplication [0.0]
Principal Component Analysis (PCA) is a pivotal technique widely utilized in the realms of machine learning and data analysis.
In recent years, there have been endeavors to utilize homomorphic encryption in privacy-preserving PCA algorithms for the secure cloud computing scenario.
We propose a novel approach to privacy-preserving PCA that addresses the limitations of prior methods, resulting in superior efficiency, accuracy, and scalability compared to previous approaches.
arXiv Detail & Related papers (2023-05-27T02:51:20Z)
- Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA [43.106438224356175]
We develop a nearly-linear time algorithm for robust PCA with near-optimal error guarantees.
We also develop a single-pass streaming algorithm for robust PCA with memory usage nearly-linear in the dimension.
arXiv Detail & Related papers (2023-05-04T04:45:16Z)
- Linearized Wasserstein dimensionality reduction with approximation guarantees [65.16758672591365]
LOT Wassmap is a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space.
We show that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size.
We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
arXiv Detail & Related papers (2023-02-14T22:12:16Z)
- Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z)
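The entry above builds on a random feature approximation of a kernel. The sketch below shows the general idea for an RBF kernel combined with ridge regression; the NNGP kernel and the distillation procedure used in the paper require a different, more involved feature construction.

```python
import numpy as np

def random_fourier_features(X, n_features=512, gamma=1.0, seed=0):
    # z(x) @ z(y) approximates the RBF kernel exp(-gamma * ||x - y||^2).
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Kernel ridge regression solved in the random feature space: the linear system
# scales with n_features rather than with the number of training points.
rng = np.random.default_rng(1)
X, y = rng.normal(size=(2000, 20)), rng.normal(size=2000)
Z = random_fourier_features(X)
w = np.linalg.solve(Z.T @ Z + 1e-3 * np.eye(Z.shape[1]), Z.T @ y)
print(random_fourier_features(X[:5]) @ w)
```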
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
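As a rough illustration of how sketching keeps the per-iteration communication cost low in the entry above, the toy count-sketch below compresses a d-dimensional gradient into k buckets before transmission; the exact sketch and the AMSGrad-style update used by SketchedAMSGrad may differ.

```python
import numpy as np

def count_sketch(g, k, seed=0):
    # Compress a d-dimensional vector into k buckets using random hashes and signs.
    rng = np.random.default_rng(seed)
    bucket = rng.integers(0, k, size=g.shape[0])   # bucket index per coordinate
    sign = rng.choice([-1.0, 1.0], size=g.shape[0])
    sketch = np.zeros(k)
    np.add.at(sketch, bucket, sign * g)            # accumulate signed contributions
    return sketch, bucket, sign

def unsketch(sketch, bucket, sign):
    # Unbiased coordinate-wise estimate of the original vector from the sketch.
    return sign * sketch[bucket]

d, k = 10_000, 256
g = np.random.default_rng(1).normal(size=d)        # a gradient to be communicated
sketch, bucket, sign = count_sketch(g, k)          # only k numbers cross the network
g_hat = unsketch(sketch, bucket, sign)
# Recovery is approximate; accuracy improves as k grows toward d.
print(np.corrcoef(g, g_hat)[0, 1])
```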
- Large-scale Optimization of Partial AUC in a Range of False Positive Rates [51.12047280149546]
The area under the ROC curve (AUC) is one of the most widely used performance measures for classification models in machine learning.
We develop an efficient approximate gradient descent method based on a recent practical envelope smoothing technique.
Our proposed algorithm can also be used to minimize the sum of some ranked range loss, which also lacks efficient solvers.
arXiv Detail & Related papers (2022-03-03T03:46:18Z)
- Breaking the Linear Iteration Cost Barrier for Some Well-known Conditional Gradient Methods Using MaxIP Data-structures [49.73889315176884]
Conditional gradient methods (CGM) are widely used in modern machine learning.
Most efforts focus on reducing the number of iterations as a means to reduce the overall running time.
We show the first algorithm, where the cost per iteration is sublinear in the number of parameters, for many fundamental optimization algorithms.
arXiv Detail & Related papers (2021-11-30T05:40:14Z)
- Approximation Algorithms for Sparse Principal Component Analysis [57.5357874512594]
Principal component analysis (PCA) is a widely used dimension reduction technique in machine learning and statistics.
Various approaches to obtain sparse principal direction loadings have been proposed, which are termed Sparse Principal Component Analysis.
We present thresholding as a provably accurate, polynomial time, approximation algorithm for the SPCA problem.
arXiv Detail & Related papers (2020-06-23T04:25:36Z)
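The thresholding idea in the entry above can be illustrated in a few lines: take the leading eigenvector of the sample covariance, keep its k largest-magnitude coordinates, and renormalize. This is a toy version; the paper's algorithm and its approximation guarantees are more refined.

```python
import numpy as np

def thresholded_spca(X, k):
    # Toy sparse PCA via thresholding: restrict the leading eigenvector to its
    # k largest-magnitude coordinates, then renormalize.
    cov = np.cov(X, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, -1]                        # leading eigenvector (eigh sorts ascending)
    keep = np.argsort(np.abs(v))[-k:]         # indices of the k largest entries
    v_sparse = np.zeros_like(v)
    v_sparse[keep] = v[keep]
    return v_sparse / np.linalg.norm(v_sparse)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))
X[:, :5] += 2.0 * rng.normal(size=(500, 1))   # planted sparse signal on the first 5 coordinates
print(np.nonzero(thresholded_spca(X, 5))[0])  # expected to recover (most of) those coordinates
```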
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.