Personalized PCA: Decoupling Shared and Unique Features
- URL: http://arxiv.org/abs/2207.08041v2
- Date: Thu, 8 Feb 2024 17:35:56 GMT
- Title: Personalized PCA: Decoupling Shared and Unique Features
- Authors: Naichen Shi and Raed Al Kontar
- Abstract summary: We propose personalized PCA (PerPCA) to decouple shared and unique features from heterogeneous datasets.
We show that, under mild conditions, both unique and shared features can be identified and recovered by a constrained optimization problem.
As a systematic approach to decouple shared and unique features from heterogeneous datasets, PerPCA finds applications in several tasks, including video segmentation, topic extraction, and feature clustering.
- Score: 4.976703689624386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we tackle a significant challenge in PCA: heterogeneity. When
data are collected from different sources with heterogeneous trends while still
sharing some congruency, it is critical to extract shared knowledge while
retaining the unique features of each source. To this end, we propose
personalized PCA (PerPCA), which uses mutually orthogonal global and local
principal components to encode both unique and shared features. We show that,
under mild conditions, both unique and shared features can be identified and
recovered by a constrained optimization problem, even if the covariance
matrices are immensely different. Also, we design a fully federated algorithm
inspired by distributed Stiefel gradient descent to solve the problem. The
algorithm introduces a new group of operations called generalized retractions
to handle orthogonality constraints, and only requires global PCs to be shared
across sources. We prove the linear convergence of the algorithm under suitable
assumptions. Comprehensive numerical experiments highlight PerPCA's superior
performance in feature extraction and prediction from heterogeneous datasets.
As a systematic approach to decouple shared and unique features from
heterogeneous datasets, PerPCA finds applications in several tasks, including
video segmentation, topic extraction, and feature clustering.
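The decoupling idea in the abstract can be illustrated with a simplified, centralized sketch: shared structure is captured by global PCs of the averaged covariance, and each source's unique structure by local PCs of the residual covariance after projecting out the global subspace. This is only the orthogonality idea, not the paper's federated Stiefel-gradient algorithm with generalized retractions; all function and variable names here are illustrative.

```python
import numpy as np

def perpca_sketch(datasets, n_global=2, n_local=1):
    """Centralized sketch of the PerPCA decomposition: global PCs
    encode shared features; each source's local PCs are restricted
    to the orthogonal complement of the global subspace."""
    # Per-source covariances and their average (shared structure).
    covs = [X.T @ X / len(X) for X in datasets]
    avg_cov = sum(covs) / len(covs)
    _, eigvecs = np.linalg.eigh(avg_cov)        # ascending eigenvalues
    U_global = eigvecs[:, -n_global:]           # d x n_global, top PCs

    # Unique structure: per-source PCs of the covariance restricted
    # to the orthogonal complement of the global subspace.
    d = avg_cov.shape[0]
    P_perp = np.eye(d) - U_global @ U_global.T
    local_pcs = []
    for C in covs:
        C_res = P_perp @ C @ P_perp
        _, V = np.linalg.eigh(C_res)
        local_pcs.append(V[:, -n_local:])       # d x n_local
    return U_global, local_pcs
```

By construction, each source's local PCs are orthogonal to the global PCs, which is the mutual-orthogonality constraint the paper imposes.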
Related papers
- Sparse outlier-robust PCA for multi-source data [2.3226893628361687]
We introduce a novel PCA methodology that simultaneously selects important features as well as local source-specific patterns.
We develop a regularization problem with a penalty that accommodates global-local structured sparsity patterns.
We provide an efficient implementation of our proposal via the Alternating Direction Method of Multipliers.
arXiv Detail & Related papers (2024-07-23T08:55:03Z) - Gram-Schmidt Methods for Unsupervised Feature Extraction and Selection [7.373617024876725]
We propose a Gram-Schmidt process over function spaces to detect and map out nonlinear dependencies.
We provide experimental results for synthetic and real-world benchmark datasets.
Surprisingly, our linear feature extraction algorithms are comparable to, and often outperform, several important nonlinear feature extraction methods.
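A minimal linear analogue of the Gram-Schmidt idea can be sketched as greedy feature selection: repeatedly pick the feature column with the largest residual norm, then orthogonalize the remaining columns against it. The paper works over function spaces to capture nonlinear dependencies; this sketch shows only the linear mechanism, with hypothetical names.

```python
import numpy as np

def gram_schmidt_select(X, k):
    """Greedy Gram-Schmidt-style feature selection on an n x d data
    matrix: each round selects the feature whose residual column has
    the largest norm, then projects that direction out of the rest."""
    R = X - X.mean(axis=0)              # center the feature columns
    selected = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=0)
        norms[selected] = -np.inf       # never re-select a feature
        j = int(np.argmax(norms))
        selected.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(q, q @ R)      # Gram-Schmidt deflation step
    return selected
```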
arXiv Detail & Related papers (2023-11-15T21:29:57Z) - Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
The federated clustering (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into a finite number of clusters under the orchestration of a server.
We propose a novel FedC algorithm with differential privacy, referred to as DP-Fed, in which partial client participation and multiple local updates are also considered.
Various attributes of the proposed DP-Fed are obtained through theoretical analyses of privacy protection and convergence, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z) - Sparse PCA With Multiple Components [2.5382095320488673]
Sparse Principal Component Analysis (sPCA) is a technique for obtaining combinations of features that explain variance of high-dimensional datasets in an interpretable manner.
Most existing sPCA methods do not guarantee optimality, let alone bounds on solution quality, when we seek multiple sparse PCs.
We propose exact methods and rounding mechanisms that, together, obtain solutions with a bound gap on the order of 0%-15% for real-world datasets.
arXiv Detail & Related papers (2022-09-29T13:57:18Z) - Feature Selection via the Intervened Interpolative Decomposition and its Application in Diversifying Quantitative Strategies [4.913248451323163]
We propose a probabilistic model for computing an interpolative decomposition (ID) in which each column of the observed matrix has its own priority or importance.
We evaluate the proposed models on real-world datasets, including ten Chinese A-share stocks.
arXiv Detail & Related papers (2022-09-29T03:36:56Z) - Federated Offline Reinforcement Learning [55.326673977320574]
We propose a multi-site Markov decision process model that allows for both homogeneous and heterogeneous effects across sites.
We design the first federated policy optimization algorithm for offline RL with sample complexity guarantees.
We give a theoretical guarantee for the proposed algorithm, where the suboptimality of the learned policies is comparable to the rate achieved as if the data were not distributed.
arXiv Detail & Related papers (2022-06-11T18:03:26Z) - A Linearly Convergent Algorithm for Distributed Principal Component Analysis [12.91948651812873]
This paper introduces a feedforward neural network-based one time-scale distributed PCA algorithm termed Distributed Sanger's Algorithm (DSA).
The proposed algorithm is shown to converge linearly to a neighborhood of the true solution.
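The local update underlying DSA is Sanger's rule (the generalized Hebbian algorithm), which can be sketched in a single-node form. The distributed algorithm combines such steps with consensus averaging over the network, which this sketch omits; names and the learning rate are illustrative.

```python
import numpy as np

def sanger_update(W, x, lr=0.01):
    """One step of Sanger's rule. W is d x k (columns are the PC
    estimates), x is a d-dimensional sample. The update is the
    Hebbian term x y^T minus a deflation term that forces each
    component to avoid the variance captured by earlier ones."""
    y = W.T @ x                       # k projections of the sample
    # triu keeps y_j * y_i for j <= i, matching the d x k layout:
    # column i is deflated only by components 1..i.
    W += lr * (np.outer(x, y) - W @ np.triu(np.outer(y, y)))
    return W
```

For k = 1 this reduces to Oja's rule, and a unit eigenvector of the data covariance is a fixed point of the update.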
arXiv Detail & Related papers (2021-01-05T00:51:14Z) - Sparse PCA via $l_{2,p}$-Norm Regularization for Unsupervised Feature Selection [138.97647716793333]
We propose a simple and efficient unsupervised feature selection method, by combining reconstruction error with $l_{2,p}$-norm regularization.
We present an efficient optimization algorithm to solve the proposed unsupervised model, and analyse the convergence and computational complexity of the algorithm theoretically.
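The combined objective described above can be sketched as reconstruction error plus an $l_{2,p}$-norm row-sparsity penalty on the projection matrix; features whose rows shrink toward zero are the ones discarded. This evaluates the objective only, not the paper's optimization algorithm, and the parameter names are hypothetical.

```python
import numpy as np

def sparse_pca_objective(X, W, lam=1.0, p=0.5):
    """Objective value for reconstruction-error-based unsupervised
    feature selection: ||X - X W W^T||_F^2 plus lam * sum over rows
    of ||W_i||_2^p. W is d x k; small row norms flag droppable
    features."""
    recon = np.linalg.norm(X - X @ W @ W.T, 'fro') ** 2
    row_norms = np.linalg.norm(W, axis=1)   # l2 norm of each row
    penalty = lam * np.sum(row_norms ** p)  # l_{2,p} regularizer
    return recon + penalty
```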
arXiv Detail & Related papers (2020-12-29T04:08:38Z) - Out-of-distribution Generalization via Partial Feature Decorrelation [72.96261704851683]
We present a novel Partial Feature Decorrelation Learning (PFDL) algorithm, which jointly optimizes a feature decomposition network and the target image classification model.
The experiments on real-world datasets demonstrate that our method can improve the backbone model's accuracy on OOD image classification datasets.
arXiv Detail & Related papers (2020-07-30T05:48:48Z) - Model Fusion with Kullback--Leibler Divergence [58.20269014662046]
We propose a method to fuse posterior distributions learned from heterogeneous datasets.
Our algorithm relies on a mean field assumption for both the fused model and the individual dataset posteriors.
arXiv Detail & Related papers (2020-07-13T03:27:45Z) - Repulsive Mixture Models of Exponential Family PCA for Clustering [127.90219303669006]
The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about the data distribution than the traditional EPCA.
The traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering.
In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
arXiv Detail & Related papers (2020-04-07T04:07:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.