Privacy Preserving PCA for Multiparty Modeling
- URL: http://arxiv.org/abs/2002.02091v3
- Date: Thu, 12 Mar 2020 12:38:39 GMT
- Title: Privacy Preserving PCA for Multiparty Modeling
- Authors: Yingting Liu, Chaochao Chen, Longfei Zheng, Li Wang, Jun Zhou, Guiquan Liu, Shuang Yang
- Abstract summary: PPPCA allows multiple parties to cooperatively execute PCA while keeping plaintext data local.
The output of PPPCA can be sent directly to the data consumer to build any machine learning model.
Results show that the accuracy of a model built upon PPPCA is the same as that of a model using PCA built on centralized data.
- Score: 21.33430578478244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a general multiparty modeling paradigm with Privacy
Preserving Principal Component Analysis (PPPCA) for horizontally partitioned
data. PPPCA allows multiple parties to cooperatively execute PCA while keeping their plaintext data local. We also propose implementations
using two techniques, i.e., homomorphic encryption and secret sharing. The
output of PPPCA can be sent directly to the data consumer to build any machine learning model. We conduct experiments on three UCI benchmark datasets and a
real-world fraud detection dataset. Results show that the accuracy of the model built upon PPPCA is the same as that of a model using PCA built on centralized plaintext data.
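As a rough illustration of this paradigm (not the paper's exact protocol), the sketch below assumes a simplified setting with additive masking over the reals: each party shares only masked sufficient statistics (its local Gram matrix and column sums), so only the global covariance is ever reconstructed, and its top-k eigenvectors serve as the shared projection. All names and the real-valued masking are illustrative assumptions; a production system would use homomorphic encryption or finite-field secret sharing as the abstract describes.

```python
import numpy as np

def share(secret, n_shares, rng):
    """Additively mask an array into n_shares pieces that sum to the secret.

    Real protocols share over a finite field or use HE; Gaussian masks over
    the reals are used here purely for illustration.
    """
    masks = [rng.normal(size=secret.shape) for _ in range(n_shares - 1)]
    return masks + [secret - sum(masks)]

def pppca_sketch(local_datasets, k, seed=0):
    """Toy multiparty PCA over horizontally partitioned data.

    Each party holds rows in the same feature space. Parties exchange only
    masked shares of their local Gram matrix and column sums; aggregation
    reconstructs just the *global* covariance, whose top-k eigenvectors
    define the common projection.
    """
    rng = np.random.default_rng(seed)
    n_parties = len(local_datasets)

    gram_shares = [share(X.T @ X, n_parties, rng) for X in local_datasets]
    sum_shares = [share(X.sum(axis=0), n_parties, rng) for X in local_datasets]
    n_total = sum(X.shape[0] for X in local_datasets)  # row counts treated as public

    # Summing every share reconstructs only the global sums, never one party's own.
    gram = sum(s for shares in gram_shares for s in shares)
    col_sum = sum(s for shares in sum_shares for s in shares)

    mean = col_sum / n_total
    cov = gram / n_total - np.outer(mean, mean)      # global covariance
    vals, vecs = np.linalg.eigh(cov)
    top = vecs[:, np.argsort(vals)[::-1][:k]]        # top-k principal directions
    return mean, top

# Each party then projects its own plaintext rows locally,
#   Z_i = (X_i - mean) @ top,
# and sends Z_i (or a model trained on it) to the data consumer.
```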
Related papers
- Incentives in Private Collaborative Machine Learning [56.84263918489519]
Collaborative machine learning involves training models on data from multiple parties.
We introduce differential privacy (DP) as an incentive.
We empirically demonstrate the effectiveness and practicality of our approach on synthetic and real-world datasets.
arXiv Detail & Related papers (2024-04-02T06:28:22Z)
- Secure and Effective Data Appraisal for Machine Learning [17.828547661524688]
This paper introduces an innovative approach that renders data selection practical.
The proposed method is assessed across an array of Transformer models and NLP/CV benchmarks.
In comparison to the direct MPC-based evaluation of the target model, our approach substantially reduces the time required, from thousands of hours to mere tens of hours, with only a nominal 0.20% dip in accuracy when training with the selected data.
arXiv Detail & Related papers (2023-10-03T18:52:57Z)
- Efficient fair PCA for fair representation learning [21.990310743597174]
We propose a conceptually simple approach that allows for an analytic solution similar to standard PCA and can be kernelized.
Our methods have the same complexity as standard PCA, or kernel PCA, and run much faster than existing methods for fair PCA based on semidefinite programming or manifold optimization.
arXiv Detail & Related papers (2023-02-26T13:34:43Z)
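For the fair-PCA entry above, here is a minimal sketch of one analytic recipe in that spirit (my reading, not necessarily the paper's exact formulation): project out the direction separating the two protected groups' means, then run standard PCA, keeping the cost at the level of ordinary PCA. Function and variable names are illustrative; `groups` is assumed to be a 0/1 array.

```python
import numpy as np

def fair_pca_sketch(X, groups, k):
    """Illustrative 'fair PCA' via mean equalization (a sketch, not the paper).

    The direction separating the two protected groups' means is projected
    out before ordinary PCA, so the learned components carry no first-order
    (mean-level) information about group membership.
    """
    X = X - X.mean(axis=0)
    mu_diff = X[groups == 1].mean(axis=0) - X[groups == 0].mean(axis=0)
    u = mu_diff / np.linalg.norm(mu_diff)

    # Project each point onto the orthogonal complement of the group direction.
    X_fair = X - np.outer(X @ u, u)

    # Standard PCA on the projected data: same asymptotic cost as plain PCA.
    cov = X_fair.T @ X_fair / len(X_fair)
    vals, vecs = np.linalg.eigh(cov)
    return vecs[:, np.argsort(vals)[::-1][:k]]
```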
- Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
The federated clustering (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into a finite number of clusters under the orchestration of a server.
We propose a novel FedC algorithm using a differential privacy technique, referred to as DP-Fed, in which partial participation and multiple clients are also considered.
Various attributes of the proposed DP-Fed are obtained through theoretical analyses of privacy protection, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z)
- FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis [12.91948651812873]
Principal Component Analysis (PCA) is a fundamental data preprocessing tool in the world of machine learning.
This paper proposes a distributed PCA algorithm called FAST-PCA (Fast and exAct diSTributed PCA).
arXiv Detail & Related papers (2021-08-27T16:10:59Z)
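For the distributed-PCA entry above, a generic sketch (explicitly not FAST-PCA's own algorithm) of how PCA can be computed over partitioned data while exchanging only small d x k messages per round, using subspace (power) iteration and an allreduce-style sum; the data are assumed pre-centered.

```python
import numpy as np

def distributed_pca_sketch(local_datasets, k, iters=100, seed=0):
    """Generic distributed PCA via subspace (power) iteration.

    Each node keeps its rows X_i private and, per round, computes the small
    d x k product X_i^T (X_i V) locally; only these d x k blocks are summed
    across nodes (an allreduce), never the raw samples. Data are assumed to
    be pre-centered. This is a sketch of the general pattern, not FAST-PCA.
    """
    rng = np.random.default_rng(seed)
    d = local_datasets[0].shape[1]
    V, _ = np.linalg.qr(rng.normal(size=(d, k)))   # shared random starting subspace

    for _ in range(iters):
        S = sum(X.T @ (X @ V) for X in local_datasets)  # global d x k sum
        V, _ = np.linalg.qr(S)                          # re-orthonormalize
    return V  # columns approximate the top-k principal directions
```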
- Data-Free Evaluation of User Contributions in Federated Learning [31.181141140071592]
Federated learning (FL) trains a machine learning model on mobile devices in a distributed manner using each device's private data and computing resources.
We propose a method called Pairwise Correlated Agreement (PCA) based on the idea of peer prediction to evaluate user contribution in FL without a test dataset.
We then apply PCA to designing (1) a new federated learning algorithm called Fed-PCA, and (2) a new incentive mechanism that guarantees truthfulness.
arXiv Detail & Related papers (2021-08-24T10:17:03Z)
- Federated Generalized Face Presentation Attack Detection [112.27662334648302]
We propose a Federated Face Presentation Attack Detection (FedPAD) framework.
FedPAD takes advantage of rich fPAD information available at different data owners while preserving data privacy.
A server learns a global fPAD model by only aggregating domain-invariant parts of the fPAD models from data centers.
arXiv Detail & Related papers (2021-04-14T02:44:53Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Probabilistic Contrastive Principal Component Analysis [0.5286651840245514]
We propose a model-based alternative to contrastive principal component analysis (CPCA).
We show PCPCA's advantages over CPCA, including greater interpretability, uncertainty quantification and principled inference.
We demonstrate PCPCA's performance through a series of simulations and case-control experiments with datasets of gene expression, protein expression, and images.
arXiv Detail & Related papers (2020-12-14T22:21:50Z)
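For context on the PCPCA entry above, a short sketch of classical contrastive PCA (CPCA), the non-probabilistic method it builds on: directions are found by eigendecomposing the contrast matrix C_fg - gamma * C_bg, where gamma trades off foreground variance against background variance. PCPCA replaces this with a generative model, which is what enables the uncertainty quantification mentioned in the summary; the code below is only the classical baseline.

```python
import numpy as np

def contrastive_pca_sketch(foreground, background, k, gamma=1.0):
    """Classical contrastive PCA (CPCA).

    Finds directions with high variance in the foreground (case) data but
    low variance in the background (control) data by eigendecomposing the
    contrast matrix C_fg - gamma * C_bg; gamma controls the trade-off.
    """
    fg = foreground - foreground.mean(axis=0)
    bg = background - background.mean(axis=0)
    c_fg = fg.T @ fg / len(fg)
    c_bg = bg.T @ bg / len(bg)

    vals, vecs = np.linalg.eigh(c_fg - gamma * c_bg)   # symmetric contrast matrix
    return vecs[:, np.argsort(vals)[::-1][:k]]         # top-k contrastive directions
```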
- Repulsive Mixture Models of Exponential Family PCA for Clustering [127.90219303669006]
The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about the data distribution than the traditional EPCA.
The traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering.
In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
arXiv Detail & Related papers (2020-04-07T04:07:29Z)
- User-Level Privacy-Preserving Federated Learning: Analysis and Performance Optimization [77.43075255745389]
Federated learning (FL) is capable of preserving private data from mobile terminals (MTs) while training the data into useful models.
From a viewpoint of information theory, it is still possible for a curious server to infer private information from the shared models uploaded by MTs.
We propose a user-level differential privacy (UDP) algorithm by adding artificial noise to the shared models before uploading them to servers.
arXiv Detail & Related papers (2020-02-29T10:13:39Z)
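For the UDP entry above, a hedged sketch of the generic "clip, then add Gaussian noise before upload" pattern for user-level privacy in federated learning; the parameter names and calibration below are illustrative assumptions, not the paper's exact mechanism.

```python
import numpy as np

def add_user_level_noise(model_update, clip_norm, noise_multiplier, rng=None):
    """Clip a client's model update and add Gaussian noise before uploading.

    Clipping bounds each user's influence (the sensitivity of the upload);
    the noise scale is proportional to that bound. Names and calibration
    are illustrative, not the paper's exact mechanism.
    """
    rng = rng or np.random.default_rng()
    flat = np.concatenate([p.ravel() for p in model_update])

    norm = np.linalg.norm(flat)
    flat = flat * min(1.0, clip_norm / (norm + 1e-12))   # L2 clipping

    noisy = flat + rng.normal(scale=noise_multiplier * clip_norm, size=flat.shape)

    # Restore the original parameter shapes before sending to the server.
    out, i = [], 0
    for p in model_update:
        out.append(noisy[i:i + p.size].reshape(p.shape))
        i += p.size
    return out
```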
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.