Privacy-Preserving Feature Selection with Secure Multiparty Computation
- URL: http://arxiv.org/abs/2102.03517v1
- Date: Sat, 6 Feb 2021 05:33:04 GMT
- Title: Privacy-Preserving Feature Selection with Secure Multiparty Computation
- Authors: Xiling Li and Rafael Dowsley and Martine De Cock
- Abstract summary: We propose the first MPC-based protocol for private feature selection based on the filter method.
We show that secure feature selection with the proposed protocols improves the accuracy of classifiers on a variety of real-world data sets.
- Score: 9.478262337000066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing work on privacy-preserving machine learning with Secure Multiparty
Computation (MPC) is almost exclusively focused on model training and on
inference with trained models, thereby overlooking the important data
pre-processing stage. In this work, we propose the first MPC-based protocol for
private feature selection based on the filter method, which is independent of
model training, and can be used in combination with any MPC protocol to rank
features. We propose an efficient feature scoring protocol based on Gini
impurity to this end. To demonstrate the feasibility of our approach for
practical data science, we perform experiments with the proposed MPC protocols
for feature selection in a commonly used machine-learning-as-a-service
configuration where computations are outsourced to multiple servers, with
semi-honest and with malicious adversaries. Regarding effectiveness, we show
that secure feature selection with the proposed protocols improves the accuracy
of classifiers on a variety of real-world data sets, without leaking
information about the feature values or even which features were selected.
Regarding efficiency, we document runtimes ranging from several seconds to an
hour for our protocols to finish, depending on the size of the data set and the
security settings.
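As a plaintext illustration of the filter method scored in this paper, the sketch below ranks discrete features by the weighted Gini impurity of the class labels after partitioning on each feature's values, then keeps the lowest-impurity (most discriminative) features. The paper's protocols evaluate this kind of score under MPC on secret-shared data; the helper names (`gini_score`, `select_features`) are illustrative, not the paper's.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_c p_c^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_score(column, labels):
    """Weighted Gini impurity after partitioning rows by feature value.

    Lower is better: the feature's values separate the classes more cleanly.
    """
    n = len(labels)
    score = 0.0
    for v in set(column):
        part = [y for x, y in zip(column, labels) if x == v]
        score += len(part) / n * gini(part)
    return score

def select_features(columns, labels, k):
    """Rank features by Gini score and keep the k best (lowest impurity)."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: gini_score(columns[j], labels))
    return ranked[:k]
```

For example, a feature whose values perfectly separate the classes gets score 0.0 and is ranked first; the MPC version computes the same ranking without any party seeing the feature values or the ranking itself.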
Related papers
- Hardware Aware Ensemble Selection for Balancing Predictive Accuracy and Cost [0.6486052012623046]
We introduce a hardware-aware ensemble selection approach that integrates inference time into post hoc ensembling.
By leveraging an existing framework for ensemble selection with quality diversity optimization, our method evaluates ensemble candidates for their predictive accuracy and hardware efficiency.
Our evaluation using 83 classification datasets shows that our approach sustains competitive accuracy and can significantly improve ensembles' operational efficiency.
arXiv Detail & Related papers (2024-08-05T07:30:18Z)
- LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science.
Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z)
- Provable Mutual Benefits from Federated Learning in Privacy-Sensitive Domains [3.3748750222488657]
Cross-silo federated learning allows data owners to train accurate machine learning models by benefiting from each other's private datasets.
To incentivize client participation in privacy-sensitive domains, an FL protocol should strike a delicate balance between privacy guarantees and end-model accuracy.
We study the question of when and how a server could design a FL protocol provably beneficial for all participants.
arXiv Detail & Related papers (2024-03-11T12:43:44Z)
- Feature Selection via Maximizing Distances between Class Conditional Distributions [9.596923373834093]
We propose a novel feature selection framework based on the distance between class conditional distributions, measured by integral probability metrics (IPMs).
Our framework directly explores the discriminative information of features in the sense of distributions for supervised classification.
Experimental results show that our framework can outperform state-of-the-art methods in terms of classification accuracy and robustness to perturbations.
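A minimal plaintext sketch of this idea, using the 1-Wasserstein distance (one instance of an IPM) between the empirical class-conditional distributions of each feature on a binary task. Equal class sizes are assumed for simplicity, and the function names are illustrative rather than the paper's:

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples:
    the mean absolute difference of the sorted samples."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def ipm_rank(columns, labels, k):
    """Rank features by the distance between their class-conditional
    distributions; a larger distance means a more discriminative feature."""
    scores = []
    for col in columns:
        pos = [x for x, y in zip(col, labels) if y == 1]
        neg = [x for x, y in zip(col, labels) if y == 0]
        scores.append(wasserstein_1d(pos, neg))
    order = sorted(range(len(columns)), key=lambda j: -scores[j])
    return order[:k]
```

A feature whose distribution is identical in both classes scores 0 and is ranked last, regardless of its marginal variance.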
arXiv Detail & Related papers (2024-01-15T06:10:10Z)
- Privacy Side Channels in Machine Learning Systems [87.53240071195168]
We introduce privacy side channels: attacks that exploit system-level components to extract private information.
For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees.
We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set.
arXiv Detail & Related papers (2023-09-11T16:49:05Z)
- Tuning Pre-trained Model via Moment Probing [62.445281364055795]
We propose a novel Moment Probing (MP) method to explore the potential of linear probing (LP).
MP trains a linear classification head on the mean of the final features.
Our MP significantly outperforms LP and is competitive with its counterparts at a lower training cost.
arXiv Detail & Related papers (2023-07-21T04:15:02Z)
- Byzantine-Robust Federated Learning with Optimal Statistical Rates and Privacy Guarantees [123.0401978870009]
We propose Byzantine-robust federated learning protocols with nearly optimal statistical rates.
We benchmark against competing protocols and show the empirical superiority of the proposed protocols.
Our protocols with bucketing can be naturally combined with privacy-guaranteeing procedures to introduce security against a semi-honest server.
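The bucketing idea can be sketched in plaintext as follows (an illustrative sketch, not the paper's exact protocol): shuffle the client gradient vectors, average them within small buckets, then aggregate the bucket means with a robust statistic such as the coordinate-wise median, which limits the influence of any single Byzantine client.

```python
import random

def bucket_and_aggregate(grads, s, seed=0):
    """Byzantine-robust aggregation via bucketing: shuffle client gradients,
    average within buckets of size s, then take the coordinate-wise median
    across the bucket averages."""
    rng = random.Random(seed)
    grads = grads[:]
    rng.shuffle(grads)
    buckets = [grads[i:i + s] for i in range(0, len(grads), s)]
    means = [[sum(col) / len(col) for col in zip(*b)] for b in buckets]

    def median(xs):
        xs = sorted(xs)
        n = len(xs)
        return xs[n // 2] if n % 2 else (xs[n // 2 - 1] + xs[n // 2]) / 2

    return [median(col) for col in zip(*means)]
```

With bucket size 1 this reduces to a plain coordinate-wise median, so a single outlier gradient cannot drag the aggregate away from the honest majority.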
arXiv Detail & Related papers (2022-05-24T04:03:07Z)
- Training Differentially Private Models with Secure Multiparty Computation [12.628792164922864]
We address the problem of learning a machine learning model from data that originates at multiple data owners.
Existing solutions based on Differential Privacy (DP) achieve this at the cost of a drop in accuracy.
Our solution relies on an MPC protocol for model training, and an MPC protocol for perturbing the trained model coefficients with Laplace noise.
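The output-perturbation step can be sketched in plaintext as the standard Laplace mechanism: add noise with scale sensitivity/epsilon to each trained coefficient. In the paper this perturbation happens inside MPC so that no single party ever sees the unperturbed model; the inverse-CDF sampler below is an illustrative stand-in for the secure noise generation.

```python
import math
import random

def laplace_noise(scale):
    """Draw a Laplace(0, scale) sample via inverse-CDF sampling."""
    u = random.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def perturb_coefficients(coeffs, sensitivity, epsilon):
    """Output perturbation for epsilon-DP: add Laplace noise with scale
    sensitivity/epsilon to each model coefficient."""
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale) for c in coeffs]
```

A smaller epsilon (stronger privacy) means a larger noise scale, which is the accuracy/privacy trade-off the abstract refers to.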
arXiv Detail & Related papers (2022-02-05T20:00:37Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
The proposed algorithm is more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- PRICURE: Privacy-Preserving Collaborative Inference in a Multi-Party Setting [3.822543555265593]
This paper presents PRICURE, a system that combines complementary strengths of secure multi-party computation and differential privacy.
PRICURE enables privacy-preserving collaborative prediction among multiple model owners.
We evaluate PRICURE on neural networks across four datasets including benchmark medical image classification datasets.
arXiv Detail & Related papers (2021-02-19T05:55:53Z)
- User-Level Privacy-Preserving Federated Learning: Analysis and Performance Optimization [77.43075255745389]
Federated learning (FL) is capable of preserving the privacy of data on mobile terminals (MTs) while training useful models from that data.
From a viewpoint of information theory, it is still possible for a curious server to infer private information from the shared models uploaded by MTs.
We propose a user-level differential privacy (UDP) algorithm by adding artificial noise to the shared models before uploading them to servers.
arXiv Detail & Related papers (2020-02-29T10:13:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.