Privacy-Preserving Feature Selection with Secure Multiparty Computation
- URL: http://arxiv.org/abs/2102.03517v1
- Date: Sat, 6 Feb 2021 05:33:04 GMT
- Title: Privacy-Preserving Feature Selection with Secure Multiparty Computation
- Authors: Xiling Li and Rafael Dowsley and Martine De Cock
- Abstract summary: We propose the first MPC-based protocol for private feature selection based on the filter method.
We show that secure feature selection with the proposed protocols improves the accuracy of classifiers on a variety of real-world data sets.
- Score: 9.478262337000066
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing work on privacy-preserving machine learning with Secure Multiparty
Computation (MPC) is almost exclusively focused on model training and on
inference with trained models, thereby overlooking the important data
pre-processing stage. In this work, we propose the first MPC-based protocol for
private feature selection based on the filter method, which is independent of
model training, and can be used in combination with any MPC protocol to rank
features. We propose an efficient feature scoring protocol based on Gini
impurity to this end. To demonstrate the feasibility of our approach for
practical data science, we perform experiments with the proposed MPC protocols
for feature selection in a commonly used machine-learning-as-a-service
configuration where computations are outsourced to multiple servers, with
semi-honest and with malicious adversaries. Regarding effectiveness, we show
that secure feature selection with the proposed protocols improves the accuracy
of classifiers on a variety of real-world data sets, without leaking
information about the feature values or even which features were selected.
Regarding efficiency, we document runtimes ranging from several seconds to an
hour for our protocols to finish, depending on the size of the data set and the
security settings.
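As a plaintext illustration of the filter method scored in this paper, the sketch below ranks discrete features by the weighted Gini impurity of the class labels after partitioning on each feature's values, then keeps the lowest-impurity (most discriminative) features. The paper's protocols evaluate this kind of score under MPC on secret-shared data; the helper names (`gini_score`, `select_features`) are illustrative, not the paper's.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum_c p_c^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_score(column, labels):
    """Weighted Gini impurity after partitioning rows by feature value.

    Lower is better: the feature's values separate the classes more cleanly.
    """
    n = len(labels)
    score = 0.0
    for v in set(column):
        part = [y for x, y in zip(column, labels) if x == v]
        score += len(part) / n * gini(part)
    return score

def select_features(columns, labels, k):
    """Rank features by Gini score and keep the k best (lowest impurity)."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: gini_score(columns[j], labels))
    return ranked[:k]
```

For example, a feature whose values perfectly separate the classes gets score 0.0 and is ranked first; the MPC version computes the same ranking without any party seeing the feature values or the ranking itself.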
Related papers
- Hardware Aware Ensemble Selection for Balancing Predictive Accuracy and Cost [0.6486052012623046]
We introduce a hardware-aware ensemble selection approach that integrates inference time into post hoc ensembling.
By leveraging an existing framework for ensemble selection with quality diversity optimization, our method evaluates ensemble candidates for their predictive accuracy and hardware efficiency.
Our evaluation using 83 classification datasets shows that our approach sustains competitive accuracy and can significantly improve ensembles' operational efficiency.
arXiv Detail & Related papers (2024-08-05T07:30:18Z)
- LLM-Select: Feature Selection with Large Language Models [64.5099482021597]
Large language models (LLMs) are capable of selecting the most predictive features, with performance rivaling the standard tools of data science.
Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place.
arXiv Detail & Related papers (2024-07-02T22:23:40Z)
- Provable Mutual Benefits from Federated Learning in Privacy-Sensitive Domains [3.3748750222488657]
Cross-silo federated learning allows data owners to train accurate machine learning models by benefiting from each other's private datasets.
To incentivize client participation in privacy-sensitive domains, an FL protocol should strike a delicate balance between privacy guarantees and end-model accuracy.
We study the question of when and how a server could design a FL protocol provably beneficial for all participants.
arXiv Detail & Related papers (2024-03-11T12:43:44Z)
- Feature Selection via Maximizing Distances between Class Conditional Distributions [9.596923373834093]
We propose a novel feature selection framework based on the distance between class conditional distributions, measured by integral probability metrics (IPMs).
Our framework directly explores the discriminative information of features in the sense of distributions for supervised classification.
Experimental results show that our framework can outperform state-of-the-art methods in terms of classification accuracy and robustness to perturbations.
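A minimal plaintext sketch of this idea, using the 1-Wasserstein distance (one instance of an IPM) between the empirical class-conditional distributions of each feature on a binary task. Equal class sizes are assumed for simplicity, and the function names are illustrative rather than the paper's:

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples:
    the mean absolute difference of the sorted samples."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

def ipm_rank(columns, labels, k):
    """Rank features by the distance between their class-conditional
    distributions; a larger distance means a more discriminative feature."""
    scores = []
    for col in columns:
        pos = [x for x, y in zip(col, labels) if y == 1]
        neg = [x for x, y in zip(col, labels) if y == 0]
        scores.append(wasserstein_1d(pos, neg))
    order = sorted(range(len(columns)), key=lambda j: -scores[j])
    return order[:k]
```

A feature whose distribution is identical in both classes scores 0 and is ranked last, regardless of its marginal variance.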
arXiv Detail & Related papers (2024-01-15T06:10:10Z)
- Privacy Side Channels in Machine Learning Systems [87.53240071195168]
We introduce privacy side channels: attacks that exploit system-level components to extract private information.
For example, we show that deduplicating training data before applying differentially-private training creates a side-channel that completely invalidates any provable privacy guarantees.
We further show that systems which block language models from regenerating training data can be exploited to exfiltrate private keys contained in the training set.
arXiv Detail & Related papers (2023-09-11T16:49:05Z)
- Tuning Pre-trained Model via Moment Probing [62.445281364055795]
We propose a novel Moment Probing (MP) method to explore the potential of linear probing (LP).
MP trains a linear classification head on the mean of the final features.
Our MP significantly outperforms LP and is competitive with its counterparts at a lower training cost.
arXiv Detail & Related papers (2023-07-21T04:15:02Z)
- Byzantine-Robust Federated Learning with Optimal Statistical Rates and Privacy Guarantees [123.0401978870009]
We propose Byzantine-robust federated learning protocols with nearly optimal statistical rates.
We benchmark against competing protocols and show the empirical superiority of the proposed protocols.
Our protocols with bucketing can be naturally combined with privacy-guaranteeing procedures to introduce security against a semi-honest server.
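The bucketing idea can be sketched in plaintext as follows (an illustrative sketch, not the paper's exact protocol): shuffle the client gradient vectors, average them within small buckets, then aggregate the bucket means with a robust statistic such as the coordinate-wise median, which limits the influence of any single Byzantine client.

```python
import random

def bucket_and_aggregate(grads, s, seed=0):
    """Byzantine-robust aggregation via bucketing: shuffle client gradients,
    average within buckets of size s, then take the coordinate-wise median
    across the bucket averages."""
    rng = random.Random(seed)
    grads = grads[:]
    rng.shuffle(grads)
    buckets = [grads[i:i + s] for i in range(0, len(grads), s)]
    means = [[sum(col) / len(col) for col in zip(*b)] for b in buckets]

    def median(xs):
        xs = sorted(xs)
        n = len(xs)
        return xs[n // 2] if n % 2 else (xs[n // 2 - 1] + xs[n // 2]) / 2

    return [median(col) for col in zip(*means)]
```

With bucket size 1 this reduces to a plain coordinate-wise median, so a single outlier gradient cannot drag the aggregate away from the honest majority.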
arXiv Detail & Related papers (2022-05-24T04:03:07Z)
- Training Differentially Private Models with Secure Multiparty Computation [12.628792164922864]
We address the problem of learning a machine learning model from data that originates at multiple data owners.
Existing solutions based on Differential Privacy (DP) achieve this at the cost of a drop in accuracy.
Our solution relies on an MPC protocol for model training, and an MPC protocol for perturbing the trained model coefficients with Laplace noise.
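The output-perturbation step can be sketched in plaintext as the standard Laplace mechanism: add noise with scale sensitivity/epsilon to each trained coefficient. In the paper this perturbation happens inside MPC so that no single party ever sees the unperturbed model; the inverse-CDF sampler below is an illustrative stand-in for the secure noise generation.

```python
import math
import random

def laplace_noise(scale):
    """Draw a Laplace(0, scale) sample via inverse-CDF sampling."""
    u = random.random() - 0.5
    sign = 1 if u >= 0 else -1
    return -scale * sign * math.log(1 - 2 * abs(u))

def perturb_coefficients(coeffs, sensitivity, epsilon):
    """Output perturbation for epsilon-DP: add Laplace noise with scale
    sensitivity/epsilon to each model coefficient."""
    scale = sensitivity / epsilon
    return [c + laplace_noise(scale) for c in coeffs]
```

A smaller epsilon (stronger privacy) means a larger noise scale, which is the accuracy/privacy trade-off the abstract refers to.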
arXiv Detail & Related papers (2022-02-05T20:00:37Z)
- Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features.
The proposed algorithm is more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
- PRICURE: Privacy-Preserving Collaborative Inference in a Multi-Party Setting [3.822543555265593]
This paper presents PRICURE, a system that combines complementary strengths of secure multi-party computation and differential privacy.
PRICURE enables privacy-preserving collaborative prediction among multiple model owners.
We evaluate PRICURE on neural networks across four datasets including benchmark medical image classification datasets.
arXiv Detail & Related papers (2021-02-19T05:55:53Z)
- User-Level Privacy-Preserving Federated Learning: Analysis and Performance Optimization [77.43075255745389]
Federated learning (FL) is capable of preserving the privacy of data on mobile terminals (MTs) while training useful models from that data.
From a viewpoint of information theory, it is still possible for a curious server to infer private information from the shared models uploaded by MTs.
We propose a user-level differential privacy (UDP) algorithm by adding artificial noise to the shared models before uploading them to servers.
arXiv Detail & Related papers (2020-02-29T10:13:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.