Corella: A Private Multi Server Learning Approach based on Correlated
Queries
- URL: http://arxiv.org/abs/2003.12052v2
- Date: Mon, 27 Jul 2020 09:39:00 GMT
- Title: Corella: A Private Multi Server Learning Approach based on Correlated
Queries
- Authors: Hamidreza Ehteram, Mohammad Ali Maddah-Ali, Mahtab Mirmohseni
- Abstract summary: We propose $\textit{Corella}$ as an alternative approach to protect the privacy of data.
The proposed scheme relies on a cluster of servers, where at most $T \in \mathbb{N}$ of them may collude, each running a learning model.
The variance of the noise is set to be large enough to make the information leakage to any subset of up to $T$ servers information-theoretically negligible.
- Score: 30.3330177204504
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emerging applications of machine learning algorithms on mobile devices
motivate us to offload the computation tasks of training a model or deploying a
trained one to the cloud or at the edge of the network. One of the major
challenges in this setup is to guarantee the privacy of the client data.
Various methods have been proposed to protect privacy in the literature. Those
include (i) adding noise to the client data, which reduces the accuracy of the
result, (ii) using secure multiparty computation (MPC), which requires
significant communication among the computing nodes or with the client, (iii)
relying on homomorphic encryption (HE) methods, which significantly increases
computation load at the servers. In this paper, we propose $\textit{Corella}$
as an alternative approach to protect the privacy of data. The proposed scheme
relies on a cluster of servers, where at most $T \in \mathbb{N}$ of them may
collude, each running a learning model (e.g., a deep neural network). Each
server is fed with the client data, added with $\textit{strong}$ noise,
independent from user data. The variance of the noise is set to be large enough
to make the information leakage to any subset of up to $T$ servers
information-theoretically negligible. On the other hand, the added noises for
different servers are $\textit{correlated}$. This correlation among the queries
allows the parameters of the models running on different servers to be
$\textit{trained}$ such that the client can mitigate the contribution of the
noises by combining the outputs of the servers, and recover the final result
with high accuracy and with a minor computational effort. Simulation results
for various datasets demonstrate the accuracy of the proposed approach for the
classification, using deep neural networks, and the autoencoder, as supervised
and unsupervised learning tasks, respectively.
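The correlated-noise mechanism described above can be illustrated with a minimal linear sketch. This is a hypothetical two-server setup with $T = 1$ and untrained linear "models"; in Corella the models are trained deep networks, and the dimensions, weights, and noise scale here are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

d, k = 8, 3             # input and output dimensions (illustrative)
x = rng.normal(size=d)  # client's private input

# Hypothetical linear models held by two non-colluding servers.
W = rng.normal(size=(k, d))
W1 = W / 2.0            # server 1's weights
W2 = W / 2.0            # server 2's weights

# Strong correlated noise: z2 = -z1, each with large variance, so each
# individual query is statistically close to pure noise.
sigma = 1e3
z1 = sigma * rng.normal(size=d)
z2 = -z1

q1 = x + z1             # query sent to server 1
q2 = x + z2             # query sent to server 2

# The client combines the servers' answers; the correlated noise terms
# cancel, leaving (up to floating-point error) the noiseless result W @ x.
y = W1 @ q1 + W2 @ q2
```

In the linear case the cancellation is exact; the point of Corella is that for nonlinear models the server parameters can be trained so that an analogous combination still recovers the result with high accuracy.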
Related papers
- Robust Model Evaluation over Large-scale Federated Networks [8.700087812420687]
We address the challenge of certifying the performance of a machine learning model on an unseen target network.
We derive theoretical guarantees for the model's empirical average loss and provide uniform bounds on the risk CDF.
Our bounds are computable in time with a number of queries to the $K$ clients, preserving client privacy by querying only the model's loss on private data.
arXiv Detail & Related papers (2024-10-26T18:45:15Z)
- A Novel Neural Network-Based Federated Learning System for Imbalanced
and Non-IID Data [2.9642661320713555]
Most machine learning algorithms rely heavily on large amounts of data, which may be collected from various sources.
To combat this issue, researchers have introduced federated learning, where a prediction model is learned while ensuring the privacy of clients' data.
In this research, we propose a centralized, neural network-based federated learning system.
arXiv Detail & Related papers (2023-11-16T17:14:07Z)
- Scalable Collaborative Learning via Representation Sharing [53.047460465980144]
Federated learning (FL) and Split Learning (SL) are two frameworks that enable collaborative learning while keeping the data private (on device).
In FL, each data holder trains a model locally and releases it to a central server for aggregation.
In SL, the clients must release individual cut-layer activations (smashed data) to the server and wait for its response (during both inference and backpropagation).
In this work, we present a novel approach for privacy-preserving machine learning, where the clients collaborate via online knowledge distillation using a contrastive loss.
arXiv Detail & Related papers (2022-11-20T10:49:22Z)
- Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z)
- Scalable Neural Data Server: A Data Recommender for Transfer Learning [70.06289658553675]
Transfer learning is a popular strategy for leveraging additional data to improve the downstream performance.
Neural Data Server (NDS), a search engine that recommends relevant data for a given downstream task, has been previously proposed to address this problem.
NDS uses a mixture of experts trained on data sources to estimate similarity between each source and the downstream task.
The proposed Scalable Neural Data Server (SNDS) represents both data sources and downstream tasks by their proximity to intermediary datasets.
arXiv Detail & Related papers (2022-06-19T12:07:32Z)
- HideNseek: Federated Lottery Ticket via Server-side Pruning and Sign
Supermask [11.067931264340931]
Federated learning alleviates the privacy risk in distributed learning by transmitting only the local model updates to the central server.
Previous works have tackled these challenges by combining personalization with model compression schemes including quantization and pruning.
We propose HideNseek which employs one-shot data-agnostic pruning to get a subnetwork based on weights' synaptic saliency.
arXiv Detail & Related papers (2022-06-09T09:55:31Z)
- THE-X: Privacy-Preserving Transformer Inference with Homomorphic
Encryption [112.02441503951297]
Privacy-preserving inference of transformer models is in demand among cloud service users.
We introduce $\textit{THE-X}$, an approximation approach for transformers, which enables privacy-preserving inference of pre-trained models.
arXiv Detail & Related papers (2022-06-01T03:49:18Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of an MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- Differentially Private Secure Multi-Party Computation for Federated
Learning in Financial Applications [5.50791468454604]
Federated learning enables a population of clients, working with a trusted server, to collaboratively learn a shared machine learning model.
This reduces the risk of exposing sensitive data, but it is still possible to reverse engineer information about a client's private data set from communicated model parameters.
We present a privacy-preserving federated learning protocol to a non-specialist audience, demonstrate it using logistic regression on a real-world credit card fraud data set, and evaluate it using an open-source simulation platform.
arXiv Detail & Related papers (2020-10-12T17:16:27Z)
- CryptoSPN: Privacy-preserving Sum-Product Network Inference [84.88362774693914]
We present a framework for privacy-preserving inference of sum-product networks (SPNs).
CryptoSPN achieves highly efficient and accurate inference in the order of seconds for medium-sized SPNs.
arXiv Detail & Related papers (2020-02-03T14:49:18Z)
- Secure Summation via Subset Sums: A New Primitive for Privacy-Preserving
Distributed Machine Learning [15.275126264550943]
Summation is an important primitive for computing means, counts or mini-batch gradients.
In many cases, the data is privacy-sensitive and cannot be collected on a central server.
Existing solutions for distributed summation with computational privacy guarantees make trust or connection assumptions that might not be fulfilled in real world settings.
We propose Secure Summation via Subset Sums (S5), a method for distributed summation that works in the presence of a malicious server and only two honest clients.
arXiv Detail & Related papers (2019-06-27T23:27:16Z)
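As a point of reference for the distributed-summation entry above, the basic idea of privacy-preserving summation can be sketched with plain additive secret sharing. This is a generic textbook construction, not the S5 protocol itself; the modulus and values are illustrative:

```python
import random

MODULUS = 2**31 - 1  # illustrative prime modulus

def share(value, n_shares, modulus=MODULUS):
    """Split an integer into n additive shares that sum to it mod the prime."""
    shares = [random.randrange(modulus) for _ in range(n_shares - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares

# Each client secret-shares its private value among all clients; any
# proper subset of shares is uniformly random and reveals nothing.
values = [13, 42, 7]
n = len(values)
all_shares = [share(v, n) for v in values]

# Each party publishes only the sum of the shares it holds; the
# aggregator learns the total without seeing any individual value.
partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS
                for j in range(n)]
total = sum(partial_sums) % MODULUS
# total == 13 + 42 + 7 == 62
```

Schemes like S5 harden this baseline against a malicious server and reduce the trust and connectivity assumptions among clients.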
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.